4th October 2023

In this project, we began by preprocessing and transforming data from the Centers for Disease Control and Prevention (CDC) on U.S. county rates of diabetes, obesity, and physical inactivity. We then merged the three datasets on FIPS code and year, producing a single consolidated dataset for analysis. Our primary focus was the relationship between the percentage of diabetic individuals and the percentages of obesity and physical inactivity. Using linear regression, we fit a model that predicts the percentage of diabetic individuals from the percentages of obesity and inactivity. We split the data into training and testing sets, trained the model on the training set, and then made and evaluated predictions on the test set. Visualizing how these variables relate was also an important step, and we used a three-dimensional scatter plot to show all three together.
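The steps above can be sketched roughly as follows. This is a minimal illustration, not the code used in the project: it assumes pandas and scikit-learn, the column names (`FIPS`, `YEAR`, `DIABETES_PCT`, `OBESITY_PCT`, `INACTIVITY_PCT`) are hypothetical, and the data is synthetic stand-in data rather than the CDC figures. The 80/20 split is also an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT real CDC values); column names are assumptions.
rng = np.random.default_rng(0)
n = 200
fips = np.arange(1001, 1001 + n)
year = np.full(n, 2018)
obesity_pct = rng.uniform(15, 40, n)
inactivity_pct = rng.uniform(10, 35, n)
diabetes_pct = 2.0 + 0.15 * obesity_pct + 0.10 * inactivity_pct + rng.normal(0, 0.5, n)

# Three separate tables, standing in for the three source datasets.
diabetes = pd.DataFrame({"FIPS": fips, "YEAR": year, "DIABETES_PCT": diabetes_pct})
obesity = pd.DataFrame({"FIPS": fips, "YEAR": year, "OBESITY_PCT": obesity_pct})
inactivity = pd.DataFrame({"FIPS": fips, "YEAR": year, "INACTIVITY_PCT": inactivity_pct})

# Merge on FIPS code and year (inner join keeps only counties present in all three).
merged = diabetes.merge(obesity, on=["FIPS", "YEAR"]).merge(inactivity, on=["FIPS", "YEAR"])

# Predict % diabetic from % obese and % inactive.
X = merged[["OBESITY_PCT", "INACTIVITY_PCT"]]
y = merged["DIABETES_PCT"]

# Hold out 20% of the rows for testing (the split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluate on the held-out test set.
r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
print("coefficients:", model.coef_, "intercept:", model.intercept_)
print("test R^2:", r2, "test MSE:", mse)
```

With real data, the inner join will drop any county-year that is missing from one of the three tables, so it is worth comparing the merged row count against each source table before fitting.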
