Today I tried to analyze the CDC (Centers for Disease Control and Prevention) datasets. The datasets contain information about diabetes, obesity, and inactivity rates in counties across various states for the year 2018.
We can use linear regression to understand the data for this project. With this method we can see how the variables diabetes, obesity, and inactivity are affected by the other variables, such as FIPS (Federal Information Processing Standards) code, county, and state. It also lets us examine the correlation and differences between them. Here diabetes, obesity, and inactivity are treated as the dependent variables, and FIPS, county, and state as the independent variables. We also compute some descriptive statistics (the mean, median, maximum, and minimum rates, and the standard deviation of the data), which help in understanding the data better. We can also visualize the data as a histogram to understand its distribution, which also helps us spot outliers. Outliers can also be detected using the best-fit line of the linear regression: they are the data points that deviate far from the value the line predicts.
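As a rough sketch of the steps described above, the following Python snippet computes the descriptive statistics, fits a least-squares line, and flags points whose residual is large. Several things here are assumptions made only for illustration: the function names (`summarize`, `fit_line`, `flag_outliers`), the cutoff of 2 residual standard deviations, and the choice of inactivity rate as the numeric predictor (the post names FIPS, county, and state as the independent variables, but a numeric predictor makes the fit easier to read). The numbers are made up and are not CDC figures.

```python
import statistics


def summarize(values):
    """Return the descriptive statistics mentioned in the post."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "std": statistics.stdev(values),  # sample standard deviation
    }


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_mean = statistics.mean(x)
    y_mean = statistics.mean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def flag_outliers(x, y, threshold=2.0):
    """Return indices whose residual exceeds `threshold` residual std devs.

    The threshold of 2.0 is an assumed, illustrative cutoff.
    """
    slope, intercept = fit_line(x, y)
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    spread = statistics.pstdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * spread]


# Made-up illustrative values, NOT real CDC figures.
inactivity = [14.0, 15.5, 16.0, 17.2, 18.1, 19.0, 20.3, 21.0]
diabetes = [7.1, 7.6, 7.9, 8.4, 12.5, 9.2, 9.9, 10.1]

print(summarize(diabetes))
print(fit_line(inactivity, diabetes))
print(flag_outliers(inactivity, diabetes))
```

With these made-up values, only the point at index 4 (a diabetes rate of 12.5) sits far enough from the fitted line to be flagged. On the real data, the same idea would apply to each county's rates, and the cutoff would need to be chosen with the actual spread of residuals in mind.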