In linear regression we build a model and use the best-fit line to predict the value of one variable from the value of another, and to describe the relationship between the two. The model is y = β0 + β1x + ε, where x is the independent variable, y is the dependent variable, β0 is the intercept, β1 is the slope, and ε is the error term. The variable we predict is called the dependent variable, and the variable we use to predict it is called the independent variable. Because the data points do not fall exactly on a straight line, each point has an error, which is the vertical distance between the data point and the regression line, so we include the error term in the formula. We use the least squares method to choose the line that minimizes the sum of the squared errors.
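To make the least squares idea concrete, here is a minimal sketch in plain Python. The function name `fit_simple_ols` and the sample numbers are my own assumptions for illustration (they are not the CDC data); the slope and intercept come from the standard closed-form solution for a single predictor.

```python
def fit_simple_ols(x, y):
    """Return (beta0, beta1) for the line that minimizes the sum of squared errors."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope: how x and y vary together, divided by how much x varies on its own.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    beta1 = sxy / sxx
    # Intercept: forces the line through the point (x_mean, y_mean).
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1


# Made-up illustrative values, NOT the CDC data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

beta0, beta1 = fit_simple_ols(x, y)

# Errors: vertical distance from each point to the fitted line.
errors = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]
print(beta0, beta1)
```

With an intercept in the model, the errors of the fitted line add up to (approximately) zero, which is a quick sanity check on the fit.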
We explored the CDC diabetes data, which has three variables: diabetes, obesity, and inactivity. To build a reliable model that truly fits the data, we need to find the relationship between these variables using linear regression and evaluate the model with R-squared, which measures the share of the variation in the dependent variable that the model explains.
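As a rough sketch of how R-squared could be computed by hand, the snippet below compares the squared errors of the model's predictions with the squared errors of always predicting the mean. The function name `r_squared` and the numbers are hypothetical placeholders, not values from the CDC data.

```python
def r_squared(y, y_pred):
    """R-squared = 1 - (sum of squared errors) / (total sum of squares)."""
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))  # unexplained variation
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)               # total variation
    return 1.0 - ss_res / ss_tot


# Made-up observed values, for illustration only.
y_obs = [2.0, 4.0, 6.0, 8.0]

perfect = r_squared(y_obs, [2.0, 4.0, 6.0, 8.0])    # predictions match exactly
mean_only = r_squared(y_obs, [5.0, 5.0, 5.0, 5.0])  # always predict the mean
print(perfect, mean_only)
```

The two extreme cases help with interpretation: a model that predicts every point exactly scores 1, and a model that does no better than the mean scores 0.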
In today's class I also learned what a residual is and was introduced to a new topic, heteroscedasticity, which I will try to learn more about before the next class.