Tuesday, April 28, 2015

Assignment 5 - Regression Analysis

Part I

Regression Analysis of Crime Rate and Free Lunches (fig. 1)
Crime rate is the dependent variable and the percentage of persons getting free lunches is the independent variable. The null hypothesis is that there is no linear relationship between free lunches and crime rate in the given areas. The alternative hypothesis is that there is a linear relationship between free lunches and crime rate in the given areas. We reject the null hypothesis because the significance value is under .05, although the relationship between free lunches and crime rate is a small one. The regression equation is y=1.685x+21.819, where y is the crime rate and x is the percentage getting a free lunch. Solving the equation for x, the percentage of persons getting a free lunch with a crime rate of 79.7 would be 34.35% according to this model. I am not very confident in this result because the model's correlation is weak and the r-squared value is low.
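
As a quick check on that prediction, the regression equation can be solved for x in a few lines. This is only a sketch: the slope and intercept come from the equation above, while the function and variable names are made up for illustration.

```python
# Regression equation from fig. 1: crime_rate = SLOPE * free_lunch_pct + INTERCEPT
SLOPE = 1.685
INTERCEPT = 21.819


def free_lunch_pct_for_crime_rate(crime_rate):
    """Solve y = SLOPE * x + INTERCEPT for x (hypothetical helper name)."""
    return (crime_rate - INTERCEPT) / SLOPE


predicted_pct = free_lunch_pct_for_crime_rate(79.7)
print(round(predicted_pct, 2))  # 34.35
```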

Part II

Introduction

For this next exercise we look at data from the UW system to find which variables best explain why students choose the school they do, based on the county they come from. The available variables include the number of people within each county who have some college, 2 years of college, a college degree, or a graduate/professional degree, along with population, population aged 18-24, and median household income. I focus on two schools that I picked, UW-Eau Claire and UW-Madison, and on three variables: percentage of BS degrees, median household income, and population normalized by distance from the school. Students who come from out-of-state are excluded from this analysis.

Methods

The first thing to do is to perform a regression analysis on each of the three variables for both schools. In each regression analysis output, a variable is significant if its significance value is below the critical value of .05. First I state the null and alternative hypotheses for both schools for each of the three variables.

Eau Claire student attendance and Population normalized by distance for Eau Claire.
  • The null hypothesis is that there is no significant relationship between Eau Claire student attendance and Population normalized by distance for Eau Claire. The alternative hypothesis is that there is a significant relationship between Eau Claire student attendance and Population normalized by distance for Eau Claire.
Regression analysis of Eau Claire student attendance and population normalized by distance from Eau Claire. It shows a significant relationship because the significance value of .000 is under the critical value of .05. (fig. 2)
We reject the null hypothesis for population normalized by distance from Eau Claire because the significance value is below the critical value.

Eau Claire student attendance and percentage of BS degrees.
  • The null hypothesis is that there is no significant relationship between Eau Claire student attendance and percentage of BS degrees. The alternative hypothesis is that there is a significant relationship between Eau Claire student attendance and percentage of BS degrees.
Regression analysis of Eau Claire student attendance and percentage of BS degrees. It shows a significant relationship because the significance value of .003 is under the critical value of .05. (fig. 3)
We reject the null hypothesis for percentage of BS degrees because the significance value is below the critical value.

Eau Claire student attendance and median household income.
  • The null hypothesis is that there is no significant relationship between Eau Claire student attendance and median household income. The alternative hypothesis is that there is a significant relationship between Eau Claire student attendance and median household income.
Regression analysis of Eau Claire student attendance and median household income. It shows no significant relationship because the significance value of 0.104 is above the critical value of .05. (fig. 4)
We fail to reject the null hypothesis for median household income because the significance value is above the critical value.

Madison student attendance and Population normalized by distance for Madison. 
  • The null hypothesis is that there is no significant relationship between Madison student attendance and Population normalized by distance for Madison. The alternative hypothesis is that there is a significant relationship between Madison student attendance and Population normalized by distance for Madison.
Regression analysis of Madison student attendance and population normalized by distance from Madison. It shows a significant relationship because the significance value of .000 is under the critical value of .05. (fig. 5)
We reject the null hypothesis for population normalized by distance from Madison because the significance value is below the critical value.

Madison student attendance and percentage of BS degrees.
  • The null hypothesis is that there is no significant relationship between Madison student attendance and percentage of BS degrees. The alternative hypothesis is that there is a significant relationship between Madison student attendance and percentage of BS degrees.
Regression analysis of Madison student attendance and percentage of BS degrees. It shows a significant relationship because the significance value of .000 is under the critical value of .05. (fig. 6)
We reject the null hypothesis for percentage of BS degrees because the significance value is below the critical value.

Madison student attendance and median household income.
  • The null hypothesis is that there is no significant relationship between Madison student attendance and median household income. The alternative hypothesis is that there is a significant relationship between Madison student attendance and median household income.
Regression analysis of Madison student attendance and median household income. It shows a significant relationship because the significance value of 0.001 is below the critical value of .05. (fig. 7)
We reject the null hypothesis for median household income because the significance value is below the critical value.
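
The decision rule used for all six tests above can be summed up in a short sketch. The significance values are the ones reported in figs. 2-7 (I assume .000 is a value rounded to three decimals, not an exact zero); the dictionary layout and the function name are made up for illustration.

```python
ALPHA = 0.05  # critical value used throughout this post

# Significance values as reported in the regression outputs (figs. 2-7).
# A reported .000 is assumed to be a rounded value, not exactly zero.
reported_significance = {
    ("Eau Claire", "population normalized by distance"): 0.000,
    ("Eau Claire", "percentage of BS degrees"): 0.003,
    ("Eau Claire", "median household income"): 0.104,
    ("Madison", "population normalized by distance"): 0.000,
    ("Madison", "percentage of BS degrees"): 0.000,
    ("Madison", "median household income"): 0.001,
}


def decision(p_value, alpha=ALPHA):
    """Return the hypothesis-test decision for one significance value."""
    return "reject" if p_value < alpha else "fail to reject"


decisions = {key: decision(p) for key, p in reported_significance.items()}
for (school, variable), outcome in decisions.items():
    print(f"{school} / {variable}: {outcome} the null hypothesis")
```

Only median household income for Eau Claire ends up on the "fail to reject" side, which matches the write-up above.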

Results

I will map the residuals for just those variables that were found to be significant; the regression analyses above show that all but one of the variables were significant. To get the residuals, I set each regression to save the standardized residuals before running it and then exported the tables of residuals to ArcGIS.
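
For reference, one common definition of a standardized residual is the raw residual divided by the standard error of the estimate. The sketch below assumes that definition and uses made-up numbers rather than the UW data; the function name is hypothetical, and the software's own output may differ in detail.

```python
import math


def standardized_residuals(x, y):
    """Fit y = intercept + slope * x by least squares and return each
    residual divided by the standard error of the estimate."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Standard error of the estimate: sqrt(SSE / (n - 2)) for one predictor
    std_error = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    return [r / std_error for r in residuals]


# Made-up illustration data (not from the assignment)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
z = standardized_residuals(x, y)
print([round(v, 2) for v in z])
```

Positive values mark observations the model under-predicts and negative values mark ones it over-predicts, which is what the residual maps display county by county.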

Eau Claire student attendance and Population normalized by distance for Eau Claire. R2=.753
This variable had a significance value of .000 and shows a pattern of counties with large population centers having large attendance at UW-Eau Claire. (fig. 8)
Eau Claire student attendance and percentage of BS degrees. R2=.121
This variable had a significance value of .003 and shows a pattern that counties with high percentages of BS degrees have large attendance at UW-Eau Claire. (fig. 9)
Madison student attendance and Population normalized by distance for Madison. R2=.853
This variable had a significance value of .000 and shows a pattern of the largely populated Milwaukee area as having large attendance at UW-Madison. (fig. 10)
Madison student attendance and percentage of BS degrees. R2=.154
This variable had a significance value of .000 and shows a pattern that counties with high percentages of BS degrees have large attendance at UW-Madison. (fig. 11)
Madison student attendance and median household income. R2=.363
This variable had a significance value of .001 and shows a pattern in which counties with high median household income, such as the Milwaukee and Madison areas, have high attendance at UW-Madison. (fig. 12)
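
Because each of these models has a single predictor, R2 is the square of the correlation coefficient, so the reported values can be read as the share of variance explained and converted to the size of the correlation. A small sketch using the R2 values listed above (the dictionary keys and variable names are made up for illustration):

```python
import math

# R2 values as reported above for the five significant regressions
r_squared = {
    "Eau Claire: population normalized by distance": 0.753,
    "Eau Claire: percentage of BS degrees": 0.121,
    "Madison: population normalized by distance": 0.853,
    "Madison: percentage of BS degrees": 0.154,
    "Madison: median household income": 0.363,
}

# For simple (one-predictor) regression, |r| = sqrt(R2)
abs_r = {name: math.sqrt(r2) for name, r2 in r_squared.items()}

for name, r2 in r_squared.items():
    print(f"{name}: {r2:.1%} of variance explained, |r| = {abs_r[name]:.3f}")
```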

Discussion & Conclusion

The results for Eau Claire show that two variables, population normalized by distance to Eau Claire and percentage of BS degrees, are correlated with the number of students from each county who attend UW-Eau Claire. This means that students who go to UW-Eau Claire are more likely to come from populated areas and from areas where a lot of people have BS degrees. Median household income does not have a significant relationship with student attendance at UW-Eau Claire, meaning that income is not a significant predictor of attendance there. The R2 value for percentage of BS degrees is low (.121), however, which means that this variable has only a weak relationship with student attendance at UW-Eau Claire. Population normalized by distance to Eau Claire has a high R2 (.753) and a very strong relationship with student attendance, making it the most influential predictor I looked at for why students choose to go to UW-Eau Claire.

The results for Madison show that all three variables, population normalized by distance to Madison, percentage of BS degrees, and median household income, are correlated with the number of students from each county who attend UW-Madison. This means that students who go to UW-Madison are more likely to come from populated areas, areas where a lot of people have BS degrees, and areas with high median household income. Percentage of BS degrees (.154) and median household income (.363) both have low R2 values, meaning that they have weak relationships with student attendance at UW-Madison. Population normalized by distance to Madison has a very high R2 (.853) and a very strong relationship with student attendance, making it the most influential predictor I looked at for why students choose to go to UW-Madison.
