Thursday, February 26, 2015

Assignment 2 - Z-scores, Mean Center and Standard Distance

Introduction   

In this exercise we were asked to analyze county-level tornado data from Oklahoma and Kansas. We were given two data sets, one of tornadoes from 1996-2006 and one of tornadoes from 2007-2012. We needed to see whether there were patterns in the data and whether tornado frequency increased between the two time periods. We were to approach this project as an independent researcher who needs to answer whether these states should require more tornado shelters to be built due to increased danger. The states want to know whether building these shelters is worth the money, or whether tornado frequencies are the same as they have been.

We also need to look at where tornadoes are more likely to occur and whether there is a pattern in the change between the two time periods. If there is a significant change in the occurrence and location of tornadoes between the late 1990s to early 2000s and the late 2000s to early 2010s, we need to suggest whether or not we believe a new requirement for tornado shelters should be implemented.

Methods

In the first part of the exercise we looked at the mean center, weighted mean center, standard distance and weighted standard distance. The first step was to calculate the mean center for each data set. The mean center is the average location of all the points in the data set. In other words, you take the average x-value and the average y-value of all the points and plot them as a single point, which is the mean center. To do this I chose the Mean Center tool in the Spatial Statistics toolbox and set the input to tornadoes (1996-2006). The output is a point that is the mean center for that data set. I then repeated the process with the data set of tornadoes (2007-2012).
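The averaging described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the tool I actually ran; the function name `mean_center` and the sample coordinates are made up for the example and are not the tornado data.

```python
def mean_center(points):
    """Return the (x, y) mean center of a list of (x, y) tuples."""
    if not points:
        raise ValueError("need at least one point")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n  # average of all x-values
    mean_y = sum(y for _, y in points) / n  # average of all y-values
    return mean_x, mean_y


# Hypothetical coordinates, for illustration only (not the tornado data).
sample_points = [(0.0, 0.0), (2.0, 0.0), (2.0, 4.0), (0.0, 4.0)]
print(mean_center(sample_points))  # (1.0, 2.0)
```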

Next I calculated the weighted mean center for each of the data sets. This weights each point by the width of the tornado, so that large tornadoes have a larger effect on the location of the mean center. To do this I selected the same Mean Center tool from the Spatial Statistics toolbox, set the input to tornadoes (1996-2006) and set the weight field to tornado width (feet). I then repeated the process with the data set of tornadoes (2007-2012).

The maps below show the mean center and weighted mean center for each data set, along with the tornadoes drawn with graduated symbology based on their width. The mean center and weighted mean center show us whether or not the tornadoes have been moving and changing patterns from the first time period to the second. As you can see, the mean center has moved north and the weighted mean center has moved east. This tells us that more tornadoes have been occurring in the northeastern parts of Kansas in recent years than before.
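As a rough sketch of what the weighting does, each coordinate is multiplied by its weight before averaging, and the sum is divided by the total weight. The function name `weighted_mean_center` and the sample values are hypothetical; in the exercise the weight was the tornado width in feet.

```python
def weighted_mean_center(points, weights):
    """Return the weighted (x, y) mean center of (x, y) tuples."""
    if len(points) != len(weights):
        raise ValueError("points and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    mean_x = sum(w * x for (x, _), w in zip(points, weights)) / total_weight
    mean_y = sum(w * y for (_, y), w in zip(points, weights)) / total_weight
    return mean_x, mean_y


# Hypothetical points and widths (feet), for illustration only.
sample_points = [(0.0, 0.0), (10.0, 0.0)]
sample_widths = [100.0, 300.0]
# The wider tornado pulls the center toward x = 10.
print(weighted_mean_center(sample_points, sample_widths))  # (7.5, 0.0)
```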


The mean center and weighted mean center for the tornadoes data set (1996-2006) as well as the data set with graduated symbology of the tornadoes' width in feet. (figure 1)

The mean center and weighted mean center for the tornadoes data set (2007-2012) as well as the data set with graduated symbology of the tornadoes' width in feet. (figure 2)

The mean centers and weighted mean centers for both of the tornadoes data sets (1996-2006 & 2007-2012) as well as the data sets with graduated symbology of the tornadoes' width in feet. (figure 3)

The second part of the exercise had us calculate the standard distance for each set of data points. The standard distance measures how spread out the points are around the mean center; it is the spatial equivalent of a standard deviation. To do this I used the Standard Distance tool in the Spatial Statistics toolbox. I set the input to tornadoes (1996-2006) and kept the default settings for the rest, then repeated this for the data set of tornadoes (2007-2012).

Below are the maps of each data set with the weighted mean center and the standard distance. These show how the spread of tornadoes around the mean center has changed between the two time periods. As you can see, the standard distance circle has shifted northeast. This agrees with the mean center and weighted mean center shifts in the first part and suggests that more tornadoes are occurring in the northeastern parts of Kansas.
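For reference, here is a minimal sketch of the unweighted standard distance: the square root of the average squared x-deviation plus the average squared y-deviation from the mean center. The function name `standard_distance` and the sample points are assumptions for illustration, and the tool's default settings may differ in details such as the circle size.

```python
import math


def standard_distance(points):
    """Return the unweighted standard distance of a list of (x, y) tuples."""
    if not points:
        raise ValueError("need at least one point")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Average squared deviation from the mean center along each axis.
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / n
    var_y = sum((y - mean_y) ** 2 for _, y in points) / n
    return math.sqrt(var_x + var_y)


# Hypothetical points on the corners of a 2 x 2 square, for illustration only.
sample_points = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
print(round(standard_distance(sample_points), 4))  # 1.4142
```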


The standard distance and weighted mean center for the tornadoes data set (1996-2006) as well as the data set with graduated symbology of the tornadoes' width in feet. (figure 4)

The standard distance and weighted mean center for the tornadoes data set (2007-2012) as well as the data set with graduated symbology of the tornadoes' width in feet. (figure 5)

The standard distance and weighted mean center for both of the tornadoes data sets (1996-2006, 2007-2012) as well as the data sets with graduated symbology of the tornadoes' width in feet. (figure 6)

The last part focused on z-scores. A z-score is the number of standard deviations that a particular value lies from the mean of the data set it belongs to. For example, we looked at three specific counties to see what their z-scores were: Russell County, KS, Caddo County, OK and Alfalfa County, OK. To do this we took a feature class of the counties from each state that had already been joined with the tornado data set (tornadoes 2007-2012), which gave us the number of tornadoes that fell in each county in a specified field. Below is a map of the counties based on the number of standard deviations each county's count is from the mean of all the counties in the two states.


Counties classified by the number of standard deviations their tornado count is from the mean of all counties in both states. (figure 7)

We then needed to find the mean and standard deviation of the tornado count field in the counties feature class. To do this we simply went to Classify under the Symbology tab, which lists the statistics for the data set. The mean is 4 and the standard deviation is 4.3. Given these values we are able to calculate the z-scores for the counties mentioned above by using the equation:


z = (x - m) / s 
where z is the z-score, x is the given county's tornado count, m is the mean and s is the standard deviation.

Z-Scores:
Russell County, KS: 4.88
Caddo County, OK: 2.09
Alfalfa County, OK: 0.23
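The calculation can be checked with a short script. The mean (4) and standard deviation (4.3) come from the Classify statistics; the county counts of 25, 13 and 5 are not listed in this post and are back-calculated here from the z-scores above, so treat them as assumptions.

```python
MEAN = 4.0      # mean tornado count per county (from the Classify statistics)
STD_DEV = 4.3   # standard deviation of the county tornado counts


def z_score(count, mean=MEAN, std_dev=STD_DEV):
    """Return z = (x - m) / s for a county's tornado count."""
    return (count - mean) / std_dev


# Assumed counts, inferred from the reported z-scores (not given in the post).
assumed_counts = {
    "Russell County, KS": 25,
    "Caddo County, OK": 13,
    "Alfalfa County, OK": 5,
}

for county, count in assumed_counts.items():
    print(f"{county}: {z_score(count):.2f}")
```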

Given the calculations we have made from the tornado data sets for Kansas and Oklahoma, we are also able to estimate the probability of a given number of tornadoes in these states. If we wanted to know what number of tornadoes could happen in any given county in either state in a given year, we could take the probability, find the z-score associated with that probability, and work backwards to find the number of tornadoes that fits that z-score.

For example, suppose we wanted to know how many tornadoes would take place in a county in Kansas or Oklahoma 70% of the time in a given year. We would find the z-score that is exceeded with 0.70 probability, which is -0.52. Next we multiply this z-score by the standard deviation and add the mean: (-0.52 * 4.3) + 4 = 1.764. Therefore, 70% of the time there are at least 1.7 tornadoes in any given county during any given year in Kansas and Oklahoma.

Another instance would be to find how many tornadoes would happen 20% of the time. The z-score that is exceeded with 0.20 probability is 0.84. Then we calculate the number of tornadoes: (0.84 * 4.3) + 4 = 7.612. Therefore, 20% of the time there are at least 7.6 tornadoes in any given county during any given year in Kansas and Oklahoma.
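The same lookup can be sketched with Python's standard library instead of a z-table. This assumes, as the calculation above does, that the county counts are roughly normally distributed; the function name is made up for the example. The results differ slightly from the hand calculation because the table values (-0.52 and 0.84) are rounded.

```python
from statistics import NormalDist

MEAN = 4.0      # mean tornado count per county
STD_DEV = 4.3   # standard deviation of the county tornado counts


def count_exceeded_with_probability(p, mean=MEAN, std_dev=STD_DEV):
    """Return (z, count) such that a count of at least `count` occurs with probability p."""
    # P(Z >= z) = p is the same as P(Z <= z) = 1 - p.
    z = NormalDist().inv_cdf(1.0 - p)
    return z, mean + z * std_dev


for p in (0.70, 0.20):
    z, count = count_exceeded_with_probability(p)
    print(f"p = {p:.2f}: z = {z:.2f}, tornadoes >= {count:.2f}")
```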

Results

From the analysis of the mean centers, standard distances and z-scores of the number of tornadoes in Kansas and Oklahoma, there are several patterns to be found. The shift to the northeast between 1996-2006 and 2007-2012 can be linked to an increase in the number of tornadoes in the eastern and northern parts of Kansas, together with a decrease in tornadoes in the western parts of Kansas and the southern and western parts of Oklahoma (as seen in figures 1, 2 and 3). The standard distance maps also conveyed this pattern of the tornadoes moving toward the northeastern parts of Kansas (figures 4, 5 and 6).

The map of standard deviations for counties in Kansas and Oklahoma (figure 7) shows the large numbers of tornadoes in central Kansas and central Oklahoma. This map shows information that the mean center and standard distance maps cannot, such as the frequency of tornadoes by county and where tornadoes mostly occur. This is much more helpful in determining whether or not to build more shelters due to increases in tornadoes.

Conclusion

The data do show a shift in the center of tornado locations between the two time periods; however, I do not believe this shows an increase in tornado frequency or a meaningful change in where tornadoes occur. The tornadoes are spread out in a seemingly random pattern across the two states and do not show that they are increasing in frequency or moving to a certain area of the states.

Another factor in the change in frequency question is the length of the time periods that were analyzed. The first data set covers about a decade (1996-2006) while the second covers only about five years (2007-2012). It is therefore difficult to tell whether a difference in the number of tornadoes reflects a real change over time or just the different lengths of time over which each data set was collected.

I would recommend to the state legislators that there is not enough evidence of an increase in tornadoes in the two states to require the building of more tornado shelters. There is still some vital information gained from this exercise: the counties in the 1.5 standard deviation category (figure 7) should look into increasing their number of tornado shelters, because these counties are the most likely to be hit by tornadoes.
