STAT 200 Guided Exercise 8 ANSWERS

Key Topics
  Difference of Two Proportions
  Correlation and Covariance
  Bivariate Regression

For On-Line Students, be sure to:
  Submit your answers in a Word file to Sakai at the same place you downloaded the file.
  Remember you can paste any Excel or JMP output into a Word file (use Paste Special for best results).
  Put your name and the Assignment # in the file name, e.g. Ilvento Guided8.doc

Answer as completely as you can and show your work. Then upload the file via Sakai to get credit.

Problem 1. Two Sample Proportion Problem. Support for the President can be a tricky thing. The overall support can be thought of as a weighted average of the support by Republicans, Democrats, and Independents. The data below are from a recent CBS News/New York Times Poll, Oct. 5-8, 2006, of N=983 adults nationwide who were likely voters. MoE ± 3 (for all adults). The question asked was: Do you approve or disapprove of the way George W. Bush is handling his job as president?

               Approve   Disapprove   Unsure   Total
Republican       202         66         17      285
Democrat          31        274          9      314
Independent      104        253         27      384
Total            337        593         53      983

We will focus on the proportion who approve. We expect large differences between Republicans and Democrats in support for Bush. Suppose Republican strategists are most concerned that the gap between Democratic likely voters and Independent likely voters is less than 22 points. We are building toward a test of the alternative hypothesis that the difference in proportions between Independent support for the President and Democratic support for the President is less than 22 points.

a. Calculate the approval proportion for each group.
Democrats      p_d = 31/314 = .0987     q_d = .9013
Independents   p_i = 104/384 = .2708    q_i = .7292

b. Calculate the Standard Error for this Problem. It does not assume that the two proportions are equal (no pooling), since the null hypothesis is a difference of .22 rather than zero.

\[ \sigma_{(p_1 - p_2)} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} = \sqrt{\frac{.0987 \times .9013}{314} + \frac{.2708 \times .7292}{384}} = .0282 \]
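The arithmetic in parts a and b can be checked with a short Python sketch. It is only an illustration, not part of the original answer key: the variable names are made up for this example, and it uses the standard library alone. It also carries the standard error one step further, into the z statistic and lower-tail p-value for a hypothesized 22-point gap, which is the test set up in part c.

```python
from math import sqrt
from statistics import NormalDist

# Counts taken from the poll table: approvals and group totals.
dem_approve, dem_n = 31, 314
ind_approve, ind_n = 104, 384

# Part a: sample proportions who approve.
p_d = dem_approve / dem_n
p_i = ind_approve / ind_n

# Part b: unpooled standard error of the difference in proportions.
se = sqrt(p_d * (1 - p_d) / dem_n + p_i * (1 - p_i) / ind_n)

# Test statistic for H0: p_i - p_d = .22 against Ha: p_i - p_d < .22.
null_diff = 0.22
z = ((p_i - p_d) - null_diff) / se

# Lower-tail p-value from the standard normal distribution.
p_value = NormalDist().cdf(z)

print(f"p_d = {p_d:.4f}, p_i = {p_i:.4f}")
print(f"SE = {se:.4f}, z = {z:.3f}, p-value = {p_value:.3f}")
```

Working from the unrounded proportions gives z close to -1.696; plugging the four-decimal values into the formula by hand gives a slightly different third decimal.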
c. Conduct the hypothesis test that the difference in proportions for approval of the President between Democrats and Independents is less than .22. Express this as pi - pd = .22 for the null and alternative hypotheses. Use an alpha level of .05.

Null Hypothesis                 pi - pd = .22
Alternative Hypothesis          pi - pd < .22 (one-tailed test)
Assumptions of Test             Large sample difference of proportions test; assume normal
Test Statistic (z*)             z* = (.2708 - .0987 - .22)/.0282
Rejection Region                z.05 = -1.645 (reject if z* < -1.645)
Calculation of Test Statistic   z* = -1.696
Comparison of Test Statistic    z* < z.05, since -1.696 < -1.645
with Rejection Region           Reject Ho: pi - pd = .22

What is the p-value for the test? p-value for test = .045

Problem 2. Bivariate regression and correlation. On the web site is an Excel file called TAMPALMS.xls. The data are from a study of residential property sales and appraisals. We will focus on sale price as the dependent variable (Y), and appraised price as the independent variable (X). The variables are:

PRICE       The actual sale price of the property in $1,000s
APPRAISED   The total appraised value in $1,000s
S2ARATIO    The ratio of the Sale Price to the Appraised Value
APPLAND     The appraised value of the land in $1,000s
APPIMPROV   The appraised value of improvements to the property in $1,000s

a. First, let's look at the descriptive statistics on Sale Price and Appraised Value. Using all your knowledge from the course, I want you to briefly describe each variable (ideas: compare the mean and median, the range, and so forth).

The mean level for PRICE is 236.56 ($236,560). The median is considerably below that at 190 ($190,000), so extreme high values are pulling the mean upward. From the histogram you can see substantial right skew. The sale prices of the houses ranged from 59.00 ($59,000) to 957.50 ($957,500). However, the middle 50% of the observations fell between $153,250 and $287,250 (the first and third quartiles).
The results for APPRAISED are similar, but the mean and median are lower: the appraisers underestimated the value of the homes. The mean level for APPRAISED is 201.75 ($201,750). The median is considerably below that at 167.07 ($167,070), so extreme high values are pulling the mean upward. From the histogram you can see substantial right skew. The appraised values of the houses ranged from 64.98 ($64,980) to 929.40 ($929,400). However, the middle 50% of the observations fell between $126,220 and $249,180 (the first and third quartiles).

There is considerable variation in the data. The CVs are 58.95% and 62.89%, respectively.

b. I used JMP to create a correlation matrix of all the variables in the data set (including S to App Ratio, App Land and App Improve).

Correlations
             PRICE   APPRAISED   S2ARATIO   APPLAND   APPIMPROV
PRICE        1.000     0.972      -0.012     0.889      0.973
APPRAISED    0.972     1.000      -0.214     0.936      0.988
S2ARATIO    -0.012    -0.214       1.000    -0.276     -0.171
APPLAND      0.889     0.936      -0.276     1.000      0.879
APPIMPROV    0.973     0.988      -0.171     0.879      1.000

a) What is the correlation between Sale Price and App Value? Briefly describe what this correlation means.
r = .972. There is a strong positive relationship. As the sale price goes up, so does the appraised value.

b) What is the correlation between App Land and S to App Ratio? Briefly describe what this means.
r = -.276. There is a weak negative relationship. As the appraised value for land goes up, the ratio of sale price to appraised value tends to go down.

c. I used JMP to make a scatter-gram of Sale Price (Y) to Appraised Value (X). Briefly describe the relationship.
There is a strong, positive linear relationship between PRICE and APPRAISED.
e. I used JMP to generate the regression of Sale Price on Appraised Value.

a) How many observations are in the data?
92 observations

b) What is R-square for this model? Briefly explain what it means.
R-squared = .945. 94.5% of the variability in sale price is explained by knowing the appraised value.

c) Prove that the bivariate regression R-square is the same as the correlation coefficient squared.
r^2 = .972^2 = .945

d) What is the slope coefficient for App Value? Briefly explain what it means.
Slope = 1.0687. A one-unit ($1,000) increase in appraised value is associated with a 1.07-unit (about $1,070) increase in the predicted sale price.

e) What is the intercept coefficient for this model?
Intercept = 20.9419

f. Solve the regression equation for a property with an appraised value of $150k (by this I mean use the coefficients from the regression output and solve the equation to come up with a predicted value for the Sale Price).
Est Y = 20.94 + 150*1.07
Est Y = 20.94 + 160.50
Est Y = 181.44 or $181,440
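The R-square check and the prediction for a $150k appraisal can be reproduced with a few lines of Python. This is a sketch using only the coefficients reported above; the function name predict_price is hypothetical and chosen for illustration. It also shows how much the answer moves when the slope and intercept are rounded before multiplying.

```python
# Coefficients as reported in the regression output (units: $1,000s).
intercept = 20.9419
slope = 1.0687
r = 0.972  # correlation between PRICE and APPRAISED

# In a bivariate regression, R-square equals the squared correlation.
r_squared = r ** 2


def predict_price(appraised_k, b0=intercept, b1=slope):
    """Predicted sale price in $1,000s for an appraised value in $1,000s."""
    return b0 + b1 * appraised_k


# Prediction with the coefficients at full reported precision.
full_precision = predict_price(150)

# Prediction with coefficients rounded to two decimals, as in the worked answer.
rounded_coeffs = predict_price(150, b0=20.94, b1=1.07)

print(f"r^2 = {r_squared:.3f}")
print(f"full precision: {full_precision:.2f}")
print(f"rounded coefficients: {rounded_coeffs:.2f}")
```

With the unrounded slope the prediction is about 181.25 ($181,250); rounding the slope to 1.07 first gives the 181.44 shown in the worked answer. The gap of roughly $190 comes entirely from rounding.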
Problem 3. Bivariate regression and correlation. Model of Average Annual Precipitation

An article in Geography (July 1980) used regression to predict average annual rainfall levels in California. Data on the following variables were collected for 30 meteorological weather stations scattered throughout California. For the group work we will focus on a bivariate regression of Annual Precip on Latitude. You will have the option of examining all the variables for this problem for the last assignment.

Annual Precip   DEPENDENT VARIABLE: Annual Precipitation in inches
Altitude        The altitude of the station in feet
Latitude        The latitude of the station in degrees
Distance        Distance from the coast in miles
Facing          I made this into a dummy variable. Stations on the Westward facing slopes of the California mountains were coded as 1, whereas stations on the leeward side were coded as 0

A. The following are the descriptive statistics on each variable. Briefly describe Annual Precipitation using the mean, median, std deviation and so forth.

                     Annual Precip     Altitude      Latitude   Distance   Facing
Mean                     19.807        1375.300       37.027     78.700     0.433
Standard Error            3.035         382.812        0.487     12.653     0.092
Median                   15.345         290.000       36.700     74.500     0.000
Mode                     18.200        4152.000       33.800      1.000     0.000
Standard Deviation       16.621        2096.746        2.667     69.301     0.504
Sample Variance         276.264     4396344.631        7.110   4802.631     0.254
Kurtosis                  3.051           0.777       -1.091     -1.192    -2.062
Skewness                  1.700           1.461        0.228      0.417     0.283
Range                    73.210        6930.000        9.200    197.000     1.000
Minimum                   1.660        -178.000       32.700      1.000     0.000
Maximum                  74.870        6752.000       41.900    198.000     1.000
Sum                     594.220       41259.000     1110.800   2361.000    13.000
Count                        30              30           30         30        30

The mean level of annual precipitation at the 30 weather stations is 19.81 inches per year. This value is larger than the median (15.35) because it is pulled up by large values in the data (the maximum is 74.87 inches and the range is 73.21). As a result the variance is relatively large and the CV is about 83.9%.

B. What is the interpretation for the mean for FACING?
Since Facing is a dummy variable, the mean is the proportion of stations that have the value 1, that is, stations on the westward facing slopes. 43.3% of the stations (13 of 30) are on westward facing slopes.
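Two of the summary figures for Problem 3, the CV for Annual Precip and the mean of the Facing dummy, can be reproduced from the descriptive statistics table. The sketch below is illustrative only. The 0/1 list is rebuilt from the Sum (13) and Count (30) rows rather than from the raw data file, so the order of values is an assumption, but the mean and standard deviation do not depend on order.

```python
from statistics import mean, stdev

# Values read from the descriptive statistics table.
mean_precip = 19.807
sd_precip = 16.621

# Coefficient of variation: standard deviation as a percent of the mean.
cv_percent = sd_precip / mean_precip * 100

# Rebuild the Facing dummy from its Sum (13 ones) and Count (30 stations).
facing = [1] * 13 + [0] * 17

# The mean of a 0/1 variable is the proportion of ones.
facing_mean = mean(facing)

# The sample standard deviation follows from the same 0/1 values.
facing_sd = stdev(facing)

print(f"CV = {cv_percent:.1f}%")
print(f"Facing mean = {facing_mean:.3f}, sd = {facing_sd:.3f}")
```

The reconstructed dummy reproduces both the mean (.433) and the standard deviation (.504) reported in the table.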
The following are the covariance matrix and the correlation matrix on the variables.

Covariance Matrix
                Annual Precip    Altitude     Latitude    Distance      Face
Annual Precip     276.26392
Altitude        10525.060      4396344.6
Latitude           25.56138       1291.5228     7.11030
Distance         -242.0067       83407.541     29.81862   4802.6310
Face                5.01016         53.24483   -0.01540    -17.10690   0.25402

Correlations
                Annual Precip   Altitude   Latitude   Distance    Face
Annual Precip       1.0000
Altitude            0.3020       1.0000
Latitude            0.5767       0.2310     1.0000
Distance           -0.2101       0.5740     0.1614     1.0000
Face                0.5981       0.0504    -0.0115    -0.4898    1.0000

C. Confirm for Annual Precip and Latitude the following: the diagonal in the Covariance Matrix is the Variance.
Variance for Annual Precip                   276.264
Covariance of Annual Precip with itself      276.26392 or 276.264
Variance for Latitude                        7.110
Covariance of Latitude with itself           7.11030 or 7.110

D. Briefly describe the correlation between Annual Precip and Latitude.
The correlation between Annual Precip and Latitude is r = .577, which is moderate and positive. As the latitude increases, so does the annual precipitation.

E. Facing is a dummy variable. Stations on the Westward facing slopes of the California mountains were coded as 1, whereas stations on the leeward side were coded as 0. Interpret the correlation between Annual Precip and Facing.
The correlation of .598 is moderate and positive. Stations on the west side of the mountains tended to have higher annual rainfall.
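The two matrices are linked by the identity r = cov(x, y) / sqrt(var(x) * var(y)), so the correlations in parts D and E can be recovered from the covariance matrix alone. The Python sketch below uses only the covariance entries it needs; the dictionary layout and the function name corr are illustrative choices, not something from the original exercise.

```python
from math import sqrt

# Selected entries of the covariance matrix (diagonal entries are variances).
cov = {
    ("precip", "precip"): 276.26392,
    ("latitude", "latitude"): 7.11030,
    ("face", "face"): 0.25402,
    ("precip", "latitude"): 25.56138,
    ("precip", "face"): 5.01016,
}


def corr(a, b):
    """Correlation: covariance divided by the product of standard deviations."""
    return cov[(a, b)] / sqrt(cov[(a, a)] * cov[(b, b)])


r_latitude = corr("precip", "latitude")
r_face = corr("precip", "face")

print(f"r(Annual Precip, Latitude) = {r_latitude:.4f}")
print(f"r(Annual Precip, Face) = {r_face:.4f}")
```

Both results agree with the correlation matrix to four decimals (0.5767 and 0.5981).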
F. Now we will shift to the bivariate regression of Annual Precip on Latitude. Verify that R^2 in a bivariate regression is simply the correlation (r) squared. Interpret R^2 for this model.
r = .577
r^2 = .577^2 = .3329
R^2 = .333
About 33.3% of the variability in annual precipitation is explained by knowing the latitude.

Solve the model for a Latitude of 37 degrees.
Est Y = -113.303 + 3.595(37)
Est Y = -113.303 + 133.01
Est Y = 19.71 inches
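The regression coefficients used here can themselves be recovered from the summary tables, since in a bivariate regression the slope is cov(x, y) / var(x) and the intercept is mean(y) - slope * mean(x). The Python sketch below is an illustration under that assumption: it takes the covariance, variances, and means from the matrices and descriptive statistics given earlier, and the variable names are made up for the example. Small differences in the last decimal come from rounding in the published tables.

```python
# Inputs read from the covariance matrix and the descriptive statistics.
cov_precip_lat = 25.56138
var_lat = 7.11030
var_precip = 276.26392
mean_precip = 19.807
mean_lat = 37.027

# Least-squares slope and intercept for Annual Precip regressed on Latitude.
slope = cov_precip_lat / var_lat
intercept = mean_precip - slope * mean_lat

# R-square for the bivariate model is the squared correlation.
r_squared = cov_precip_lat ** 2 / (var_lat * var_precip)

# Predicted annual precipitation (inches) at a latitude of 37 degrees.
predicted = intercept + slope * 37

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"R^2 = {r_squared:.3f}")
print(f"Est Y at 37 degrees = {predicted:.2f} inches")
```

Because 37 degrees is almost exactly the mean latitude (37.027), the prediction lands very close to the mean annual precipitation of 19.81 inches.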