Tuesday, December 10, 2019

Statistical Analysis and Statistical Inference

Questions:

1. Statistical Inference
Your relative or friend asks you if used car prices are generally higher for cars with automatic transmission than for those with manual. Use Price and Transmission data (where A = Automatic transmission, M = Manual transmission) for all cars in your sample and an appropriate statistical inference technique to answer the following question: On average, is the price of cars of the specified make and model, for sale in the specified state, higher with automatic transmission than with manual transmission?

2. Simple Linear Regression model
Your friend or relative asks you how the car that they decide to purchase will depreciate in value. Use Age (independent variable) and Price (dependent variable) to model the relationship between the age of a used car and its price. Then, to answer how the car that your friend or relative decides to purchase will depreciate in value, explore this relationship by
a. Plotting the data with a scatter plot.
b. Calculating the least squares regression line, correlation coefficient and coefficient of determination.

3. Multiple Linear Regression model
Your relative or friend now wants to know what other factors may have an influence on price. To explore this, add Kilometres and Transmission as additional independent variables to the regression model developed in Question 2. Then explore the relationship between these variables by
a. Calculating the multiple regression equation, multiple correlation coefficient, and coefficient of multiple determination.
b. Using appropriate tests to determine which independent variables make a significant contribution to the regression model. Hence, determine which independent variables to include in your model.

Answers:

(1). To answer this we use a hypothesis test. We would like to test whether the average price of automatic transmission cars is higher than the average price of manual transmission cars.
Since the two categories have 91 and 34 observations, both samples are large (more than 30), so we are safe to use a z test for the difference in means. Writing μA and μM for the mean prices of automatic and manual cars, we lay out the test as follows:

H0: μA = μM
H1: μA < μM

The confidence level we choose is 95%, so the probability of a type 1 error is 0.05. We use a one-tail (left-tail) test, for which the critical value is -1.645. As shown in the table below, the test statistic is -2.31. In absolute terms this exceeds the one-tail critical value of 1.645 (and even the two-tail value of 1.96), so we reject the null hypothesis. There is statistical support for the observation that manual transmission cars are priced higher than automatic transmission ones, so the data do not support the claim that automatic cars are priced higher on average. In the table, Variable 1 is automatic and Variable 2 is manual.

z-Test: Two Sample for Means

                              Variable 1   Variable 2
Mean                          22180.56     27243.82
Known Variance                1.28E+08     1.16E+08
Observations                  91           34
Hypothesized Mean Difference  0
z                             -2.30662
P(Z<=z) one-tail              0.010538
z Critical one-tail           1.644854
P(Z<=z) two-tail              0.021076
z Critical two-tail           1.959964

(2). The scatter plot is shown below:

[Scatter plot of Price against Age]

The regression line is

price = 30288 - 1099.71*age

This implies a negative relation between age and price: as age rises, price falls. The correlation coefficient is -0.38866. The coefficient of determination is 0.151056, which means that only 15.11% of the variation in price is explained by variation in age. This is very low, and signals the need for more explanatory variables.

The coefficient of age is -1099.71. This means that when age rises by 1 year, the price of an average car falls by $1099.71, so the value will depreciate by $1099.71 each year. The coefficient of age is significant, as shown by a p-value of almost zero (7.5E-06). This is less than 0.05, the significance level that corresponds to a 95% level of confidence.

As the scatter plot shows, an exponential trend gives a better fit, with R^2 = 0.19, than the linear trend that we have used. Even a logarithmic trend line gives R^2 = 0.18, which is higher than the linear fit. This shows that a linear trend is not the best choice for calculating the depreciation of the car with age as the only explanatory variable.

(3).
The regression line is

price = 36094.287 - 343.298*age - 0.129*kilometres + 4555.587*transmission

This implies a negative relation between age and price: as age rises, price falls. The multiple correlation coefficient is 0.8487 and the value of R^2 is 0.7203. This means that 72.03% of the variation in price is explained by variation in age, transmission and kilometres. This is a good value, and substantiates the recommendation of more variables.

The coefficient of age is -343.298. This means that when age rises by 1 year, the price of an average car falls by $343.298, assuming the other variables remain unchanged. The coefficient of age is not significant, as shown by a p-value of 0.09. This is more than 0.05, using a 95% level of confidence.

The coefficient of kilometres is -0.129. This means that when a car runs for 100 more kilometres its price falls by 0.129*100 = $12.9, assuming the other variables remain unchanged. The coefficient of kilometres is significant, as shown by a p-value of almost 0. This is less than 0.05, using a 95% level of confidence.

The coefficient of transmission tells us the effect of the type of transmission on the price of a car. Using a dummy variable which = 1 for automatic transmission and 0 otherwise, we have shown that automatic transmission cars are priced higher by $4555.587, assuming the other variables remain unchanged. Equivalently, a manual transmission car is priced lower by $4555.587 than an automatic transmission car. The p-value of the coefficient of transmission is 0.0053, which is less than 0.05 and implies that it is significant.

Hence Kilometres and Transmission make a significant contribution to the model and should be included, while Age does not make a significant contribution at the 5% level once the other two variables are in the model.

SUMMARY OUTPUT OF Q2

Regression Statistics
Multiple R          0.38866
R Square            0.151056
Adjusted R Square   0.144154
Standard Error      10494.74
Observations        125

ANOVA
            df    SS         MS         F          Significance F
Regression  1     2.41E+09   2.41E+09   21.88592   7.5E-06
Residual    123   1.35E+10   1.1E+08
Total       124   1.6E+10

            Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept   30288         1717.777        17.63209  1.81E-35  26887.76   33688.23
Age         -1099.71      235.0694        -4.67824  7.5E-06   -1565.02   -634.405

RESULTS FOR Q3

Regression Statistics
Multiple R          0.848704195
R Square            0.720298811
Adjusted R Square   0.713364071
Standard Error      6073.500645
Observations        125

ANOVA
            df    SS            MS           F            Significance F
Regression  3     11494283917   3831427972   103.8681752  2.51847E-33
Residual    121   4463376620    36887410.08
Total       124   15957660536

              Coefficients   Standard Error  t Stat        P-value      Lower 95%     Upper 95%
Intercept     36094.28785    1071.105465     33.69816422   2.4494E-63   33973.75209   38214.824
Age           -343.2984356   201.6285883     -1.70262778   0.091205328  -742.4754032  55.878532
Kilometres    -0.129153182   0.010444143     -12.36608723  3.27904E-23  -0.149830118  -0.1084762
Transmission  4555.587097    1604.267194     2.839668551   0.005299244  1379.517081   7731.6571
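
As a cross-check on answer (1), the z statistic can be recomputed from the summary figures in the z-test table. The sketch below is not part of the original analysis: the function name two_sample_z is made up for illustration, and because the table reports the variances to only three significant figures, the result should match the table only approximately.

```python
from math import sqrt
from statistics import NormalDist  # standard library, Python 3.8+


def two_sample_z(mean1, var1, n1, mean2, var2, n2, hypothesized_diff=0.0):
    """Return (z, left-tail p-value) for H0: mu1 - mu2 = hypothesized_diff.

    Hypothetical helper: treats the variances as known, as the z-test does.
    """
    standard_error = sqrt(var1 / n1 + var2 / n2)
    z = (mean1 - mean2 - hypothesized_diff) / standard_error
    p_left = NormalDist().cdf(z)  # P(Z <= z) under the standard normal
    return z, p_left


# Figures copied from the z-test table (Variable 1 first, Variable 2 second).
z, p_left = two_sample_z(22180.56, 1.28e8, 91, 27243.82, 1.16e8, 34)

# Left-tail critical value at a 0.05 significance level (about -1.645).
z_critical = NormalDist().inv_cdf(0.05)
reject_null = z < z_critical

print(f"z = {z:.3f}, one-tail p = {p_left:.4f}, "
      f"critical = {z_critical:.3f}, reject H0: {reject_null}")
```

The decision rule compares z with the negative one-tail critical value because the test is left-tailed; comparing the one-tail p-value with 0.05 leads to the same decision.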

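The regression figures reported for answers (2) and (3) can be sanity-checked against each other, and the two fitted equations can be used for a prediction. This is only a sketch under stated assumptions: the helper names are invented for illustration, the transmission dummy is assumed to be 1 for automatic and 0 for manual, and the example car (5 years old, 80,000 km, automatic) is hypothetical rather than taken from the data.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


def predict_simple(age):
    """Price from the Q2 line: price = 30288 - 1099.71*age."""
    return 30288.0 - 1099.71 * age


def predict_multiple(age, kilometres, automatic):
    """Price from the Q3 equation; automatic is assumed to be 1 or 0."""
    return 36094.287 - 343.298 * age - 0.129 * kilometres + 4555.587 * automatic


# Q2: with a single predictor, R^2 is the squared correlation coefficient.
r_squared_q2 = (-0.38866) ** 2
adj_q2 = adjusted_r_squared(0.151056, n_obs=125, n_predictors=1)

# Q3: adjusted R^2, and F as mean square regression / mean square residual.
adj_q3 = adjusted_r_squared(0.720298811, n_obs=125, n_predictors=3)
f_q3 = 3831427972 / 36887410.08

# Hypothetical car: 5 years old, 80,000 km, automatic transmission.
price_simple = predict_simple(5)
price_multiple = predict_multiple(5, 80_000, 1)

print(f"Q2: R^2 = {r_squared_q2:.6f}, adjusted R^2 = {adj_q2:.6f}")
print(f"Q3: adjusted R^2 = {adj_q3:.6f}, F = {f_q3:.4f}")
print(f"Predicted price: simple = {price_simple:.2f}, multiple = {price_multiple:.2f}")
```

The recomputed values should agree with the Excel output up to rounding, which is a quick way to confirm that the flattened tables were read correctly.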