Statistical analysis using R

Final Exam DPEE

Note:

· For demonstrating conceptual understanding, you are required to work on the model that is easier to handle or compute, not necessarily the more suitable (or more complicated) model for the dataset. Follow the question description.

· You don’t need to check the assumptions of a model unless the question asks for it. For example, if the question asks you to make a prediction based on a model, you don’t need to check the model’s assumptions before making the prediction.

· For any hypothesis testing problem, define Ho and Ha, compute the test statistic, report the exact p-value, and state the conclusion. The default alpha is 5% unless specified otherwise.

· Elaborate your reasoning clearly and show relevant plots, R results, and tables to support your opinion in each step and conclusion.

· Submit the Rmd file and the corresponding pdf file knitted from it, along with your answers. This format is similar to your homework.

· The data is real, just like the project you are working on. Hence it is possible that the model is still not perfect even after a remedial method has been applied. When this happens, evaluation will be based on how well you apply the methods covered in Stat512 to improve the model. Don’t worry if your model is not perfect; try your best to demonstrate the skills you have learned in this class.

Study the data with a linear analysis and complete the problems. The data set, dataDPEE.csv, has three continuous predictors (X1, X2, X3) and two categorical predictors (X4, X5).

Problem 1. Consider only the first order model with X1, X2 and X3, and perform the following hypothesis tests.

a. (10) whether X1 can be dropped from the full model.

> dpeemod <- lm(y ~ x1 + x2 + x3)

> plot(dpeemod)

> summary(dpeemod)

Call:
lm(formula = y ~ x1 + x2 + x3)

Residuals:
    Min      1Q  Median      3Q     Max
-15.948 -11.640  -1.480   6.402  31.650

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 90.79017   22.07408   4.113  0.00106 **
x1          -0.68731    0.47959  -1.433  0.17377
x2          -0.47047    0.24227  -1.942  0.07254 .
x3          -0.06845    0.46523  -0.147  0.88513
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 14.75 on 14 degrees of freedom
Multiple R-squared: 0.6257, Adjusted R-squared: 0.5455
F-statistic: 7.8 on 3 and 14 DF, p-value: 0.002652

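Ho: beta1 = 0 (X1 can be dropped from the model, given that X2 and X3 are in the model).

Ha: beta1 ≠ 0 (X1 cannot be dropped from the model).

Using the model Y ~ X1 + X2 + X3, the test statistic for X1 is t = -1.433 on 14 degrees of freedom, with p-value = 0.17377. Since 0.17377 > 0.05, we fail to reject Ho and conclude that X1 can be dropped from the full model.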
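The same conclusion can be checked with a general linear (partial F) test that compares the full model with the model without X1. The sketch below runs on simulated stand-in data, since dataDPEE.csv is not reproduced here; the data frame dat, its values, and the lowercase column names are assumptions made for illustration only.

```r
# Simulated stand-in for dataDPEE.csv (hypothetical values, 18 rows)
set.seed(512)
n <- 18
dat <- data.frame(
  x1 = rnorm(n, mean = 40, sd = 8),
  x2 = rnorm(n, mean = 60, sd = 15),
  x3 = rnorm(n, mean = 30, sd = 8)
)
dat$y <- 90 - 0.7 * dat$x1 - 0.5 * dat$x2 + rnorm(n, sd = 15)

full    <- lm(y ~ x1 + x2 + x3, data = dat)
reduced <- lm(y ~ x2 + x3, data = dat)   # same model without x1

# General linear test of Ho: beta1 = 0 against Ha: beta1 != 0
ftest  <- anova(reduced, full)
f_stat <- ftest$F[2]
p_f    <- ftest$`Pr(>F)`[2]

# The t test in the coefficient table tests the same hypothesis
t_stat <- coef(summary(full))["x1", "t value"]
p_t    <- coef(summary(full))["x1", "Pr(>|t|)"]

print(ftest)
```

For a single coefficient the two tests agree: the F statistic is the square of the t statistic and the p-values are identical, so either can be reported.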

b. (10) whether X1 can be dropped from the model containing only X1 and X2.
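For part b the full model changes to Y ~ X1 + X2, so the error degrees of freedom and the MSE change as well. A minimal sketch of the extra-sum-of-squares computation, F* = [(SSE(R) - SSE(F)) / (dfR - dfF)] / [SSE(F) / dfF], is given below on simulated stand-in data; dat and its column names are hypothetical and not taken from dataDPEE.csv.

```r
# Simulated stand-in for dataDPEE.csv (hypothetical values, 18 rows)
set.seed(512)
n <- 18
dat <- data.frame(
  x1 = rnorm(n, mean = 40, sd = 8),
  x2 = rnorm(n, mean = 60, sd = 15),
  x3 = rnorm(n, mean = 30, sd = 8)
)
dat$y <- 90 - 0.7 * dat$x1 - 0.5 * dat$x2 + rnorm(n, sd = 15)

full12 <- lm(y ~ x1 + x2, data = dat)   # full model for part b
red2   <- lm(y ~ x2, data = dat)        # reduced model under Ho: beta1 = 0

# Error sums of squares and their degrees of freedom
sse_f <- deviance(full12); df_f <- df.residual(full12)
sse_r <- deviance(red2);   df_r <- df.residual(red2)

# Extra sum of squares SSR(X1 | X2) divided by the full-model MSE
f_star <- ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)
p_val  <- pf(f_star, df_r - df_f, df_f, lower.tail = FALSE)

cat("F* =", f_star, " p-value =", p_val, "\n")
```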

Problem 2 (10) Consider the first order model with X1, X2 and X3, and simultaneously estimate the parameters (beta1, beta2 and beta3) with a confidence level of 75%.
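One common way to obtain simultaneous intervals is the Bonferroni procedure, which is assumed here; the problem statement does not name a method. With g = 3 parameters and a family confidence level of 75% (alpha = 0.25), each interval is built at level 1 - alpha/g. The sketch uses simulated stand-in data; dat and its column names are hypothetical.

```r
# Simulated stand-in for dataDPEE.csv (hypothetical values, 18 rows)
set.seed(512)
n <- 18
dat <- data.frame(
  x1 = rnorm(n, mean = 40, sd = 8),
  x2 = rnorm(n, mean = 60, sd = 15),
  x3 = rnorm(n, mean = 30, sd = 8)
)
dat$y <- 90 - 0.7 * dat$x1 - 0.5 * dat$x2 + rnorm(n, sd = 15)

full <- lm(y ~ x1 + x2 + x3, data = dat)

family_level <- 0.75            # joint confidence level from the problem
alpha <- 1 - family_level
g <- 3                          # number of parameters estimated jointly

# Bonferroni: each interval uses the t quantile at 1 - alpha / (2 * g)
bonf <- confint(full, parm = c("x1", "x2", "x3"), level = 1 - alpha / g)
print(bonf)
```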

Problem 3 (20) Perform appropriate analysis to diagnose the potential issues with the first order full model with X1, X2 and X3, and improve the model as much as possible with the methods covered in Stat512. You should also check the assumptions for your revised model.
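A possible starting point for the diagnostics is sketched below using base R only: a normality test on the residuals, variance inflation factors, leverage, and Cook's distance. The cutoffs are common rules of thumb and the choice of checks is an assumption, not a list prescribed by the exam; dat is simulated stand-in data with hypothetical column names.

```r
# Simulated stand-in for dataDPEE.csv (hypothetical values, 18 rows)
set.seed(512)
n <- 18
dat <- data.frame(
  x1 = rnorm(n, mean = 40, sd = 8),
  x2 = rnorm(n, mean = 60, sd = 15),
  x3 = rnorm(n, mean = 30, sd = 8)
)
dat$y <- 90 - 0.7 * dat$x1 - 0.5 * dat$x2 + rnorm(n, sd = 15)

fit <- lm(y ~ x1 + x2 + x3, data = dat)
p   <- length(coef(fit))

# Normality of the residuals (Shapiro-Wilk test)
sw <- shapiro.test(residuals(fit))

# Variance inflation factors: VIF_j = 1 / (1 - R^2_j), where R^2_j comes
# from regressing predictor j on the other predictors
preds <- c("x1", "x2", "x3")
vif <- sapply(preds, function(v) {
  others <- setdiff(preds, v)
  r2 <- summary(lm(reformulate(others, response = v), data = dat))$r.squared
  1 / (1 - r2)
})

# Leverage above 2p/n and Cook's distance above the F(p, n - p) median
high_leverage <- which(hatvalues(fit) > 2 * p / n)
cook          <- cooks.distance(fit)
influential   <- which(pf(cook, p, n - p) > 0.5)

# In an interactive session, the standard residual plots can be drawn with:
# par(mfrow = c(2, 2)); plot(fit)
print(sw$p.value); print(vif); print(high_leverage); print(influential)
```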

Problem 4

a. (10) Compute AIC, BIC, and PRESSP to compare the following two models.

· The model on the first order terms for X1 and X2 and the interaction term X1X2.

· The model on the first order terms for X1, X2 and X3

Do all three criteria select the same model? If not, explain.
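The three criteria can be computed side by side as sketched below. PRESS is obtained from the identity that the deleted residual equals e_i / (1 - h_ii), so no refitting is needed. Note that AIC() and BIC() in R are based on the full log-likelihood, so for a fixed n they differ from the SSE-based textbook versions only by a constant and rank models the same way. The data frame dat is simulated stand-in data with hypothetical column names.

```r
# Simulated stand-in for dataDPEE.csv (hypothetical values, 18 rows)
set.seed(512)
n <- 18
dat <- data.frame(
  x1 = rnorm(n, mean = 40, sd = 8),
  x2 = rnorm(n, mean = 60, sd = 15),
  x3 = rnorm(n, mean = 30, sd = 8)
)
dat$y <- 90 - 0.7 * dat$x1 - 0.5 * dat$x2 + rnorm(n, sd = 15)

m_int <- lm(y ~ x1 * x2, data = dat)        # x1 + x2 + x1:x2
m_fo  <- lm(y ~ x1 + x2 + x3, data = dat)   # first order in x1, x2, x3

# PRESS: sum of squared deleted residuals, e_i / (1 - h_ii)
press <- function(fit) sum((residuals(fit) / (1 - hatvalues(fit)))^2)

crit <- data.frame(
  model = c("x1 + x2 + x1:x2", "x1 + x2 + x3"),
  AIC   = c(AIC(m_int), AIC(m_fo)),
  BIC   = c(BIC(m_int), BIC(m_fo)),
  PRESS = c(press(m_int), press(m_fo))
)
print(crit)   # smaller is better for all three criteria
```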

b. (10) Select the model that you think is better for predicting the mean response, then predict the mean response for the following case at a confidence level of 99%.
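A confidence interval for the mean response can be obtained with predict() and interval = "confidence" (as opposed to "prediction", which is for a single new observation). In the sketch below the predictor values in new_case are hypothetical placeholders, the first order model is used only for illustration, and dat is simulated stand-in data.

```r
# Simulated stand-in for dataDPEE.csv (hypothetical values, 18 rows)
set.seed(512)
n <- 18
dat <- data.frame(
  x1 = rnorm(n, mean = 40, sd = 8),
  x2 = rnorm(n, mean = 60, sd = 15),
  x3 = rnorm(n, mean = 30, sd = 8)
)
dat$y <- 90 - 0.7 * dat$x1 - 0.5 * dat$x2 + rnorm(n, sd = 15)

fit <- lm(y ~ x1 + x2 + x3, data = dat)

# Hypothetical case; replace with the values given in the exam
new_case <- data.frame(x1 = 40, x2 = 60, x3 = 30)

# 99% confidence interval for the mean response at new_case
ci <- predict(fit, newdata = new_case, interval = "confidence", level = 0.99)
print(ci)
```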

Problem 5

X4 and X5 are two factors that may affect Y.

a. (10) Is there any significant interaction effect between X4 and X5 on Y?
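The interaction can be tested with the F test for the X4:X5 term in a two-factor ANOVA model. The sketch below assumes each factor has two levels (high/low and less/more, as named in part b) and uses simulated stand-in data; dat, its values, and the column names x4 and x5 are hypothetical.

```r
# Simulated stand-in for dataDPEE.csv (hypothetical values, 18 rows)
set.seed(512)
dat <- data.frame(
  x4 = rep(c("high", "low"), each = 9),
  x5 = rep(c("less", "more"), length.out = 18)
)
dat$y <- 50 + 10 * (dat$x4 == "high") - 8 * (dat$x5 == "more") +
  rnorm(18, sd = 10)

# Two-factor model with interaction: y ~ x4 + x5 + x4:x5
fit <- lm(y ~ x4 * x5, data = dat)
tab <- anova(fit)

# Ho: no X4:X5 interaction; Ha: some interaction is present
f_int <- tab["x4:x5", "F value"]
p_int <- tab["x4:x5", "Pr(>F)"]
print(tab)
```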

b. (10) With the ANOVA method, compute the 95% confidence interval for each of the following differences:

D1 = the difference in the mean of Y between (X4=high, X5=less) and (X4=high, X5=more)

D2 = the difference in the mean of Y between (X4=low, X5=less) and (X4=low, X5=more)
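D1 and D2 are contrasts of cell means, so one way to estimate them is to fit the cell means model and combine the estimated means with weights +1 and -1, using the pooled MSE for the standard error. The sketch below builds each interval separately at 95%; if joint coverage is wanted, a Bonferroni adjustment of the level would be needed. The helper contrast_ci, the data frame dat, and the column names are hypothetical.

```r
# Simulated stand-in for dataDPEE.csv (hypothetical values, 18 rows)
set.seed(512)
dat <- data.frame(
  x4 = rep(c("high", "low"), each = 9),
  x5 = rep(c("less", "more"), length.out = 18)
)
dat$y <- 50 + 10 * (dat$x4 == "high") - 8 * (dat$x5 == "more") +
  rnorm(18, sd = 10)

# Cell means model: one coefficient per (x4, x5) combination
dat$cell <- interaction(dat$x4, dat$x5, sep = ".")
cm <- lm(y ~ 0 + cell, data = dat)
mu <- coef(cm)
names(mu) <- sub("^cell", "", names(mu))
V <- vcov(cm)
dimnames(V) <- list(names(mu), names(mu))

# Hypothetical helper: t interval for a contrast with named weights w
contrast_ci <- function(w, level = 0.95) {
  est   <- sum(w * mu[names(w)])
  se    <- sqrt(drop(t(w) %*% V[names(w), names(w)] %*% w))
  tcrit <- qt(1 - (1 - level) / 2, df.residual(cm))
  c(estimate = est, lower = est - tcrit * se, upper = est + tcrit * se)
}

d1 <- contrast_ci(c(high.less = 1, high.more = -1))   # D1
d2 <- contrast_ci(c(low.less = 1, low.more = -1))     # D2
print(rbind(D1 = d1, D2 = d2))
```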

c. (10) With the ANOVA method, compute the 95% confidence interval for D1 - D2, where D1 and D2 are defined in b.

How is your result related to your answer in a?
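When both factors have two levels, D1 - D2 is the interaction contrast. With treatment coding and baseline levels high and less, the interaction coefficient of the model Y ~ X4 * X5 equals D1 - D2, so its confidence interval can be read off directly. The sketch below assumes exactly two levels per factor and uses simulated stand-in data; dat and the column names are hypothetical.

```r
# Simulated stand-in for dataDPEE.csv (hypothetical values, 18 rows)
set.seed(512)
dat <- data.frame(
  x4 = rep(c("high", "low"), each = 9),
  x5 = rep(c("less", "more"), length.out = 18)
)
dat$y <- 50 + 10 * (dat$x4 == "high") - 8 * (dat$x5 == "more") +
  rnorm(18, sd = 10)

# Fix the baseline levels so the sign of the interaction term is known
dat$x4 <- factor(dat$x4, levels = c("high", "low"))
dat$x5 <- factor(dat$x5, levels = c("less", "more"))

fit <- lm(y ~ x4 * x5, data = dat)

# Interaction coefficient = mu(high,less) - mu(high,more)
#                           - mu(low,less) + mu(low,more) = D1 - D2
est_diff <- coef(fit)["x4low:x5more"]
ci_diff  <- confint(fit, "x4low:x5more", level = 0.95)

print(est_diff)
print(ci_diff)
```

Under these assumptions the 95% interval for D1 - D2 contains zero exactly when the interaction F test in part a is not significant at the 5% level, because both are based on the same one-degree-of-freedom contrast.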