Appendix C — Assignment C
Instructions
You may talk to a friend, discuss the questions and potential directions for solving them. However, you need to write your own solutions and code separately, and not as a group activity.
Make R code chunks to insert code and type your answer outside the code chunks, in the template provided. Ensure that the solution is written neatly enough to understand and grade.
Quarto-render the file as HTML to submit. For theoretical questions, you can either type the answer within the template and include the solutions in this file, or write the solution on paper, scan and submit separately.
The assignment is worth 100 points, and is due on 27th May 2025 at 11:59 pm.
Use a significance level \(\alpha = 5\%\) in all hypothesis tests, unless otherwise specified.
Five points are properly formatting the assignment. The breakdown is as follows:
- There aren’t excessively long outputs of extraneous information (e.g. no printouts of entire data frames without good reason, there aren’t long printouts of which iteration a loop is on, there aren’t long sections of commented-out code, etc.). There is no piece of unnecessary / redundant code, and no unnecessary / redundant text (2 pt)
- Final answers of each question are written clearly (1 pt).
- The proofs are legible, and clearly written with reasoning provided for every step. They are easy to follow and understand (2 pt)
10 points will be deducted in case the provided template is not used (for coding / text-answer questions), and/or the template is not rendered using Quarto markdown.
For questions involving derivations (Q1, Q4(d) in this assignment), you are allowed to do them on paper, scan and upload separately. However, you are welcome to type the derivations in this template.
To maintain family-wise error-rate, use Tukey’s correction, wherever needed. This is to ensure consistency of solutions.
Q1 Robustness to interaction effect
Consider the randomized complete block design, with no replicates. Suppose that there is a significant interaction between the treatments and the blocks. However, if no interaction is assumed between the treatments and the blocks, will the hypothesis test of testing if all treatments are the same hold?
Answer this question by considering 2 different cases:
Case 1: Fixed treatment effect, fixed block effect
Case 2: Fixed treatment effect, random block effect
Derive \(E(MSTr)\), and \(E(MSE)\) in both the cases to argue whether the hypothesis test is robust to the assumption of no interaction in the respective case.
(10 points)
Q2) RCBD (without replicates)
During the Covid-19 pandemic, several schools changed their mode of instruction from in-person to virtual or hybrid. As a result, educators decided to test the effect of virtual / hybrid / in-person education on the ACT scores of high school students.
The file act_scores1.csv
consists of ACT scores of high school students. The average ACT scores are provided for a set of 5 schools randomly selected from all US schools. For each school, the average ACT score is provided for students receiving in-person, virtual, and hybrid education respectively.
Here the teaching mode (i.e., in-person, virtual, or hybrid) and school are the factors effecting ACT score.
2a) Model
Write the model equation with ACT score as the response, and mean ACT score (population mean), teaching mode effects, school effects, and random error, as the independent variables.
(1 point)
2b) Fixed/random effects
Identify the teaching mode effect, and school effect as fixed or random. Justify your answer.
(2 points)
2c) School effect
Find the variance estimate of the school effect.
(2 points)
2d) Teaching mode effect
Find the variance of the estimate of the difference in any pair of teaching mode effects.
(2 points)
2e) Precision
Use the result of the previous question to obtain the width of the confidence interval of the difference in the estimates of any pair of teaching mode effects.
(2 points)
2f) Best teaching mode
Which teaching mode is the best for students (consider uncertainties)?
(1 point)
Q3) RCBD (with replicates)
Schools may be interacting with the teaching mode to effect the ACT scores. The file act_scores2.csv
consists of replicates of average ACT scores for a given school and teaching mode.
Note: These average ACT scores may be obtained at different period of times for a given school and teaching mode. However, we are not considering the time period as another potential factor effecting the ACT score for simplicity.
3a) Interaction effect
Should the interaction effect of teaching mode and school on the average ACT score be considered as fixed or random?
(1 point)
3b) Expectation
Write the expression for the expected mean teaching-mode sum-of-squares, the expected mean school sum-of-squares, and the expected interaction sum-of-squares. You do not need to show the derivation.
(3 points)
3c) Teaching mode, and School effect
Is there a statistically significant variation in school effects? Are all the teaching mode effects the same?
Hint: Conduct appropriate hypothesis tests, you will not get the result directly from the ANOVA table. Find the \(p\)-value for each test.
(4 points)
3d) Best teaching mode
Which teaching mode is the best for students (consider uncertainties)?
(2 points)
3e) Effects variance
Estimate the variance of the school effect, and the variance of the interaction effect.
(4 points)
3f) Teaching mode effect
Find the variance of the estimate of the difference in any pair of teaching mode effects.
(2 points)
3g) Precision
Use the result of the previous question to obtain the width of the confidence interval of the difference in the estimates of any pair of teaching mode effects. Did it increase or decrease as compared to that in case of RBCD without replicates [2(e)]? Why?
(4 points)
3h) Chance
What is the chance that a school with in-person education has a worse average ACT score than a school with virtual education?
Consider the degrees of freedom of the \(t\)-distribution as the minimum of the degrees of freedom of the terms contributing to the variance estimate in the denominator of the \(t\)-statistic.
(4 points)
Q4) Three-way ANOVA
Based on exploratory research, it was found that the income status of a student’s family is likely to effect the ACT score, and it may also interact with the teaching mode and school in effecting the ACT score. To analyze these effects, the average ACT score was collected for students of a given income status, school, and being taught with a particular teaching mode.
The file act_scores3.csv
consists of average ACT score for a given school, teaching mode, and income status of students. There are no replicates, i.e., there is only one observation for a given school, teaching mode, and income status combination.
4a) Model equation
Write the model equation.
(1 point)
4b) Two-factor interactions
Given that there are no replicates, will be possible to estimate two-factor interactions? Will it be possible to estimate three-factor interactions? Justify your answers.
(2 + 2 points)
4c) Interaction effect
Among the three two-factor interaction effects, which interaction effects are fixed, and which are random?
(3 points)
4d) Expectation
Derive the expressions for the:
Expected mean teaching-mode sum-of-squares [\(E(MS_{TM})\)],
Expected mean school sum-of-squares [\(E(MS_{School})\)],
Expected mean income sum-of-squares [\(E(MS_{income})\)],
Expected mean sum-of-squares of interaction between teaching mode and school [\(E(MS_{TM-School})\)],
Expected mean sum-of-squares of interaction between teaching mode and income status [\(E(MS_{TM-income})\)],
Expected mean sum-of-squares of interaction between school and income status [\(E(MS_{School-income})\)]
As the derivations will be similar, you are allowed to skip steps, and write expressions directly wherever you can.
(4 points for writing the model equation, and the equations for the relevant means (\(y_{ijk}, \bar{y}_{i..}, \bar{y}_{.j.}, \bar{y}_{..k}, \bar{y}_{ij.}, \bar{y}_{i.k}, \bar{y}_{.jk}, \bar{y}_{...}\)) - a total of 8 equations)
(1 point for writing the estimates of the main effects)
(3 points for writing the estimates of the interaction effects)
(6 \(\times\) 2 = 12 points for taking expectations and writing the derivations)
4e) Teaching mode, and School effect
Which two-factor interaction effects, and which main-effects are statistically significant?
Hint:
Conduct hypothesis tests, you will not get the result directly from the ANOVA table.
For testing the statistical significance of school effects, the denominator of the \(F\) statistic will be a linear combination of 3 mean sum-of-squares terms. Use the minimum degrees of freedom of the 3 mean sum-of-squares as the degrees of freedom of the denominator.
(6 points)
4f) Effects variance
Estimate the variance of the school effect, and the variance of the relevant interaction effects.
(3 points)
4g) Teaching mode effect
Find the variance of the estimates of the difference in any pair of teaching mode effects.
(2 points)
4h) Precision
Use the result of the previous question to obtain the width of the confidence interval of the difference in the estimates of any pair of teaching mode effects.
(2 points)
4i) Change in precision
Has the width of the confidence interval obtained in the previous question reduced as compared to that in 3(h)? If yes, then why? Given the same number of data points in Qs 3 and 4, why will the width reduce in 4(h) as compared to that in 3(g)?
(4 points)
4j) Best teaching method based on income
Which is the best teaching method based on the income status of the student, i.e., which is the best teaching method for:
High-income students,
Medium-income students,
Low-income students.
Include uncertainties into account, i.e., there may be multiple best teaching methods for students of a given income status.
(3 \(\times\) 2 = 6 points)