Appendix A — Assignment A

Instructions

You may talk to a friend, discuss the questions and potential directions for solving them. However, you need to write your own solutions and code separately, and not as a group activity.
Make R code chunks to insert code and type your answer outside the code chunks, in the template provided. Ensure that the solution is written neatly enough to understand and grade.
Quarto-render the file as HTML to submit. For theoretical questions, you can either type the answer within the template and include the solutions in this file, or write the solution on paper, scan and submit separately.
The assignment is worth 100 points, and is due on 28th April 2025 at 11:59 pm.
Use a significance level \(\alpha = 5\%\) in all hypothesis tests, unless otherwise specified.
Five points are properly formatting the assignment. The breakdown is as follows:

There aren’t excessively long outputs of extraneous information (e.g. no printouts of entire data frames without good reason, there aren’t long printouts of which iteration a loop is on, there aren’t long sections of commented-out code, etc.). There is no piece of unnecessary / redundant code, and no unnecessary / redundant text (2 pt)
Final answers of each question are written clearly (1 pt).
The proofs are legible, and clearly written with reasoning provided for every step. They are easy to follow and understand (2 pt)

10 points will be deducted in case the provided template is not used (for coding / text-answer questions), and/or the template is not rendered using Quarto markdown.
For questions involving derivations (Q2, Q3 and Q5 in this assignment), you are allowed to do them on paper, scan and upload separately. However, you are welcome to type the derivations in this template.

Q1

WCAS (Weinberg College of Arts and Sciences) is considering replacing keyboards used in offices of all its departments with new keyboards of a different brand. Let the current keyboard be denoted as A, and the new one denoted as B, where A and B can be assumed to be the brands of the keyboards. However, before making the purchase, WCAS is conducting an experiment to test if the new keyboard will increase the typing efficiency.

For the experiments mentioned in all the questions below, assume that:

The manuscripts used in the replications of the experiments are not the same, but are of a fixed length and font size.
The effect of typist on the typing time can be considered as a random variable with constant variance, and thus can be combined with the random error due to unknown factors. Typists within the same department as as similar as typists across departments.
The effect of manuscript on the typing time can be considered as a random variable. However, manuscripts within the same department are similar, while manuscripts across departments are relatively less similar.
The typing time is modeled as:

\(y = \mu_{keyboard} + \beta_{script} + \epsilon\),

where the effect of the keyboards is fixed, and the effect of the script is a random variable, whose variance will be higher if scripts from across multiple departments are used as compared to the case if scripts from within the same department are used in the experiment. However, a given script will have the same effect on the typing time if used multiple times in an experiment.

Q1(a)

The first experiment is conducted as follows.

16 manuscripts are distributed randomly among 16 different typists - one manuscript to each typist. The manuscripts are chosen randomly from different departments of WCAS. 8 typists are randomly selected to use keyboard A and the rest use keyboard B.

The times taken (in minutes) by the typists for the 2 keyboards are:

Keyboard A	Keyboard B
8.73	10.77
11.31	8.33
9.15	11.87
13.78	8.31
11.58	7.28
9.14	4.99
11.05	12.61
9.49	8.81

Is there evidence to support the claim that the time taken by keyboard B is less than that by keyboard A?

State the null and alternate hypothesis, compute the test statistic, conduct the test, and state the conclusion.

(4 points)

a <- c(8.73, 11.31,  9.15, 13.78, 11.58, 9.14, 11.05,  9.49)
b <- c(10.77,  8.33, 11.87,  8.31,  7.28,  4.99, 12.61,  8.81)

Q1(b)

The company that sells keyboards of brand B contested the conclusions made based on the above test, and asked WCAS to conduct another experiment, where the same script is tested on both the keyboards, and the typing times are compared for the same script across keyboards.

WCAS randomly distributed 8 scripts among 16 typists, where each script was shared by two typists - one of them typed it with keyboard A, and the other one typed the same script with keyboard B.

The times taken (in minutes) by the typists for the 2 keyboards are in the table below. Each row of the table shows time taken to type the same script.

Keyboard A	Keyboard B
9.13	7.58
10.31	9.11
6.95	8.02
12.77	12.75
10.26	8.97
8.3	6.65
12.07	10.34
12.24	11.24

Is there evidence to support the claim that the time taken by keyboard B is less than that by keyboard A?

State the null and alternate hypothesis, compute the test statistic, conduct the test, and state the conclusion.

(4 points)

a <- c(9.13, 10.31,  6.95, 12.77, 10.26,  8.3, 12.07, 12.24)
b <- c(7.58,  9.11,  8.02, 12.75,  8.97,  6.65, 10.34, 11.24)

Q1(c)

Were the conclusions based on the experiments in 1(a) different from the experiments in 1(b)? If yes, then why? If no, then why not? Explain in terms of design principles.

(4 points)

Q1(d)

Unconvinced by the results of the tests in 1(a) and 1(b), WCAS asked the Department of Statistics and Data Science (DSDS) to conduct both types of experiments (1(a) & 1(b)) again.

DSDS conducted experiments similar to the ones conducted in 1(a) and 1(b). However, they distributed the scripts available within their own department to typists. Note that the scripts within the same department will be similar as compared to scripts across departments.

For the first experiment, DSDS randomly distributed 16 scripts among 16 typists - one manuscript to each typist. The typists and manuscripts are chosen randomly from within DSDS. 8 typists are randomly selected to use keyboard A and the rest use keyboard B.

The times taken (in minutes) by the typists for the 2 keyboards are:

Keyboard A	Keyboard B
10.79	9.20
10.05	6.53
11.17	9.52
10.32	8.50
12.08	9.75
8.81	9.18
11.66	9.92
11.93	9.09

Is there evidence to support the claim that the time taken by keyboard B is less than that by keyboard A?

State the null and alternate hypothesis, compute the test statistic, conduct the test, and state the conclusion.

(4 points)

a <- c(10.79, 10.05, 11.17, 10.32, 12.08,  8.81, 11.66, 11.93)
b <- c(9.20, 6.53, 9.52, 8.50, 9.75, 9.18, 9.92, 9.09)

Q1(e)

Are the conclusions based on the experiment conducted by DSDS in 1(d) the same as that of the experiment conducted in 1(a)?

What do you think can be the reason of the conclusions being the same (if they are indeed the same) or different (if they are indeed different)?

(4 points)

Q1(f)

For the seconds experiment, DSDS randomly distributed 8 scripts among 16 typists, where each script was shared by two typists - one of them typed it with keyboard A, and the other one typed the same script with keyboard B.

The times taken (in minutes) by the typists for the 2 keyboards are in the table below. Each row of the table shows time taken to type the same script.

Keyboard A	Keyboard B
10.99	8.53
9.73	7.06
9.38	8.32
9.29	10.79
8.26	9.61
9.11	11.00
9.51	8.76
9.73	8.88

Is there evidence to support the claim that the time taken by keyboard B is less than that by keyboard A?

State the null and alternate hypothesis, compute the test statistic, conduct the test, and state the conclusion.

(4 points)

a <- c(10.99,  9.73,  9.38,  9.29,  8.26,  9.11,  9.51,  9.73)
b <- c(8.53,  7.06,  8.32, 10.79, 9.61, 11.00,  8.76,  8.88)

Q1(g)

Are the conclusions based on the experiment conducted by DSDS in 1(f) the same as that of the experiment conducted in 1(b)?

What do you think can be the reason of the conclusions being the same (if they are indeed the same) or different (if they are indeed different)?

(4 points)

Q1(h)

If the true difference between the mean typing time with the 2 keyboards is 60 seconds, compute the power of the test for each of the 4 tests - 1(a), 1(b), 1(d), and 1(f). You can assume the population variances to be the same for the populations considered in each experiment.

(8 points)

Q1(i)

Which test among 1(a), 1(b), 1(d), and 1(f) would you recommend in the future for a similar problem? Why?

Is there any potential disadvantage of the recommended test?

(4 points)

Q1(j)

Design an experiment that uses minimum resources (i.e., minimum replicates) to provide a power of at least 95% in detecting a potential difference of 60 seconds in the typing time taken by the keyboards. Mention the design. You may modify any of the designs in 1(a), 1(b), 1(d), or 1(f), or even combine the results of some of the experiments to come up with a more reliable design.

(4 points)

Q2

Consider another experiment to compare two keyboards A and B in terms of typing efficiency. In this experiment, six manuscripts 1-6 are given to the same typist.

Q2(a)

Use a statistical model to quantify the gains (1) using randomization (use the randomized design on slide 15 of the class presentation “1. Introduction.pdf”), and (2) using balanced randomization.

(4 points)

Q2(b)

Suppose the following sequence is obtained using balanced randomization:

\(1. A-B, 2. A-B, 3. A-B, 4. B-A, 5. B-A, 6. B-A.\)

Would you use it for the study? What aspect of the sequence makes you concerned?

Can you relate it to the possibility that the learning effect may decay over time? Assuming that is the case, derive an expression for the expected bias in the estimated typing time difference of the two keyboards, if you use this design.

How can you modify the design to reduce the expected bias?

(2 + 2 + 2 = 6 points)

Q3

A fertilizer is claimed to change the mean yield of a crop to 3 times the mean yield without the fertilizer minus 5 units, i.e.,

mean yield with fertilizer = 3 * (mean yield without fertilizer) - 5

We need to conduct a hypothesis test to verify this claim.

Assume that the variances of the yield of the two populations (with & without the fertilizer), \(\sigma_{Fertilizer}^2\) and \(\sigma_{NoFertilizer}^2\) are known.

Q3(a)

State the null and alternate hypothesis.

(4 points)

Q3(b)

Formulate the test statistic.

(4 points)

Q3(c)

Derive an expression for the ratio of the number of observations to be sampled from each of the populations, so as to maximize the power of the test. Assume a fixed budget of a total of \(N\) samples (i.e., sum of number of samples from both the populations is \(N\)).

(6 points)

Q4

In the early stages of research and development experimentation which type of error do you think is most important, type I or type II? Justify your answer.

(4 points)

Q5

Consider a hardness testing machine that presses a rod with a pointed tip (treatment) into a metal specimen (experimental unit) with a known force.

By measuring the depth of the depression caused by the tip, the hardness of the specimen is determined (observed response).

Two different tips (Tip 1 & Tip 2) are available for this machine, and although the precision (variability) of the measurements made by the two tips seems to be the same, it is suspected that one tip produces different mean hardness readings than the other.

Suppose that the observed hardness, \(y_{ij}\) in the experiments is modeled as follows:

\(y_{ij} = \mu_i + \beta_{j} + \epsilon_{ij}\),

where,

\(y_{ij}\) is the observed hardness of the \(j_{th}\) metal specimen based on the \(i^{th}\) tip,

\(\mu_i\) is the true mean hardness measured by the \(i^{th}\) tip,

\(\beta_{j} \sim N(0, \sigma_{\beta}^2)\) is a random variable, which is the effect of the nature of the metal specimen on its observed hardness (however, the nature of a given metal specimen will have the same effect on the hardness measured by both the tips if tested repeatedly),

\(\epsilon_{ij} \sim N(0, \sigma_{n}^2)\) is the random variable, which is the effect of unknown factors on the observed hardness \(y_{ij}\).

\(i = 1,2\) (There are 2 tips),

\(j = 1, ..., n\) (There are \(n\) replicates for each tip)

Q5(a)

Suppose that a randomly selected set of \(n\) metal specimens is assigned to tip 1, and another set of randomly selected \(n\) metal specimens is assigned to tip 2 (i.e., a total of 2\(n\) specimens are used in the experiment), and the appropriate test is used to test the hypothesis if the mean hardness measured by the two tips is the same.

Q5(a)(i)

Derive an expression for the expected bias of the estimated difference in the mean hardness measured by the 2 tips.

(4 points)

Q5(a)(ii)

Derive an expression for the expected variance of the estimated difference in the mean hardness measured by the 2 tips.

(4 points)

Q5(b)

Suppose that a randomly selected set of \(n\) metal specimens is assigned to both the tips (i.e., a total of \(n\) specimens are used in the experiment). For each specimen, hardness is observed at a point in the specimen using tip 1 and at another point within the same specimen using tip 2. The appropriate test is used to test the hypothesis if the mean hardness measured by the two tips is the same.

Q5(b)(i)

Derive an expression for the expected bias of the estimated difference in the mean hardness measured by the 2 tips.

(2 points)

Q5(b)(ii)

Derive an expression for the expected variance of the estimated difference in the mean hardness measured by the 2 tips.

(3 points)

Q5(c)

Derive the condition, when the expected precision of the estimated difference in means of the hardness measured by the two tips will be higher in the experimental design of 5(b) as compared to the experimental design in 5(a).

Hint: Derive the expression when the expected square of the width of the confidence interval based on the design in 5(b) is less than the expected square of the width of the confidence interval based on the design in 5(a).

(8 points)

Q5(d)

How does the difference in the precision of the estimated difference (in means of the hardness measured by the two tips) by the two experimental designs (5(a) and 5(b)) change as the number of replicates \(n\) are increased in each of those designs.

(2 points)