We will collect one sample and use the theoretical properties of the sampling distribution (i.e., its mean and standard error) to make inferences about the population!
| Statistic | Population parameter | Estimator | SE of estimator | Critical Value Distribution |
|---|---|---|---|---|
| Mean | \(\mu\) | \(\bar{x}\) | \(\frac{s}{\sqrt{n} }\) | \(t(df=n-1)\) |
| Difference in means | \(\mu_1-\mu_2\) | \(\bar{x}_1-\bar{x}_2\) | \(\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} }\) | \(t(df = \min(n_1-1, n_2-1) )\) |
| Proportion | \(\pi\) | \(\hat{\pi}\) | \(\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n} }\) | \(N(0,1)\) |
| Difference in proportions | \(\pi_1-\pi_2\) | \(\hat{\pi}_1-\hat{\pi}_2\) | \(\sqrt{\frac{\hat{\pi}_1(1-\hat{\pi}_1)}{n_1} + \frac{\hat{\pi}_2(1-\hat{\pi}_2)}{n_2} }\) | \(N(0,1)\) |
| Regression intercept | \(\beta_0\) | \(b_0\) | \(\sqrt{ s_e^2 [\frac{1}{n} + \frac{\bar{x}^2}{(n-1) s_x^2} ] }\) | \(t(df=n-2)\) |
| Regression slope | \(\beta_1\) | \(b_1\) | \(\sqrt{\frac{s_e^2}{(n-1) s_x^2} }\) | \(t(df=n-2)\) |
In the regression rows, \(s_e^2\) is the residual variance, \(\frac{\sum (y_i - \hat{y}_i)^2}{n-2}\), and \(s_x^2\) is the sample variance of \(x\).
If the population standard deviation \(\sigma\) is known (rare!), use \(\sigma\) in place of \(s\) and the normal distribution in place of the t-distribution.
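To make the "Difference in means" row of the table concrete, here is a minimal sketch that computes the standard error and the conservative degrees of freedom by hand. The built-in mtcars data and the variable names are assumptions chosen for illustration; they are not part of these notes. Note that t.test() uses the same standard error but, by default, the Welch approximation for the degrees of freedom, which is never smaller than \(\min(n_1-1, n_2-1)\).

```r
# Illustrative data (an assumption): mpg for automatic (am = 0) and manual (am = 1) cars
mpg_auto   <- mtcars$mpg[mtcars$am == 0]
mpg_manual <- mtcars$mpg[mtcars$am == 1]

n1 <- length(mpg_auto)
n2 <- length(mpg_manual)

# SE of the difference in means, from the table: sqrt(s1^2/n1 + s2^2/n2)
se_diff <- sqrt(var(mpg_auto) / n1 + var(mpg_manual) / n2)

# Conservative degrees of freedom from the table: min(n1 - 1, n2 - 1)
df_conservative <- min(n1 - 1, n2 - 1)

# t.test() defaults to the Welch two-sample test (var.equal = FALSE)
welch    <- t.test(x = mpg_auto, y = mpg_manual, conf.level = 0.95)
df_welch <- unname(welch$parameter)

c(se = se_diff, df_conservative = df_conservative, df_welch = df_welch)
```

Because the conservative df is smaller, the hand-built interval will be slightly wider than the one t.test() reports.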
\[\hbox{STAT} = \frac{\hbox{Estimator} - \hbox{Mean}}{\hbox{SE}}\]
This is the SAME formula as in Chapter 9.1, only this time we use the parameters of the sampling distribution instead of the population parameters!
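As a rough illustration of the STAT formula for a single mean, the sketch below standardizes a sample mean by hand and compares the result with the statistic reported by t.test(). The simulated sample and the assumed sampling-distribution mean mu0 = 50 are hypothetical values invented for this example.

```r
set.seed(1)
x   <- rnorm(25, mean = 52, sd = 10)  # hypothetical sample of n = 25
mu0 <- 50                             # assumed mean of the sampling distribution

# SE of the mean from the table: s / sqrt(n)
se <- sd(x) / sqrt(length(x))

# STAT = (Estimator - Mean) / SE
stat <- (mean(x) - mu0) / se

# t.test() computes the same quantity
tt <- t.test(x, mu = mu0)
c(by_hand = stat, t_test = unname(tt$statistic))
```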
A confidence interval (CI) gives a range of plausible values for a parameter. It allows us to combine an estimator with a measure of its precision (i.e. its standard error).
\[CI = \hbox{Estimator} \pm \hbox{Critical Value} \times \hbox{SE}\]
The critical value is calculated from the desired confidence level (e.g., 95%) using the appropriate distribution for the estimator, e.g., qnorm() for the normal distribution or qt() for the t-distribution.
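The following sketch shows one way to obtain critical values for a 95% confidence level and plug them into the CI formula. The sample size and summary statistics (n = 30, mean 100, standard deviation 15) are made-up values for illustration only.

```r
conf_level <- 0.95
alpha      <- 1 - conf_level

# Normal critical value (proportions, or a mean with known sigma)
z_crit <- qnorm(1 - alpha / 2)

# t critical value for a mean; n = 30 is a hypothetical sample size
n      <- 30
t_crit <- qt(1 - alpha / 2, df = n - 1)

# Hypothetical summary statistics
x_bar <- 100
s     <- 15
se    <- s / sqrt(n)

# CI = Estimator +/- Critical Value x SE
ci <- x_bar + c(-1, 1) * t_crit * se

c(z_crit = z_crit, t_crit = t_crit)  # z_crit is about 1.960, t_crit about 2.045
ci
```

The t critical value is larger than the normal one, so a t-based interval is wider; the gap shrinks as n grows.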
Sampling distribution properties are used to construct CIs for a population parameter.
Critical value vs STAT
You are interested in the average weight of penguins in Antarctica. Let’s say we are able to obtain a random sample of penguins in Antarctica (for example purposes, this random sample will be the penguins dataset from the palmerpenguins package).
Construct a 95% confidence interval for the average weight of penguins in Antarctica.
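One possible way to carry this out is sketched below, assuming the palmerpenguins package is installed and that weight is stored in the body_mass_g column (in grams). The interval is computed twice, once with t.test() and once by hand from the CI formula, so the two can be compared with each other and with the interval quoted in the interpretation.

```r
if (requireNamespace("palmerpenguins", quietly = TRUE)) {
  penguins <- palmerpenguins::penguins
  mass     <- penguins$body_mass_g  # contains a few NAs; t.test() drops them

  # Interval from t.test()
  ci_ttest <- t.test(x = mass, conf.level = 0.95)$conf.int

  # Same interval by hand: estimator +/- critical value x SE
  mass_obs  <- mass[!is.na(mass)]
  n         <- length(mass_obs)
  se        <- sd(mass_obs) / sqrt(n)
  ci_manual <- mean(mass_obs) + c(-1, 1) * qt(0.975, df = n - 1) * se

  print(ci_ttest)
  print(ci_manual)
} else {
  message("Install palmerpenguins first: install.packages('palmerpenguins')")
}
```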
| Statistic | R Code |
|---|---|
| Mean | t.test(x = data$variable, conf.level = 0.95) |
| Proportion | prop.test(x = #, n = #, conf.level = 0.95, correct=FALSE) |
| Difference in means | t.test(x = data1$variable, y = data2$variable, conf.level = 0.95) |
| Difference in proportions | prop.test(x = c(#, #), n = c(#, #), conf.level = 0.95, correct=FALSE ) |
| Regression | confint(model, level = 0.95) |
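To show the proportion and regression rows with real inputs in place of the # placeholders, here is a small sketch using the built-in mtcars data (an assumption for illustration; any data frame would do). One caveat: with correct = FALSE, prop.test() reports a Wilson score interval, so it will be close to, but not exactly equal to, the estimator ± critical value × SE interval from the earlier table. confint() on a linear model, by contrast, matches the hand calculation with \(t(df=n-2)\).

```r
# Proportion: share of cars in mtcars with a manual transmission (am = 1)
x_manual <- sum(mtcars$am == 1)
n_cars   <- nrow(mtcars)
prop_ci  <- prop.test(x = x_manual, n = n_cars,
                      conf.level = 0.95, correct = FALSE)$conf.int

# Regression: mpg as a function of weight
model  <- lm(mpg ~ wt, data = mtcars)
reg_ci <- confint(model, level = 0.95)

# Same regression intervals by hand: estimate +/- t critical value x SE
est    <- coef(model)
se     <- summary(model)$coefficients[, "Std. Error"]
t_crit <- qt(0.975, df = n_cars - 2)
reg_ci_manual <- cbind(est - t_crit * se, est + t_crit * se)

prop_ci
reg_ci
```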
Correct Interpretation:
We are 95% confident that the average weight of penguins in Antarctica is between 4116.5 and 4287.1 g.
Incorrect Interpretations:
