Confidence Intervals
Chapter 10
(Part 1)

Overview

We will collect one sample and use the theoretical properties of the sampling distribution (i.e., the mean and standard error) to make inferences about the population!

Sampling Distribution Properties

Statistic	Population parameter	Estimator	SE of estimator	Critical Value Distribution
Mean	$\mu$	$\bar{x}$	$\frac{s}{\sqrt{n} }$	$t(df=n-1)$
Difference in means	$\mu_1-\mu_2$	$\bar{x}_1-\bar{x}_2$	$\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} }$	$t(df = \min(n_1-1, n_2-1) )$
Proportion	$\pi$	$\hat{\pi}$	$\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n} }$	$N(0,1)$
Difference in proportions	$\pi_1-\pi_2$	$\hat{\pi}_1-\hat{\pi}_2$	$\sqrt{\frac{\hat{\pi}_1(1-\hat{\pi}_1)}{n_1} + \frac{\hat{\pi}_2(1-\hat{\pi}_2)}{n_2} }$	$N(0,1)$
Regression intercept	$\beta_0$	$b_0$	$\sqrt{ s_y^2 [\frac{1}{n} + \frac{\bar{x}^2}{(n-1) s_x^2} ] }$	$t(df=n-2)$
Regression slope	$\beta_1$	$b_1$	$\sqrt{\frac{s_y^2}{(n-1) s_x^2} }$	$t(df=n-2)$

Standardization of a sampling distribution

\[\hbox{STAT} = \frac{\hbox{Estimator} - \hbox{Mean}}{\hbox{SE}}\]

This is the SAME formula as in Chapter 9.1, only this time the mean and SE are the parameters of the sampling distribution instead of the population parameters!
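As a quick illustration of the standardization formula for a sample mean, here is a minimal sketch. All numbers (`xbar`, `mu0`, `s`, `n`) are hypothetical values chosen only to make the arithmetic easy to follow.

```r
# Hypothetical values, made up for illustration only
xbar <- 52   # estimator: the observed sample mean
mu0  <- 50   # mean of the sampling distribution (assumed population mean)
s    <- 8    # sample standard deviation
n    <- 64   # sample size

se   <- s / sqrt(n)        # standard error of the mean: 8 / 8 = 1
stat <- (xbar - mu0) / se  # STAT = (Estimator - Mean) / SE
stat
```

Here the sample mean sits 2 standard errors above the mean of the sampling distribution.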

Confidence Intervals

A confidence interval (CI) gives a range of plausible values for a parameter. It allows us to combine an estimator with a measure of its precision (i.e. its standard error).

\[CI = \hbox{Estimator} \pm \hbox{Critical Value} \times \hbox{SE}\]

The critical value is calculated from the desired confidence level (e.g., 95%) using the appropriate distribution for the estimator, e.g., qnorm() for the standard normal or qt() for the t distribution.
Sampling distribution properties are used to construct CIs for a population parameter.
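The sketch below shows one way to get critical values from qnorm() and qt() and then build a CI for a mean by hand. The data vector `x` is a made-up sample of 10 measurements, used only for illustration. The result is compared against t.test() as a sanity check.

```r
# Two-sided critical values for a 95% confidence level
conf_level <- 0.95
alpha <- 1 - conf_level                 # total area left in the two tails

z_crit <- qnorm(1 - alpha / 2)          # standard normal, roughly 1.96

# Hypothetical sample (made up for illustration)
x <- c(12.1, 9.8, 11.4, 10.6, 13.0, 9.5, 10.9, 11.7, 12.4, 10.2)
n <- length(x)

t_crit <- qt(1 - alpha / 2, df = n - 1) # t distribution with n - 1 = 9 df

# CI = Estimator +/- Critical Value * SE
estimate <- mean(x)
se <- sd(x) / sqrt(n)
ci_manual <- estimate + c(-1, 1) * t_crit * se

# The same interval as reported by t.test()
ci_ttest <- as.numeric(t.test(x, conf.level = conf_level)$conf.int)

ci_manual
ci_ttest
```

Note that the t critical value is larger than the normal one for small samples, which widens the interval to reflect the extra uncertainty from estimating the standard deviation.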

Terminology

Critical value vs STAT

They both define a point on a standardized distribution.
A critical value (CV) generally acts as a pre-defined threshold, set by the confidence level.
A STAT (test statistic) is generally a value computed from sample data.

Example

You are interested in the average weight of penguins in Antarctica. Suppose we are able to obtain a random sample of penguins in Antarctica (for the purposes of this example, the random sample will be the penguins dataset from the palmerpenguins package).

Construct a 95% confidence interval for the average weight of penguins in Antarctica.
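One possible way to carry this out is sketched below. It assumes the palmerpenguins package is installed and that `body_mass_g` is the weight variable of interest. The hand calculation is included only to show that t.test() applies the CI formula with a t critical value.

```r
library(palmerpenguins)

# body_mass_g has a few missing values; t.test() drops them automatically
result <- t.test(x = penguins$body_mass_g, conf.level = 0.95)
ci <- as.numeric(result$conf.int)

# The same interval by hand: Estimator +/- Critical Value * SE
mass <- penguins$body_mass_g[!is.na(penguins$body_mass_g)]
n <- length(mass)
se <- sd(mass) / sqrt(n)
ci_manual <- mean(mass) + c(-1, 1) * qt(0.975, df = n - 1) * se

round(ci, 1)
round(ci_manual, 1)
```

Both intervals should agree, and they should be close to the 4116.5 to 4287.1 g interval quoted in the Interpretation section.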

R Code to obtain CIs from data sets

Statistic	R Code
Mean	t.test(x = data$variable, conf.level = 0.95)
Proportion	prop.test(x = #, n = #, conf.level = 0.95, correct=FALSE)
Difference in means	t.test(x = data1$variable, y = data2$variable, conf.level = 0.95)
Difference in proportions	prop.test(x = c(#, #), n = c(#, #), conf.level = 0.95, correct=FALSE )
Regression	confint(model, level = 0.95)
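The calls in the table can be tried out as in the sketch below. The counts passed to prop.test() are hypothetical, and the built-in mtcars dataset stands in for real data. Two details are worth knowing. For a single proportion, prop.test() reports a score (Wilson) interval, so it will not exactly match the hand formula from the sampling distribution table. For two means, t.test() defaults to the Welch approximation for the degrees of freedom rather than min(n_1 - 1, n_2 - 1).

```r
# One proportion: hypothetical counts, 45 successes out of 100 trials
p_hat <- 45 / 100
hand_ci <- p_hat + c(-1, 1) * qnorm(0.975) * sqrt(p_hat * (1 - p_hat) / 100)
score_ci <- as.numeric(
  prop.test(x = 45, n = 100, conf.level = 0.95, correct = FALSE)$conf.int
)

# Difference in proportions: hypothetical counts for two groups of 100
diff_ci <- as.numeric(
  prop.test(x = c(45, 30), n = c(100, 100),
            conf.level = 0.95, correct = FALSE)$conf.int
)

# Difference in means: mpg of automatic (am == 0) vs manual (am == 1) cars
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]
means_ci <- as.numeric(t.test(x = auto, y = manual, conf.level = 0.95)$conf.int)

# Regression: CIs for the intercept and slope of mpg on weight
model <- lm(mpg ~ wt, data = mtcars)
reg_ci <- confint(model, level = 0.95)

hand_ci
score_ci
diff_ci
means_ci
reg_ci
```

With moderate sample sizes the hand-formula and score intervals for one proportion are usually close, but they are not identical.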

Interpretation

Correct Interpretation:

We are 95% confident that the average weight of penguins in Antarctica is between 4116.5 and 4287.1 g.

Incorrect Interpretations:

95% of all penguins in Antarctica weigh between 4116.5 and 4287.1 g.
There is a 95% chance that the average weight of penguins in Antarctica is between 4116.5 and 4287.1 g.
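The "95% confident" wording refers to the procedure: if we repeated the sampling many times, about 95% of the intervals built this way would capture the true mean. A small simulation can make this concrete. This is only a sketch, and the population (normal, with mean 50 and standard deviation 10), the sample size, and the number of repetitions are all made-up assumptions.

```r
set.seed(1)

# Hypothetical population and sampling setup (assumptions for illustration)
mu    <- 50     # true population mean, known here only because we simulate
sigma <- 10     # true population standard deviation
n     <- 30     # size of each sample
reps  <- 5000   # number of repeated samples

t_crit <- qt(0.975, df = n - 1)

# For each repeated sample, build a 95% CI and record whether it contains mu
covered <- replicate(reps, {
  x <- rnorm(n, mean = mu, sd = sigma)
  half_width <- t_crit * sd(x) / sqrt(n)
  abs(mean(x) - mu) <= half_width
})

coverage <- mean(covered)  # proportion of intervals that captured mu
coverage
```

The coverage should land near 0.95. Any single interval either contains the true mean or it does not, which is why the "95% chance" wording is avoided for one computed interval.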
