Confidence Intervals
Chapter 10
(Part 1)

Overview

We will collect one sample and use the theoretical properties of the sampling distribution (i.e., its mean and standard error) to make inferences about the population!

Sampling Distribution Properties

Statistic | Population parameter | Estimator | SE of estimator | Critical value distribution
Mean | \(\mu\) | \(\bar{x}\) | \(\frac{s}{\sqrt{n}}\) | \(t(df = n-1)\)
Difference in means | \(\mu_1-\mu_2\) | \(\bar{x}_1-\bar{x}_2\) | \(\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\) | \(t(df = \min(n_1-1, n_2-1))\)
Proportion | \(\pi\) | \(\hat{\pi}\) | \(\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}}\) | \(N(0,1)\)
Difference in proportions | \(\pi_1-\pi_2\) | \(\hat{\pi}_1-\hat{\pi}_2\) | \(\sqrt{\frac{\hat{\pi}_1(1-\hat{\pi}_1)}{n_1} + \frac{\hat{\pi}_2(1-\hat{\pi}_2)}{n_2}}\) | \(N(0,1)\)
Regression intercept | \(\beta_0\) | \(b_0\) | \(\sqrt{s_y^2 \left[\frac{1}{n} + \frac{\bar{x}^2}{(n-1) s_x^2}\right]}\) | \(t(df = n-2)\)
Regression slope | \(\beta_1\) | \(b_1\) | \(\sqrt{\frac{s_y^2}{(n-1) s_x^2}}\) | \(t(df = n-2)\)
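
As a rough illustration of the standard error column, the following R sketch evaluates three of the formulas above. All numbers are hypothetical summary statistics invented for the example, not values from any dataset in these notes.

```r
# Hypothetical summary statistics (made up for illustration)
n1 <- 40; s1 <- 12   # sample size and sample SD, group 1
n2 <- 50; s2 <- 15   # sample size and sample SD, group 2

# SE of a sample mean: s / sqrt(n)
se_mean <- s1 / sqrt(n1)

# SE of a difference in means: sqrt(s1^2 / n1 + s2^2 / n2)
se_diff_means <- sqrt(s1^2 / n1 + s2^2 / n2)

# SE of a sample proportion: sqrt(p_hat * (1 - p_hat) / n)
p_hat <- 0.30        # hypothetical sample proportion
n     <- 200         # hypothetical sample size
se_prop <- sqrt(p_hat * (1 - p_hat) / n)

# Degrees of freedom for the difference in means, as in the table
df_diff <- min(n1 - 1, n2 - 1)

c(se_mean = se_mean, se_diff_means = se_diff_means, se_prop = se_prop)
```

Note how each standard error shrinks as the sample size in its denominator grows.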

Standardization of a sampling distribution

 

\[\hbox{STAT} = \frac{\hbox{Estimator} - \hbox{Mean}}{\hbox{SE}}\]

 

This is the same formula as in Chapter 9.1, except that instead of the population's mean and standard deviation we are using the mean and standard error of the sampling distribution!
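
A minimal sketch of this standardization in R, using hypothetical values for the sample mean, the mean of the sampling distribution, the sample SD, and the sample size (none of them come from these notes):

```r
x_bar <- 52    # hypothetical sample mean (the estimator)
mu    <- 50    # hypothetical mean of the sampling distribution
s     <- 8     # hypothetical sample standard deviation
n     <- 64    # hypothetical sample size

se   <- s / sqrt(n)          # standard error of the mean: 8 / 8 = 1
stat <- (x_bar - mu) / se    # standardized value: (52 - 50) / 1 = 2
stat
```

Here the sample mean sits two standard errors above the center of the sampling distribution.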

Confidence Intervals

A confidence interval (CI) gives a range of plausible values for a parameter. It allows us to combine an estimator with a measure of its precision (i.e. its standard error).

\[\hbox{CI} = \hbox{Estimator} \pm \hbox{Critical Value} \times \hbox{SE}\]

  • The critical value is calculated from the desired confidence level (e.g., 95%) using the appropriate distribution for the estimator, e.g., with qnorm() or qt().

  • Sampling distribution properties are used to construct CIs for a population parameter.
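
The CI formula can be sketched by hand in R. The summary statistics below are hypothetical; qt() and qnorm() are the base R quantile functions for the t and normal distributions. For a 95% interval the critical value is the 0.975 quantile, because the remaining 5% is split evenly between the two tails.

```r
conf_level <- 0.95
alpha <- 1 - conf_level

# Hypothetical sample summary for a mean
x_bar <- 52; s <- 8; n <- 64
se <- s / sqrt(n)

# Critical value from the t distribution with n - 1 degrees of freedom
cv_t <- qt(1 - alpha / 2, df = n - 1)

# CI = Estimator +/- Critical Value * SE
ci_mean <- x_bar + c(-1, 1) * cv_t * se
ci_mean

# For proportions the critical value comes from N(0, 1) instead
cv_z <- qnorm(1 - alpha / 2)   # about 1.96
```

The t critical value is slightly larger than the normal one, which widens the interval to account for estimating the standard deviation from the sample.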

Terminology

Critical value vs STAT

  • They both define a point on a standardized distribution.
  • The critical value (CV) generally acts as a pre-defined threshold, set by the chosen confidence level.
  • STAT (the test statistic) is generally a value computed from sample data.

Example

You are interested in the average weight of penguins in Antarctica. Suppose we are able to obtain a random sample of penguins in Antarctica. (For the purposes of this example, the random sample will be the penguins dataset from the palmerpenguins package.)


Construct a 95% confidence interval for the average weight of penguins in Antarctica.
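
One way to carry this out in R, assuming the palmerpenguins package is installed and that weight is taken from its body_mass_g column (in grams). Rows with missing body mass are dropped before computing the summary statistics by hand; t.test() drops missing values on its own.

```r
library(palmerpenguins)   # provides the penguins data frame

# Body mass in grams, with missing values removed
mass <- penguins$body_mass_g[!is.na(penguins$body_mass_g)]

n     <- length(mass)
x_bar <- mean(mass)
se    <- sd(mass) / sqrt(n)

# By hand: estimator +/- critical value * SE, with t(df = n - 1)
cv <- qt(0.975, df = n - 1)
ci_manual <- x_bar + c(-1, 1) * cv * se

# Same interval from t.test()
ci_ttest <- t.test(x = penguins$body_mass_g, conf.level = 0.95)$conf.int

ci_manual
ci_ttest
```

Both approaches should give the same interval, since t.test() applies the same formula internally.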

R Code to obtain CIs from data sets


Statistic | R Code
Mean | t.test(x = data$variable, conf.level = 0.95)
Proportion | prop.test(x = #, n = #, conf.level = 0.95, correct = FALSE)
Difference in means | t.test(x = data1$variable, y = data2$variable, conf.level = 0.95)
Difference in proportions | prop.test(x = c(#, #), n = c(#, #), conf.level = 0.95, correct = FALSE)
Regression | confint(model, level = 0.95)
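
As a sketch of how the prop.test() call relates to the formula-based interval, the example below compares the two for a difference in proportions. The counts are hypothetical. Note that for a single proportion, prop.test() reports a Wilson score interval, so its output will differ slightly from the estimator ± critical value × SE calculation; for two proportions with correct = FALSE the two agree.

```r
# Hypothetical counts: successes and sample sizes in two groups
x <- c(45, 30)
n <- c(100, 100)

# Interval from prop.test() without continuity correction
ci_r <- prop.test(x = x, n = n, conf.level = 0.95, correct = FALSE)$conf.int

# By hand: (p1 - p2) +/- z * SE
p  <- x / n
se <- sqrt(sum(p * (1 - p) / n))
ci_hand <- (p[1] - p[2]) + c(-1, 1) * qnorm(0.975) * se

ci_r
ci_hand
```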

Interpretation

Correct Interpretation:

We are 95% confident that the average weight of penguins in Antarctica is between 4116.5 and 4287.1 g.


Incorrect Interpretations:

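  • 95% of all penguins in Antarctica weigh between 4116.5 and 4287.1 g. (The interval describes the population mean, not the weights of individual penguins.)
  • There is a 95% chance that the average weight of penguins in Antarctica is between 4116.5 and 4287.1 g. (The population mean is a fixed value, so it is either inside this particular interval or it is not; the 95% refers to how often the method captures the mean across repeated samples.)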
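
The "95% confident" wording describes the procedure rather than any single interval: if the sampling were repeated many times, about 95% of the resulting intervals would contain the true mean. The simulation below is a sketch of that idea using a made-up normal population whose mean is known; the population values, sample size, and number of repetitions are all arbitrary choices.

```r
set.seed(1)          # for reproducibility

true_mean <- 4200    # hypothetical population mean
true_sd   <- 800     # hypothetical population SD
n         <- 50      # sample size per repetition
reps      <- 5000    # number of repeated samples

# For each repetition: draw a sample, build a 95% CI, check coverage
covers <- replicate(reps, {
  samp <- rnorm(n, mean = true_mean, sd = true_sd)
  ci   <- t.test(samp, conf.level = 0.95)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
})

mean(covers)   # proportion of intervals containing the true mean; near 0.95
```

Any one interval either contains the true mean or misses it; the confidence level is the long-run share of intervals that contain it.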