Multiple Regression
Chapter 6.0 - 6.1

Today’s goals


  1. Regression with two numerical explanatory variables
  2. Understand coefficient interpretation
  3. Make predictions with a model

Multiple Regression

Definition: models the relationship between a response (y) variable and two or more explanatory (x) variables.

  • 2 numerical explanatory variables: fit a “plane” of best fit.
  • 1 numerical + 1 categorical explanatory variable: fit a line of best fit for each category.

EDA: Same as simple regression. Look at the raw data, calculate summary statistics/correlations, and use data visualizations.

MLR Correlation and Causation

Correlation: calculate the correlation between each pair of variables.

penguins %>% 
  filter(!is.na(flipper_length_mm)) %>% 
  select(body_mass_g, flipper_length_mm, bill_length_mm) %>%
  cor()
                  body_mass_g flipper_length_mm bill_length_mm
body_mass_g         1.0000000         0.8712018      0.5951098
flipper_length_mm   0.8712018         1.0000000      0.6561813
bill_length_mm      0.5951098         0.6561813      1.0000000


Causation: a change in the value of one variable produces a change in the value of another variable; in other words, one variable makes the other happen.

Correlation does not imply causation!

Multiple Regression Modeling

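Multiple regression helps us disentangle confounding (lurking) variables: it estimates the relationship between each explanatory variable and the response while accounting for the other explanatory variables in the model. This matters most when the explanatory variables are correlated with one another (collinearity).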

  • Simpson’s paradox: a trend appears in several groups of data but the trend disappears or reverses when groups are combined.
  • Model: In multiple linear regression, the significance of each term in the model depends on the other terms in the model.
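
A minimal sketch of why the other terms in the model matter, using simulated data rather than the penguins data. The variables `x`, `y`, and `z`, the sample size, and the seed are all made up for illustration; `z` plays the role of a lurking variable.

```r
set.seed(1)  # arbitrary seed, only so the simulation is reproducible
n <- 500

# Simulated (not real) data: z is a lurking variable that drives both x and y
z <- rnorm(n, mean = 0, sd = 2)
x <- z + rnorm(n)
y <- -1 * x + 3 * z + rnorm(n)

# Simple regression: z is left out, so the x slope absorbs part of z's effect
slope_simple <- coef(lm(y ~ x))["x"]

# Multiple regression: z is in the model, so the x slope is estimated
# while holding z constant
slope_multiple <- coef(lm(y ~ x + z))["x"]

slope_simple    # expected to be positive (population value is 1.4)
slope_multiple  # expected to be negative (true value is -1)
```

The sign of the `x` slope flips once `z` enters the model, which is the same kind of reversal described by Simpson’s paradox.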

Example: model

Predict body mass based on both flipper length and bill length.

model_penguins <- lm(body_mass_g ~ flipper_length_mm + bill_length_mm, data = penguins)
summary(model_penguins)$coefficients
                      Estimate Std. Error    t value     Pr(>|t|)
(Intercept)       -5736.897161 307.959128 -18.628762 7.796205e-54
flipper_length_mm    48.144859   2.011115  23.939388 7.564660e-75
bill_length_mm        6.047488   5.179831   1.167507 2.438263e-01

\[\widehat{bodymass} = b_0 +b_1*\text{flipper} + b_2*\text{bill_length}\]

Example: interpretations

\[\widehat{bodymass} = -5736.9 + 48.14*\text{flipper} + 6.05*\text{bill_length}\]

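  • \(b_0\) intercept; the predicted/expected body mass (in g) when a penguin has flipper length 0 mm and bill length 0 mm. No real penguin has these measurements, so the intercept has no practical meaning here.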
  • \(b_1\) slope of flipper length; holding all other explanatory variables constant (here, bill length), for every 1 mm increase in flipper length, the expected body mass increases by \(48.14\) g
  • \(b_2\) slope of bill length; holding all other explanatory variables constant (here, flipper length), for every 1 mm increase in bill length, the expected body mass increases by \(6.05\) g
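
A small sketch of what “holding all other explanatory variables constant” means in practice. It uses the rounded coefficients from the fitted equation; the helper `predict_mass()` and the two example penguins are hypothetical, chosen only for illustration.

```r
# Rounded coefficients from the fitted equation
b0 <- -5736.9  # intercept (g)
b1 <- 48.14    # slope for flipper length (g per mm)
b2 <- 6.05     # slope for bill length (g per mm)

# Hypothetical helper: evaluates the fitted equation by hand
predict_mass <- function(flipper, bill_length) {
  b0 + b1 * flipper + b2 * bill_length
}

# Two made-up penguins with the same bill length; flippers differ by 1 mm
mass_a <- predict_mass(flipper = 200, bill_length = 45)
mass_b <- predict_mass(flipper = 201, bill_length = 45)

mass_b - mass_a  # equals b1 (48.14 g), up to floating-point rounding
```

Changing the shared bill length from 45 to any other value leaves the difference unchanged, which is why \(b_1\) can be read as a per-mm effect of flipper length.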

Example: prediction

\[\widehat{bodymass} = -5736.9 + 48.14*\text{flipper} + 6.05*\text{bill_length}\]


You recently got a penguin and measured its flipper length to be 150 mm and its bill length to be 40 mm. What do you expect its body mass to be?
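
One way to work this out is to plug the measurements into the fitted equation. The sketch below does that by hand with the rounded coefficients; the variable names are chosen here for illustration.

```r
# Rounded coefficients from the fitted equation
b0 <- -5736.9  # intercept (g)
b1 <- 48.14    # slope for flipper length (g per mm)
b2 <- 6.05     # slope for bill length (g per mm)

# Measurements for the new penguin (mm)
flipper <- 150
bill_length <- 40

predicted_mass <- b0 + b1 * flipper + b2 * bill_length
predicted_mass  # about 1726.1 g
```

If `model_penguins` from the earlier slide is still in your session, `predict(model_penguins, newdata = data.frame(flipper_length_mm = 150, bill_length_mm = 40))` should give nearly the same value; the small difference comes from rounding the coefficients. Keep in mind that a 150 mm flipper is shorter than any flipper recorded in the `penguins` data, so this prediction is an extrapolation and should be treated with caution.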