ggplot(data = my_data, aes(x = (optional categorical variable), y = var1)) +
  geom_boxplot()
Artwork by @allison_horst
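To make the template concrete, here is a minimal sketch that fills it in with R's built-in iris data. The dataset and the object names p_box and p_single are assumptions chosen for illustration; they are not part of the original notes.

```r
library(ggplot2)

# One boxplot per category: the categorical variable goes on x, the numerical one on y
p_box <- ggplot(data = iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot()

# The x variable is optional: without it we get a single boxplot of the numerical variable
p_single <- ggplot(data = iris, aes(y = Sepal.Length)) +
  geom_boxplot()

p_box  # print the plot object to draw it
```

Storing the plot in an object and then printing it is optional; typing the ggplot() call directly draws the same figure.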

There are 3 things that we typically focus on and describe/compare when inspecting a boxplot:
Consider the Titanic dataset, which contains information about passengers on the Titanic. If a passenger survived, then the variable Survived = 1.
Rows: 32
Columns: 4
$ Class <chr> "1st", "2nd", "3rd", "Crew", "1st", "2nd", "3rd", "Crew", "1s…
$ Sex <chr> "Male", "Male", "Male", "Male", "Female", "Female", "Female",…
$ Age <chr> "Child", "Child", "Child", "Child", "Child", "Child", "Child"…
$ Survived <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1…
Is Survived a categorical or numerical variable?
R reads a column of numbers in as a numerical variable, even when those numbers are only codes for “categories” (here, Survived = 1 marks the passengers who survived).
To fix this we need to use the factor() function.
Typing factor(Survived) would turn the variable into a “factor”.
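A small sketch of what factor() does, using a hypothetical stand-in for the Survived column (the values below are invented for illustration, not taken from the dataset):

```r
# Hypothetical stand-in for the Survived column (invented values)
survived <- c(0, 0, 1, 1, 0)

class(survived)        # "numeric": R treats the 0/1 codes as numbers

# Convert the codes into categories
survived_fct <- factor(survived)

class(survived_fct)    # "factor"
levels(survived_fct)   # "0" "1": the two categories
```

Inside a plot, the same conversion can be done on the fly, for example aes(x = factor(Survived)), assuming the data frame has a column with that name.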
A barplot is used to visualize the distribution (frequencies) of a single categorical variable.
geom_bar() is used when we have the raw data and counting how many observations are in each category has to be done (list not yet counted).
geom_col() is used when we directly have counts of each category in our dataset (pre-counted).
When describing a barplot we look for…
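The difference between the two geoms can be sketched with a toy example. The data frames raw and counted, and their values, are hypothetical and exist only to show the two data shapes.

```r
library(ggplot2)

# Raw data: one row per observation, nothing counted yet (hypothetical values)
raw <- data.frame(
  fruit = c("apple", "apple", "banana", "cherry", "cherry", "cherry")
)

# geom_bar() counts the rows in each category for us, so only x is mapped
p_bar <- ggplot(raw, aes(x = fruit)) +
  geom_bar()

# Pre-counted data: one row per category, with the count stored in a column
counted <- data.frame(
  fruit = c("apple", "banana", "cherry"),
  n     = c(2, 1, 3)
)

# geom_col() uses the counts as given, so the count column is mapped to y
p_col <- ggplot(counted, aes(x = fruit, y = n)) +
  geom_col()
```

Both plots should show the same bar heights, because the counts in counted match the number of rows per fruit in raw.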
Consider the Palmer Penguins dataset.
# A tibble: 6 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<fct> <fct> <dbl> <dbl> <int> <int>
1 Adelie Torgersen 39.1 18.7 181 3750
2 Adelie Torgersen 39.5 17.4 186 3800
3 Adelie Torgersen 40.3 18 195 3250
4 Adelie Torgersen NA NA NA NA
5 Adelie Torgersen 36.7 19.3 193 3450
6 Adelie Torgersen 39.3 20.6 190 3650
# ℹ 2 more variables: sex <fct>, year <int>
We are interested in plotting the distribution of species.
Are we using geom_bar() or geom_col()?
What if we want to visualize the distribution of sex within each species? There are 4 main ways to visualize multiple levels within categorical data:
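The four ways are not listed here, so the sketch below assumes four commonly used ggplot2 options: stacked, dodged, and filled bars, plus facets. The toy data frame and its values are hypothetical; with the real data, the penguins data frame would take its place.

```r
library(ggplot2)

# Hypothetical toy data (invented values), one row per penguin
toy <- data.frame(
  species = c("Adelie", "Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"),
  sex     = c("female", "male", "male", "female", "male", "female")
)

# Shared starting point: species on x, bars colored by sex
base <- ggplot(toy, aes(x = species, fill = sex))

p_stack <- base + geom_bar()                    # stacked counts (the default)
p_dodge <- base + geom_bar(position = "dodge")  # bars side by side
p_fill  <- base + geom_bar(position = "fill")   # proportions within each species

# Facets: one small barplot of sex per species
p_facet <- ggplot(toy, aes(x = sex)) +
  geom_bar() +
  facet_wrap(~ species)
```

Stacked and dodged bars compare counts, while position = "fill" rescales every bar to the same height so that proportions can be compared across species.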

“factor” vs “character” variable
