Hands-on Exercise 1

Author

Dabbie Neo

Published

April 14, 2023

Modified

June 4, 2023

1. Getting Started

Install and launching R packages

The code chunk below uses p_load() of pacman package to check if tidyverse packages are installed in the computer. If they are, then they will be launched into R.

pacman::p_load(tidyverse)

Importing the data

exam_data <- read_csv("data/Exam_data.csv")
Rows: 322 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): ID, CLASS, GENDER, RACE
dbl (3): ENGLISH, MATHS, SCIENCE

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

2. Exercises

Plotting a simple chart

Using geombar() to plot a bar chart

ggplot(data=exam_data, 
       aes(x=RACE)) +
  geom_bar()

Using geom_dotplot() to plot a dot plot

ggplot(data=exam_data, 
       aes(x = MATHS)) +
  geom_dotplot(binwidth=2.5,         
               dotsize = 0.5) +      
  scale_y_continuous(NULL,           
                     breaks = NULL)

Using geom_histogram() to create a histogram

ggplot(data=exam_data, 
       aes(x = MATHS)) +
  geom_histogram() 
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Working with Theme

Changing theme of bar chart

ggplot(data=exam_data, aes(x=RACE)) +
  geom_bar() +
  coord_flip() +
  theme_minimal() +
  theme(
    panel.background = element_rect(fill = "lightblue", colour = "lightblue", 
                                    size = 0.5, linetype = "solid"),
    panel.grid.major = element_line(size = 0.5, linetype = 'solid', colour = "white"), 
    panel.grid.minor = element_line(size = 0.25, linetype = 'solid', colour = "white"))
Warning: The `size` argument of `element_rect()` is deprecated as of ggplot2 3.4.0.
ℹ Please use the `linewidth` argument instead.
Warning: The `size` argument of `element_line()` is deprecated as of ggplot2 3.4.0.
ℹ Please use the `linewidth` argument instead.

Designing Data-driven Graphics for Analysis

Exercise 1

ggplot(data=exam_data, 
       aes(x=reorder(RACE,RACE,
                     function(x)-length(x)))) +
  geom_bar() +
  ylim(0,220) +
  geom_text(stat="count", 
      aes(label=paste0(..count.., ", ", 
      round(..count../sum(..count..)*100, 1), "%")),
      vjust=-1) +
  xlab("Race") +
  ylab("No. of\nPupils") +
  theme(axis.title.y=element_text(angle = 0))
Warning: The dot-dot notation (`..count..`) was deprecated in ggplot2 3.4.0.
ℹ Please use `after_stat(count)` instead.

Exercise 2 : Adding mean and median lines

ggplot(data=exam_data, 
       aes(x= MATHS)) +
  geom_histogram(bins=20, 
                 color="black", 
                 fill="light blue") +
  geom_vline(aes(xintercept=mean(MATHS, na.rm=T)),
             color="red", 
             linetype="dashed", 
             size=1) +
  geom_vline(aes(xintercept=median(MATHS, na.rm=T)),
             color="grey30",
             linetype="dashed", 
             size=1)
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.

Exercise 3: Distribution of English scores for all pupils

d <- exam_data   
d_bg <- d[, -3]  

ggplot(d, aes(x = ENGLISH, fill = GENDER)) +
  geom_histogram(data = d_bg, fill = "grey", alpha = .5) +
  geom_histogram(colour = "black") +
  facet_wrap(~ GENDER) +
  guides(fill = FALSE) +  
  theme_bw()
Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
of ggplot2 3.3.4.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Exercise 4

ggplot(data=exam_data, 
       aes(x=MATHS, y=ENGLISH)) +
  geom_point() +
  coord_cartesian(xlim=c(0,100),
                  ylim=c(0,100)) +
  geom_hline(yintercept=50,
             linetype="dashed",
             color="grey60",
             size=1) + 
  geom_vline(xintercept=50, 
             linetype="dashed",
             color="grey60",
             size=1)