November 16, 2022

Graphs in R

Multiple packages:

  • Base R plot()
    • Helpful for a first exploration of the data
  • ggplot2
    • Allows for many modifications

Quick example

Base plots can be quick

hist(diamonds$price)

But so is ggplot

ggplot(diamonds) +
  geom_histogram(aes(x = price))

Doesn’t look as pretty, but…

ggplot(diamonds, aes(price, fill = cut)) +
  geom_histogram()
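
geom_histogram() uses 30 bins by default and prints a message suggesting a better value. A minimal sketch of setting the bin width explicitly; the value 500 is only an illustrative assumption, not a recommendation from these slides.

```r
library(ggplot2)

# binwidth is given in the units of the x variable (price, in US dollars);
# 500 is an arbitrary illustrative choice
p_hist <- ggplot(diamonds, aes(x = price, fill = cut)) +
  geom_histogram(binwidth = 500)

p_hist  # printing the object draws the plot
```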

What is ggplot2 and why use it?

A package that is

  • elegant and versatile
  • based on the grammar of graphics
  • layered, so you can build a plot block by block
  • different from base graphics (base R)
    • the learning curve can be steep
      • but the results look amazing when it works!
  • more information:

Prepare library

Options:

  • tidyverse
    • includes many packages (e.g. ggplot2, dplyr)
    • takes longer to load
  • ggplot2
    • loads only the plotting package

library(ggplot2)

library(tidyverse)
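
Either call attaches ggplot2, which also makes the diamonds and mpg example data sets available. A small sketch, assuming ggplot2 is already installed (for example via install.packages("ggplot2")), that checks both after loading:

```r
library(ggplot2)

# After attaching, the package appears on the search path
"package:ggplot2" %in% search()

# The example data sets used in these slides ship with ggplot2
exists("diamonds")
exists("mpg")
```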

A Graph is a Graph is a Graph

  • Most visualisations will boil down to:

Building Blocks of a Graph

 x y
 6 1
 3 3
 5 4
 1 5
 2 6
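
The original figures for these slides are not reproduced here. As one plausible reading, the five rows above can be turned into a graph block by block; the object names (pts, p_data, p_axes, p_points) are hypothetical and chosen only for this sketch.

```r
library(ggplot2)

# The five points from the table above
pts <- data.frame(x = c(6, 3, 5, 1, 2),
                  y = c(1, 3, 4, 5, 6))

p_data   <- ggplot(pts)                     # block 1: data only, empty panel
p_axes   <- ggplot(pts, aes(x = x, y = y))  # block 2: add the two axes
p_points <- p_axes + geom_point()           # block 3: add a layer of points

p_points  # printing the object draws the plot
```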

Why?

  • Understanding how your graph breaks down into blocks is important:
    • This is the core of ggplot2!
    • It makes plotting the perfect graph easier
    • Plotting becomes a process of adding layers

Explore data

colnames(diamonds)
##  [1] "carat"   "cut"     "color"   "clarity" "depth"   "table"   "price"  
##  [8] "x"       "y"       "z"

summary(diamonds)
     carat               cut        color        clarity          depth      
 Min.   :0.2000   Fair     : 1610   D: 6775   SI1    :13065   Min.   :43.00  
 1st Qu.:0.4000   Good     : 4906   E: 9797   VS2    :12258   1st Qu.:61.00  
 Median :0.7000   Very Good:12082   F: 9542   SI2    : 9194   Median :61.80  
 Mean   :0.7979   Premium  :13791   G:11292   VS1    : 8171   Mean   :61.75  
 3rd Qu.:1.0400   Ideal    :21551   H: 8304   VVS2   : 5066   3rd Qu.:62.50  
 Max.   :5.0100                     I: 5422   VVS1   : 3655   Max.   :79.00  
                                    J: 2808   (Other): 2531                  
     table           price             x                y         
 Min.   :43.00   Min.   :  326   Min.   : 0.000   Min.   : 0.000  
 1st Qu.:56.00   1st Qu.:  950   1st Qu.: 4.710   1st Qu.: 4.720  
 Median :57.00   Median : 2401   Median : 5.700   Median : 5.710  
 Mean   :57.46   Mean   : 3933   Mean   : 5.731   Mean   : 5.735  
 3rd Qu.:59.00   3rd Qu.: 5324   3rd Qu.: 6.540   3rd Qu.: 6.540  
 Max.   :95.00   Max.   :18823   Max.   :10.740   Max.   :58.900  
                                                                  
       z         
 Min.   : 0.000  
 1st Qu.: 2.910  
 Median : 3.530  
 Mean   : 3.539  
 3rd Qu.: 4.040  
 Max.   :31.800  
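
A few more base R calls can complement colnames() and summary(). This is only a sketch of further exploration. The summary shows a minimum of 0 for x, y and z, which may indicate missing measurements recorded as zero, so the last line counts those rows.

```r
library(ggplot2)

dim(diamonds)        # number of rows and columns
str(diamonds)        # type of each column (numeric, ordered factor, ...)
head(diamonds, 3)    # first three rows

# Count rows where at least one of the three dimensions is zero
sum(diamonds$x == 0 | diamonds$y == 0 | diamonds$z == 0)
```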
                 

Setup

Tell ggplot what data set to use

ggplot(diamonds)

  • The plot is still empty, because so far ggplot knows only which data set to use

Adding axes

ggplot(diamonds, aes(x = carat, y = price))

Adding layers

ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point()

What is happening:

ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point()

  • ggplot(diamonds) tells ggplot which data frame to use
  • plus (+) tells ggplot() that there is more to add
  • geom_point() defines the type of plot
  • aes(x = , y = ) defines which variables go on the axes
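
One practical consequence of the plus sign: it has to end a line, not start the next one. A short sketch of the working form, with the failing form left as a comment; the object name p is hypothetical.

```r
library(ggplot2)

# Works: the plus sign ends the line, so R knows the expression continues
p <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point()

# Does not work as intended: R evaluates the first line on its own
# (an empty plot) and then fails on the line that starts with "+"
# ggplot(diamonds, aes(x = carat, y = price))
#   + geom_point()
```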

Change colour

ggplot(diamonds, aes(x = carat, y = price, colour = cut)) +
  geom_point()
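
Inside aes(), colour is mapped to a column, so each cut gets its own colour and a legend. To give all points one fixed colour instead, the argument goes outside aes(). A minimal sketch of both; the colour "steelblue" and the object names are illustrative assumptions.

```r
library(ggplot2)

# Mapping: colour depends on a column, so it goes inside aes()
p_mapped <- ggplot(diamonds, aes(x = carat, y = price, colour = cut)) +
  geom_point()

# Setting: one fixed colour for all points, so it goes outside aes()
p_fixed <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(colour = "steelblue")
```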

Adding more layers

ggplot(diamonds, aes(x = carat, y = price, colour = cut)) +
  geom_point() +
  geom_smooth()
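
Because colour = cut is set in ggplot(), geom_smooth() inherits it and draws one line per cut. geom_smooth() also picks a smoothing method automatically and reports it in a message. A sketch of choosing the method yourself; the straight-line fit (method = "lm") and hiding the confidence band (se = FALSE) are illustrative choices, not part of the original slides.

```r
library(ggplot2)

p_lm <- ggplot(diamonds, aes(x = carat, y = price, colour = cut)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)  # one straight line per cut, no band
```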

Alternative coding

ggplot(diamonds) +
  geom_point(aes(x = carat, y = price, colour = cut)) +
  geom_smooth(aes(x = carat, y = price, colour = cut))

Change geom_smooth

ggplot(diamonds) +
  geom_point(aes(x = carat, y = price, colour = cut)) +
  geom_smooth(aes(x = carat, y = price))
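
Here geom_smooth() has no colour mapping, so the data are not split by cut and a single line is drawn. The same plot can be sketched with less repetition by putting the shared x and y in ggplot() and keeping colour local to the points; the object name p_one_line is hypothetical.

```r
library(ggplot2)

p_one_line <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(aes(colour = cut)) +  # colour applies to the points only
  geom_smooth()                    # inherits x and y, so one line overall
```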

Change shape

ggplot(diamonds) +
  geom_point(aes(x = carat, y = price, colour = cut, shape = cut)) +
  geom_smooth(aes(x = carat, y = price))
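
With all rows of diamonds plotted, individual shapes are hard to tell apart. One possible workaround, sketched under the assumption that a random subset is acceptable for exploration, is to plot a sample; the sample size of 1000, the seed, and the names small and p_small are arbitrary choices for illustration.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed so the same rows are drawn each time
small <- diamonds[sample(nrow(diamonds), 1000), ]

p_small <- ggplot(small) +
  geom_point(aes(x = carat, y = price, colour = cut, shape = cut)) +
  geom_smooth(aes(x = carat, y = price))
```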

Your turn!

  • Use the mpg data set
  • Explore the column names
  • Explore the data with base R plots
  • Make a scatterplot of engine size (displ) & fuel efficiency (hwy)
  • Colour by type of car (class)
  • Add a smoothed average line

Result column names

colnames(mpg)
##  [1] "manufacturer" "model"        "displ"        "year"         "cyl"         
##  [6] "trans"        "drv"          "cty"          "hwy"          "fl"          
## [11] "class"

Result histogram

hist(mpg$displ)

hist(mpg$hwy)
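
The histograms look at one variable at a time. A base R scatterplot can also preview the relationship between the two before moving to ggplot. A minimal sketch; the axis labels are wording chosen for this example.

```r
library(ggplot2)  # provides the mpg data set

# Formula interface: hwy on the y axis, displ on the x axis
plot(hwy ~ displ, data = mpg,
     xlab = "Engine size (displ)",
     ylab = "Fuel efficiency (hwy)")
```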

Result scatterplot

ggplot(mpg) + 
  geom_point(aes(x = displ, y = hwy, colour = class)) +
  geom_smooth(aes(x = displ, y = hwy))
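
The result above uses geom_smooth() for the line. If "average line" is instead read literally as the mean of hwy, which is an assumption about the exercise's intent, a horizontal line can be sketched with geom_hline(); the dashed line type and the name p_mean are illustrative.

```r
library(ggplot2)

p_mean <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(colour = class)) +
  geom_hline(yintercept = mean(mpg$hwy), linetype = "dashed")  # overall mean of hwy
```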

Info

Access to slides:

  • Slides will be available on GitHub

  • Look for Purdue R cafe on github.com

Next R cafe

  • More ggplot
  • Make it pretty:
    • Axes
    • Legend
    • Title
  • Other graph types

Questions or Suggestions?