title | subtitle |
---|---|
Getting Graphic | Base graphics and `ggplot2` |
The R Script associated with this page is available here. Download this file and open it (or copy-paste into a new script) with RStudio so you can follow along.
In this module, we'll primarily use the `mtcars` data object. The data were extracted from the 1974 Motor Trend US magazine and comprise fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).
A data frame with 32 observations on 11 variables.
Column name | Description |
---|---|
mpg | Miles/(US) gallon |
cyl | Number of cylinders |
disp | Displacement (cu.in.) |
hp | Gross horsepower |
drat | Rear axle ratio |
wt | Weight (lb/1000) |
qsec | 1/4 mile time (seconds) |
vs | Engine shape (0 = V-shaped, 1 = straight) |
am | Transmission (0 = automatic, 1 = manual) |
gear | Number of forward gears |
carb | Number of carburetors |
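The size and coding described above can be checked from the console. This is a small illustrative sketch; `mt` and `am_label` are hypothetical names introduced here and are not part of the data set.

```r
# mtcars ships with base R (datasets package), so no library() call is needed
dim(mtcars)                          # rows (cars) and columns (variables)
str(mtcars[, c("mpg", "wt", "am")])  # peek at a few columns

# Hypothetical copy with the 0/1 transmission codes turned into a labelled factor,
# which gives more readable legends in later plots
mt <- mtcars
mt$am_label <- factor(mt$am, levels = c(0, 1),
                      labels = c("automatic", "manual"))
table(mt$am_label)
```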
Here's what the data look like:

```r
library(ggplot2)
library(knitr)
kable(head(mtcars))
```
Car | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
---|---|---|---|---|---|---|---|---|---|---|---|
Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
Hornet 4 Drive | 21.4 | 6 | 258 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
Hornet Sportabout | 18.7 | 8 | 360 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
Valiant | 18.1 | 6 | 225 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 |
R has a set of 'base graphics' functions that handle many plotting tasks (scatterplots, line plots, histograms, etc.). For example, fuel efficiency against weight:

```r
plot(y = mtcars$mpg, x = mtcars$wt)
```
Or you can use the more common formula notation, `y ~ x`:

```r
plot(mpg ~ wt, data = mtcars)
```
And you can customize with various parameters:

```r
plot(mpg ~ wt, data = mtcars,
     ylab = "Miles per gallon (mpg)",
     xlab = "Weight (1000 pounds)",
     main = "Fuel Efficiency vs. Weight",
     col = "red"
)
```
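Base graphics can also color points by group, but the mapping from values to colors has to be built by hand. A minimal sketch, assuming an arbitrary choice of one color per cylinder count (the color names and the `cyl_cols` / `point_cols` variables are illustrative, not from the original):

```r
# Map each cylinder count to a colour by hand (base graphics has no automatic scale)
cyl_levels <- sort(unique(mtcars$cyl))            # 4, 6, 8
cyl_cols   <- c("blue", "darkgreen", "red")       # one colour per level (arbitrary choice)
point_cols <- cyl_cols[match(mtcars$cyl, cyl_levels)]

plot(mpg ~ wt, data = mtcars,
     col = point_cols, pch = 19,
     xlab = "Weight (1000 pounds)", ylab = "Miles per gallon (mpg)",
     main = "Fuel Efficiency vs. Weight")
# The legend must also be assembled manually
legend("topright", legend = cyl_levels, col = cyl_cols,
       pch = 19, title = "Cylinders")
```

This bookkeeping (choosing colors, matching them to values, drawing the legend) is what `ggplot2` scales handle automatically.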
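Or switch to a line plot. Because `type = "l"` joins the points in the order the rows appear, sort the data by weight first so the line runs left to right:

```r
plot(mpg ~ wt, data = mtcars[order(mtcars$wt), ],  # rows sorted by the x variable
     type = "l",
     ylab = "Miles per gallon (mpg)",
     xlab = "Weight (1000 pounds)",
     main = "Fuel Efficiency vs. Weight",
     col = "blue"
)
```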
See `?plot` for details.

Check out the help for basic histograms:

```r
?hist
```
Plot a histogram of the fuel efficiencies in the `mtcars` dataset.

```r
hist(mtcars$mpg)
```
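`hist()` also lets you choose the bin edges, and it returns the computed bins, which is handy for checking what was drawn. A short sketch; the 5-mpg bin width is an arbitrary choice (mpg in `mtcars` runs from 10.4 to 33.9, so edges from 10 to 35 cover every car):

```r
# Explicit bin edges every 5 mpg
h <- hist(mtcars$mpg,
          breaks = seq(10, 35, by = 5),
          main = "Fuel efficiency of 32 cars",
          xlab = "Miles per gallon (mpg)",
          col = "grey80")
h$breaks   # bin edges
h$counts   # number of cars in each bin
```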
The grammar of graphics: consistent aesthetics, multidimensional conditioning, and step-by-step plot building.

- Data: The raw data
- `geom_`: The geometric shapes representing data
- `aes()`: Aesthetics of the geometric and statistical objects (color, size, shape, and position)
- `scale_`: Maps between the data and the aesthetic dimensions

In other words, a plot is built from: data + geometry + aesthetic mappings like position, color and size + scaling of ranges of the data to ranges of the aesthetics.

Additional components:

- `stat_`: Statistical summaries of the data that can be plotted, such as quantiles, fitted curves (loess, linear models), etc.
- `coord_`: Transformation for mapping data coordinates into the plane of the data rectangle
- `facet_`: Arrangement of data into a grid of plots
- `theme`: Visual defaults (background, grids, axes, typeface, colors, etc.)
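To see how these components fit together, here is a sketch of one plot that uses each of them once. The particular choices (Brewer "Set1" palette, y-range 10 to 35, faceting by `am`, `theme_bw()`) are arbitrary and only illustrative:

```r
library(ggplot2)

g <- ggplot(data = mtcars,                                    # data
            mapping = aes(x = wt, y = mpg,                    # aesthetics
                          colour = factor(cyl))) +
  geom_point() +                                              # geometry
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +   # statistical summary (per colour group)
  scale_colour_brewer(palette = "Set1", name = "Cylinders") + # scale for the colour aesthetic
  coord_cartesian(ylim = c(10, 35)) +                         # coordinate system (zooms without dropping data)
  facet_wrap(~ am) +                                          # facets: one panel per transmission type
  theme_bw()                                                  # theme: visual defaults
g
```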
For example, start with a simple scatterplot, then add variable colors and sizes, one step at a time.
First, create a blank ggplot object with the data and the x-y aesthetic mappings set up (no geometry yet):

```r
p <- ggplot(mtcars, aes(x = wt, y = mpg))
summary(p)
## data: mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb [32x11]
## mapping: x = ~wt, y = ~mpg
## faceting: <ggproto object: Class FacetNull, Facet, gg>
## compute_layout: function
## draw_back: function
## draw_front: function
## draw_labels: function
## draw_panels: function
## finish_data: function
## init_scales: function
## map_data: function
## params: list
## setup_data: function
## setup_params: function
## shrink: TRUE
## train_scales: function
## vars: function
## super: <ggproto object: Class FacetNull, Facet, gg>
```

Printing `p` on its own gives an empty panel, because there are no layers to draw yet. Adding a geometry draws the points:

```r
p
p + geom_point()
```
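One way to confirm that the blank object has no layers yet is to inspect its `layers` element. Note that `+` returns a new plot object; `p` itself is not modified. A quick sketch:

```r
library(ggplot2)
p <- ggplot(mtcars, aes(x = wt, y = mpg))
length(p$layers)                    # 0: data and mappings only, nothing to draw yet
p_pts <- p + geom_point()
length(p_pts$layers)                # 1: adding a geom adds a layer
length(p$layers)                    # still 0: `+` did not change p
```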
Or you can do both at the same time:

```r
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
```
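```r
# Colour by number of cylinders (treated as a categorical variable)
p +
  geom_point(aes(colour = factor(cyl)))
# Shape by number of cylinders
p +
  geom_point(aes(shape = factor(cyl)))
# Size by quarter-mile time (a continuous variable)
p +
  geom_point(aes(size = qsec))
# Colour and size together
p +
  geom_point(aes(colour = factor(cyl), size = qsec))
# Colour, size, and shape (number of forward gears)
p +
  geom_point(aes(colour = factor(cyl), size = qsec, shape = factor(gear)))
```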
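Behind each of these plots, ggplot2 creates a scale that turns data values into actual colors, sizes, or shapes. `layer_data()` returns the data after that mapping, so you can see the colors that were chosen. A small sketch:

```r
library(ggplot2)
p <- ggplot(mtcars, aes(x = wt, y = mpg))
built <- layer_data(p + geom_point(aes(colour = factor(cyl))))
head(built[, c("x", "y", "colour")])
unique(built$colour)   # one colour per cylinder level (4, 6, 8), picked by the default scale
```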
```r
p + geom_point() +
  geom_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'
p + geom_point() +
  geom_smooth(method = "loess")
## `geom_smooth()` using formula 'y ~ x'
```
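The `method = "lm"` line is an ordinary linear regression of `mpg` on `wt`, evaluated on a grid of x values. This sketch checks that against `lm()` directly; passing `formula = y ~ x` explicitly also silences the message shown above.

```r
library(ggplot2)
p <- ggplot(mtcars, aes(x = wt, y = mpg))
smooth <- layer_data(p + geom_smooth(method = "lm", formula = y ~ x), 1)
fit <- lm(mpg ~ wt, data = mtcars)
# Predictions from lm() at the same x grid that geom_smooth() used
pred <- unname(predict(fit, newdata = data.frame(wt = smooth$x)))
all.equal(smooth$y, pred)
```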
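```r
# cyl is numeric here (no factor()), so a continuous colour gradient is used
p + geom_point(aes(colour = cyl)) +
  scale_colour_gradient(low = "blue")
# Hollow point shapes
p + geom_point(aes(shape = factor(cyl))) +
  scale_shape(solid = FALSE)
# Fixed (not mapped) colour and size, set outside aes()
ggplot(mtcars, aes(wt, mpg)) +
  geom_point(colour = "red", size = 3)
```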
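Setting a constant outside `aes()` is different from putting it inside `aes()`. Inside `aes()`, `"red"` is treated as a data value, so it gets a default palette color and a legend entry labelled "red". A sketch that shows the difference:

```r
library(ggplot2)
set_red    <- ggplot(mtcars, aes(wt, mpg)) + geom_point(colour = "red")       # setting: a constant
mapped_red <- ggplot(mtcars, aes(wt, mpg)) + geom_point(aes(colour = "red"))  # mapping: "red" is a data value
unique(layer_data(set_red)$colour)     # the literal colour "red"
unique(layer_data(mapped_red)$colour)  # a colour chosen by the default discrete scale
```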
```r
d <- ggplot(diamonds, aes(carat, price))
d + geom_point(alpha = 0.2)
```

Varying alpha (transparency) is useful for large data sets, where many points overlap:

```r
d +
  geom_point(alpha = 0.1)
d +
  geom_point(alpha = 0.01)
```
Edit plot `p` to include:

- points
- a smooth ('loess') curve
- a "rug" to the plot

```r
p <- ggplot(mtcars, aes(x = wt, y = mpg))
```
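One possible solution, offered as a sketch rather than the official answer: `geom_rug()` draws short tick marks along the axes at each observation.

```r
library(ggplot2)
p <- ggplot(mtcars, aes(x = wt, y = mpg))
p_solution <- p +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x) +  # smooth curve with confidence band
  geom_rug()                                        # marginal ticks on both axes
p_solution
```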
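```r
p <- ggplot(mtcars, aes(factor(cyl), mpg))
p + geom_point()
p +
  geom_jitter()   # add a little random noise so overlapping points separate
p +
  geom_violin()   # mirrored density of mpg within each cylinder group
p +
  geom_violin() + geom_jitter(position = position_jitter(width = .1))
```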
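Because jitter is random, the points move every time the plot is drawn. `position_jitter()` accepts a `seed` argument that makes the noise reproducible; the value 42 below is arbitrary, and `height = 0` keeps the `mpg` values exact.

```r
library(ggplot2)
p <- ggplot(mtcars, aes(factor(cyl), mpg))
jit <- p + geom_jitter(position = position_jitter(width = 0.1, height = 0, seed = 42))
# Building the plot twice gives identical jittered x positions
x1 <- as.numeric(layer_data(jit)$x)
x2 <- as.numeric(layer_data(jit)$x)
identical(x1, x2)
```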
Will return to this when we start working with raster maps.
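Stats: visualize a data transformation.

- Each stat creates additional variables, referred to with a common `..name..` syntax (newer versions of `ggplot2` write this as `after_stat(name)`).
- Often there are two ways to add the same layer: `stat_bin(geom = "bar")` OR `geom_bar(stat = "bin")`.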
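A quick way to see the "two ways" point is to build the same binned bar layer from the stat side and from the geom side and compare the computed `count` variable. This sketch uses `geom_histogram()`, which draws bars with `stat_bin()`; the choice of 10 bins is arbitrary. The last line maps the computed `density` variable with `after_stat()`, which needs ggplot2 3.3.0 or later.

```r
library(ggplot2)
base <- ggplot(mtcars, aes(x = mpg))
via_stat <- base + stat_bin(geom = "bar", bins = 10)
via_geom <- base + geom_histogram(bins = 10)
counts_stat <- layer_data(via_stat)$count
counts_geom <- layer_data(via_geom)$count
identical(counts_stat, counts_geom)

# Map a computed variable back onto an aesthetic
base + geom_histogram(aes(y = after_stat(density)), bins = 10)
```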
Old Faithful Geyser Data on duration and waiting times.

```r
library("MASS")
data(geyser)
m <- ggplot(geyser, aes(x = duration, y = waiting))
```

See `?geyser` for details.
```r
m +
  geom_point()
m +
  geom_point() + stat_density2d(geom = "contour")
```

Check `?geom_density2d` for details.
```r
m +
  geom_point() + stat_density2d(geom = "contour") +
  xlim(0.5, 6) + ylim(40, 110)
```

Update the limits to show the full contours. Check `?geom_density2d` for details.
```r
m + stat_density2d(aes(fill = ..level..), geom = "polygon") +
  geom_point(col = "red")
```

Check `?geom_density2d` for details.
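In ggplot2 3.3.0 and later, the same filled contour plot can be written with `after_stat(level)` instead of `..level..`, using the underscore spelling `stat_density_2d()` (the older `stat_density2d()` name still works). A sketch, which also peeks at the computed `level` variable:

```r
library(ggplot2)
library(MASS)   # provides the geyser data
m <- ggplot(geyser, aes(x = duration, y = waiting))
dens <- m +
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon") +
  geom_point(col = "red")
dens
head(layer_data(dens, 1)[, c("x", "y", "level")])
```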
Edit plot `m` to include:

- the point data (with red points) on top
- a `binhex` plot of the Old Faithful data

Experiment with the number of bins to find one that works. See `?stat_binhex` for details.

```r
#install.packages("hexbin")
library(hexbin)
m <- ggplot(geyser, aes(x = duration, y = waiting))
```
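One possible solution, as a sketch: `geom_hex()` draws hexagonal bins (it needs the `hexbin` package installed), and adding the points afterwards puts them on top. The value `bins = 20` is just a starting point to experiment from.

```r
library(ggplot2)
library(MASS)     # geyser data
library(hexbin)   # required by geom_hex()
m <- ggplot(geyser, aes(x = duration, y = waiting))
m_hex <- m +
  geom_hex(bins = 20) +     # try other values, e.g. 10 or 30
  geom_point(col = "red")   # added last, so drawn on top
m_hex
```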
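Scales control how data values are mapped to aesthetics such as fill and colour. These examples use the `mpg` data set that ships with `ggplot2` (not the `mpg` column of `mtcars`).

```r
b <- ggplot(mpg, aes(fl)) +
  geom_bar(aes(fill = fl))
b
# Greyscale fill; missing values (if any) would be drawn in red
b + scale_fill_grey(start = 0.2, end = 0.8,
                    na.value = "red")
a <- ggplot(mpg, aes(x = hwy, y = cty, col = displ)) +
  geom_point()
a
# Two-colour gradient
a + scale_color_gradient(low = "red",
                         high = "yellow")
# Diverging gradient around a displacement of 4 litres
a + scale_color_gradient2(low = "red", high = "blue",
                          mid = "white", midpoint = 4)
# Gradient through several colours
a + scale_color_gradientn(
  colours = rainbow(10))
# ColorBrewer palette for a discrete fill
b +
  scale_fill_brewer(palette = "Blues")
```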
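When you want specific colors for specific categories, `scale_fill_manual()` takes a named vector whose names match the data values. A sketch; the colors are arbitrary and `fuel_cols` is a hypothetical name:

```r
library(ggplot2)
b <- ggplot(mpg, aes(fl)) + geom_bar(aes(fill = fl))
# One colour per fuel-type code found in mpg$fl (names must match the data values)
fuel_cols <- c(c = "grey50", d = "black", e = "darkgreen",
               p = "firebrick", r = "steelblue")
b_manual <- b + scale_fill_manual(values = fuel_cols)
b_manual
```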
Edit the contour plot of the geyser data:

- Reduce the size of the points
- Use a sequential brewer palette (select from colorbrewer2.org)
- Add informative x and y labels

```r
m <- ggplot(geyser, aes(x = duration, y = waiting)) +
  stat_density2d(aes(fill = ..level..), geom = "polygon") +
  geom_point(col = "red")
```

Note: use `scale_fill_distiller()` rather than `scale_fill_brewer()` for continuous data.
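```r
# Start from the bare geyser plot so the density and point layers are not added twice
ggplot(geyser, aes(x = duration, y = waiting)) +
  stat_density2d(aes(fill = ..level..), geom = "polygon") +
  geom_point(size = .75) +
  scale_fill_distiller(palette = "OrRd",
                       name = "Kernel\nDensity") +
  xlim(0.5, 6) + ylim(40, 110) +
  xlab("Eruption Duration (minutes)") +
  ylab("Waiting time (minutes)")
```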
Or use `geom = "tile"` for a raster representation:

```r
ggplot(geyser, aes(x = duration, y = waiting)) +
  stat_density2d(aes(fill = ..density..), geom = "tile", contour = FALSE) +
  geom_point(size = .75) +
  scale_fill_distiller(palette = "OrRd",
                       name = "Kernel\nDensity") +
  xlim(0.5, 6) + ylim(40, 110) +
  xlab("Eruption Duration (minutes)") +
  ylab("Waiting time (minutes)")
```
Create noisy exponential data:

```r
set.seed(201)
n <- 100
dat <- data.frame(
  xval = (1:n + rnorm(n, sd = 5)) / 20,
  yval = 10^((1:n + rnorm(n, sd = 5)) / 20)
)
```

Make a scatter plot with regular (linear) axis scaling:

```r
sp <- ggplot(dat, aes(xval, yval)) + geom_point()
sp
```

Example from R Cookbook.

Use log10 scaling of the y axis (with visually equal spacing):

```r
sp + scale_y_log10()
```
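A scale transformation like `scale_y_log10()` is applied to the data before any stat sees it, so a layer such as `geom_smooth()` added to this plot would be fitted on the log scale. This sketch rebuilds the data from above and checks that the built layer holds `log10(yval)`:

```r
library(ggplot2)
set.seed(201)
n <- 100
dat <- data.frame(
  xval = (1:n + rnorm(n, sd = 5)) / 20,
  yval = 10^((1:n + rnorm(n, sd = 5)) / 20)
)
sp_log <- ggplot(dat, aes(xval, yval)) + geom_point() + scale_y_log10()
# y values in the built layer are on the transformed (log10) scale
built_y <- layer_data(sp_log)$y
all.equal(built_y, log10(dat$yval))
```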
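Position adjustments for bar charts: stacked (the default) versus side by side (`"dodge"`):

```r
ggplot(diamonds, aes(clarity, fill = cut)) + geom_bar()
ggplot(diamonds, aes(clarity, fill = cut)) + geom_bar(position = "dodge")
```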
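A third built-in option, `position = "fill"`, stacks the bars and rescales each one to height 1, which compares proportions rather than counts. A sketch:

```r
library(ggplot2)
prop <- ggplot(diamonds, aes(clarity, fill = cut)) +
  geom_bar(position = "fill") +
  labs(y = "Proportion")
prop
ld <- layer_data(prop)
range(ld$ymax)   # every stacked bar tops out at 1
```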
Use facets to divide a graphic into small multiples based on a categorical variable.

`facet_wrap()` for one variable:

```r
ggplot(mpg, aes(x = cty, y = hwy, color = factor(cyl))) +
  geom_point() +
  facet_wrap(~year)
```

`facet_grid()` for two variables:

```r
ggplot(mpg, aes(x = cty, y = hwy, color = factor(cyl))) +
  geom_point() +
  facet_grid(year~cyl)
```
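The number of panels follows directly from the facet variables: one per level for `facet_wrap(~year)`, and one per combination of levels for `facet_grid(year~cyl)`. The built plot's layout table lets you check this; a sketch (`wrap` and `grd` are illustrative names):

```r
library(ggplot2)
wrap <- ggplot(mpg, aes(cty, hwy)) + geom_point() + facet_wrap(~ year)
grd  <- ggplot(mpg, aes(cty, hwy)) + geom_point() + facet_grid(year ~ cyl)
nrow(ggplot_build(wrap)$layout$layout)  # one panel per model year
nrow(ggplot_build(grd)$layout$layout)   # one panel per year x cylinder combination
```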
Small multiples (via facets) are very useful for visualizing time series (and especially time series of spatial data).
Set default display parameters (colors, font sizes, etc.) for different purposes (for example print vs. presentation) using themes.
Quickly change plot appearance with themes.
```r
library(ggthemes)
```

Or build your own!

```r
p <- ggplot(mpg, aes(x = cty, y = hwy, color = factor(cyl))) +
  geom_jitter() +
  labs(
    x = "City mileage/gallon",
    y = "Highway mileage/gallon",
    color = "Cylinders"
  )
p
p + theme_solarized()
p + theme_solarized(light = FALSE)
p + theme_excel()
p + theme_economist()
```
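Building your own theme can be as simple as starting from a complete theme and overriding a few elements with `theme()`. A minimal sketch; `theme_course()` is a hypothetical helper and the specific settings are arbitrary:

```r
library(ggplot2)
# Hypothetical house style: theme_minimal() plus a few overrides
theme_course <- function(base_size = 12) {
  theme_minimal(base_size = base_size) +
    theme(
      legend.position  = "bottom",
      panel.grid.minor = element_blank(),
      plot.title       = element_text(face = "bold")
    )
}
p <- ggplot(mpg, aes(x = cty, y = hwy, color = factor(cyl))) +
  geom_jitter() +
  labs(x = "City mileage/gallon", y = "Highway mileage/gallon",
       color = "Cylinders", title = "Mileage by cylinder count")
p + theme_course()
```

Calling `theme_set(theme_course())` would make it the default for every plot drawn afterwards in the session.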
XKCD: A webcomic of romance, sarcasm, math, and language.

Note: the following code will only work if you have the xkcd font installed. See `vignette("xkcd-intro", package = "xkcd")` for details.

```r
library(xkcd)
ggplot(mtcars, aes(mpg, wt)) +
  geom_point() +
  geom_smooth() +
  ylab("Weight") + xlab("Miles per Gallon") +
  theme_xkcd()
```
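Save a ggplot with sensible defaults:

```r
ggsave(filename, plot = last_plot(), scale = 1, width, height)
```

Save any plot with maximum flexibility:

```r
pdf(filename, width, height)  # open device
print(p)                      # draw the plot(s); print() is required inside loops and functions
dev.off()                     # close the device
```

Formats (each has its own device function, such as `jpeg()`, `png()`, and `tiff()`):

- jpeg
- png
- tif

and more...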
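A concrete, runnable version of both approaches, writing to temporary files so nothing is left in your working directory. The sizes and resolution are arbitrary choices; `ggsave()` picks the format from the file extension and measures `width`/`height` in inches by default.

```r
library(ggplot2)
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
out_png <- tempfile(fileext = ".png")   # temporary paths, for illustration only
out_pdf <- tempfile(fileext = ".pdf")

# ggsave(): format inferred from the ".png" extension
ggsave(out_png, plot = p, width = 6, height = 4, dpi = 150)

# Device approach: open, print, close
pdf(out_pdf, width = 6, height = 4)
print(p)
dev.off()
```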
Now complete the first task here by yourself or in small groups.
Perhaps R's best documented package: docs.ggplot2.org