Books

The Grammar of Graphics

• Data: Raw data that we'd like to visualize
• Geometries: Shapes that we use to visualize data
• Aesthetics: Properties of geometries (size, color, etc.)
• Scales: Mapping between data values and aesthetics

Scatterplot aesthetics

geom_point(). The available aesthetics depend on the geom.

• x, y
• shape
• color
• size. 'size' does not always need to go inside aes(). See an example at Legend layout.
• alpha
library(ggplot2)
library(tidyverse)
set.seed(1)
x1 <- rbinom(100, 1, .5) - .5
x2 <- c(rnorm(50, 3, .8)*.1, rnorm(50, 8, .8)*.1)
x3 <- x1*x2*2
# x=1:100, y=x1, x2, x3
tibble(x=1:length(x1), T=x1, S=x2, I=x3) %>%
tidyr::pivot_longer(-x) %>%
ggplot(aes(x=x, y=value)) +
geom_point(aes(color=name))

# Cf
matplot(1:length(x1), cbind(x1, x2, x3), pch=16,
col=c('cornflowerblue', 'springgreen3', 'salmon'))
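
On the note about size in the aesthetics list above, here is a minimal sketch of the distinction, assuming the built-in mtcars data (the choice of columns is only an illustration). Map size inside aes() when it should vary with a variable, which also produces a legend; set it outside aes() when every point should get the same fixed size.

```r
library(ggplot2)

# size mapped to a variable: goes inside aes() and generates a size legend
p_mapped <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point(aes(size = hp))

# size set to a constant: goes outside aes(), no legend is drawn for it
p_fixed <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point(size = 3)
```

Printing p_mapped and p_fixed side by side shows the difference in the legend.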


Help

> library(ggplot2)
Need help? Try Stackoverflow: https://stackoverflow.com/tags/ggplot2


Some examples

Examples from 'R for Data Science' book - Aesthetic mappings

ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))
# the 'mapping' is the 1st argument for all geom_* functions, so we can safely skip it.
# template
ggplot(data = <DATA>) +
<GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))

# add another variable through color, size, alpha or shape
ggplot(data = mpg) +
geom_point(aes(x = displ, y = hwy, color = class))

ggplot(data = mpg) +
geom_point(aes(x = displ, y = hwy, size = class))

ggplot(data = mpg) +
geom_point(aes(x = displ, y = hwy, alpha = class))

ggplot(data = mpg) +
geom_point(aes(x = displ, y = hwy, shape = class))

ggplot(data = mpg) +
geom_point(aes(x = displ, y = hwy), color = "blue")

# add another variable through facets
ggplot(data = mpg) +
geom_point(aes(x = displ, y = hwy)) +
facet_wrap(~ class, nrow = 2)

# add another 2 variables through facets
ggplot(data = mpg) +
geom_point(aes(x = displ, y = hwy)) +
facet_grid(drv ~ cyl)

Examples from 'R for Data Science' book - Geometric objects, lines and smoothers

# Points
ggplot(data = mpg) +
geom_point(aes(x = displ, y = hwy)) # we can add color to aes()

# Line plot
ggplot() +
geom_line(aes(x, y))  # we can add color to aes()

# Smoothed
# 'size' controls the line width (ggplot2 >= 3.4.0 uses 'linewidth' for lines instead)
ggplot(data = mpg) +
geom_smooth(aes(x = displ, y = hwy), size=1)

# Points + smoother, add transparency to points, remove se
# We add transparency to make the smoothed line stand out
#                    and the points less prominent
# We move aes() to the 'mapping' argument of ggplot()
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
geom_point(alpha=1/10) +
geom_smooth(se=FALSE)

# Colored points + smoother
ggplot(data = mpg, aes(x = displ, y = hwy)) +
geom_point(aes(color = class)) +
geom_smooth()

Examples from 'R for Data Science' book - Transformation, bar plot

# y axis = counts
# bar plot
ggplot(data = diamonds) +
geom_bar(aes(x = cut))
# Or
ggplot(data = diamonds) +
stat_count(aes(x = cut))

# y axis = proportion
ggplot(data = diamonds) +
geom_bar(aes(x = cut, y = ..prop.., group = 1))

# bar plot with 2 variables
ggplot(data = diamonds) +
geom_bar(aes(x = cut, fill = clarity))
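
The ..prop.. notation used above is the older way of referring to a variable computed by the stat. As a hedged sketch, assuming ggplot2 >= 3.3.0 (where after_stat() is available), the proportion plot can be written as follows; layer_data() is used only to check that the computed proportions add up to 1.

```r
library(ggplot2)

# y axis = proportion, using after_stat() instead of the ..prop.. notation
p <- ggplot(data = diamonds) +
  geom_bar(aes(x = cut, y = after_stat(prop), group = 1))

# The values computed by stat_count(); with group = 1 they are shares of the whole data
props <- layer_data(p)$prop
sum(props)
```

Without group = 1 each bar forms its own group, so every proportion would equal 1.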

facet_wrap and facet_grid to create a panel of plots

• A facet_grid() specification can be defined without any data. For example
mylayout <- list(ggplot2::facet_grid(cat_y ~ cat_x))
mytheme <- c(mylayout,
list(ggplot2::theme_bw(), ggplot2::ylim(NA, 1)))
# we haven't defined cat_y, cat_x variables
ggplot() + geom_line() +
mylayout

• Multiclass predictive modeling for #TidyTuesday NBER papers
• changing the facet_wrap labels using labeller in ggplot2. The solution is to create a labeller function as a function of a variable x (or any other name as long as it's not the faceting variables' names) and then coerce to labeller with as_labeller.
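
A minimal sketch of the labeller approach described above, assuming the mpg data that ships with ggplot2. The lookup vector drv_names and the function relabel_drv are hypothetical names made up for this illustration.

```r
library(ggplot2)

# Hypothetical display names for the levels of mpg$drv (illustration only)
drv_names <- c("4" = "4-wheel drive",
               "f" = "Front-wheel drive",
               "r" = "Rear-wheel drive")

# The labelling function takes a generic argument 'x' (the facet values as
# character strings), not the name of the faceting variable
relabel_drv <- function(x) unname(drv_names[x])

# as_labeller() coerces the function into a labeller for facet_wrap()
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  facet_wrap(~ drv, labeller = as_labeller(relabel_drv))
```

as_labeller() also accepts a named character vector directly, so as_labeller(drv_names) would be a shorter alternative here.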

lattice::xyplot

library(lattice)
df <- data.frame(x = rnorm(100), y = rnorm(100), group = sample(c("A", "B"), 100, replace = TRUE))

# Use the xyplot() function to create the plot
# with each group represented by a different color
# result is 1 plot only
# no annotation
xyplot(y ~ x, data = df, groups = group)

df <- data.frame(x = rnorm(100), y = rnorm(100),
group = sample(c("A", "B"), 100, replace = TRUE),
time = sample(c("T1", "T2"), 100, replace = TRUE))

# 2 plots grouped by time
# two colors (defined by group) are used in each plot
# no annotation
xyplot(y ~ x | time, groups = group, data = df)


For more complicated plots, we can use the panel parameter.
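
As a hedged illustration of the panel parameter, the sketch below draws the points and adds a least-squares line in each panel. The simulated data frame mirrors the one above; the choice of panel.lmline() is just an example of what a custom panel function can do.

```r
library(lattice)

set.seed(1)
df <- data.frame(x = rnorm(100), y = rnorm(100),
                 time = sample(c("T1", "T2"), 100, replace = TRUE))

# A custom panel function is called once per panel with that panel's x and y
p <- xyplot(y ~ x | time, data = df,
            panel = function(x, y, ...) {
              panel.xyplot(x, y, ...)          # the usual scatterplot
              panel.lmline(x, y, col = "red")  # add a regression line
            })
print(p)
```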

Color palette

Display color palettes

• Use barplot()
pal <- c("#E41A1C", "#377EB8", "#4DAF4A", "#984EA3", "#FF7F00")
# pal <- sample(colors(), 10) # randomly pick 10 colors

barplot(rep(1, length(pal)), col = pal, space = 0,
axes = FALSE, border = NA)
par()$usr # [1] -0.20 5.20 -0.01 1.00

• Use heatmap()
pal <- c("#E41A1C", "#377EB8", "#4DAF4A", "#984EA3", "#FF7F00")
pal <- matrix(pal, nr=2) # acknowledge a nice warning message
#      [,1]      [,2]      [,3]
# [1,] "#E41A1C" "#4DAF4A" "#FF7F00"
# [2,] "#377EB8" "#984EA3" "#E41A1C"
pal_matrix <- matrix(seq_along(pal), nr=nrow(pal), nc=ncol(pal))
heatmap(pal_matrix, col = pal, Rowv = NA, Colv = NA, scale = "none",
        ylab = "", xlab = "", main = "", margins = c(5, 5))
# 2 rows, 3 columns with labeling on two axes
par()$usr
# [1] 0 1 0 1


• Use image()
pal <- palette() # R 4.0 has a new default palette
# The old colors are highly saturated and vary enormously
# in terms of luminance
# [1] "black"   "#DF536B" "#61D04F" "#2297E6" "#28E2E5" "#CD0BBC" "#F5C710"
# [8] "gray62"
pal_matrix <- matrix(seq_along(pal), nr=1)
image(pal_matrix, col = pal, axes = FALSE)
# 8 rows, 1 column, but no labeling
# Starting from bottom, left.

• Differences of scale_color_gradient() and scale_color_continuous()
• scale_color_gradient() (more common than scale_color_continuous()) maps a continuous variable to a two-color gradient. Its low and high arguments specify the colors for the minimum and maximum values of the variable; the colors in between are interpolated automatically.
ggplot(data = diamonds, aes(x = carat, y = price, color = depth)) +
  geom_point() +
  scale_color_gradient(low = "blue", high = "red")

• scale_color_continuous() is the default scale for a continuous color aesthetic. Its type argument ("gradient" or "viridis") selects the underlying color scale, and the remaining arguments are passed on to it. It is useful if we want to specify the labels to display on the legend: the limits argument sets the minimum and maximum values for the variable, the breaks argument specifies the values at which breaks occur, and labels gives the text shown at those breaks.
ggplot(data = diamonds, aes(x = carat, y = price, color = depth)) +
  geom_point() +
  scale_color_continuous(name = "Depth",
                         limits = c(40, 80),
                         breaks = c(40, 60, 80),
                         labels = c("Shallow", "Moderate", "Deep"), # display on legend
                         type = "gradient")

ylim and xlim in ggplot2 in axes

Use one of the following
• + scale_x_continuous(limits = c(-5000, 5000))
• + coord_cartesian(xlim = c(-5000, 5000))
• + xlim(-5000, 5000)
Note that these are not equivalent: scale limits (scale_x_continuous(limits) and xlim()) turn observations outside the range into NA before any statistics are computed, while coord_cartesian() only zooms the view.

Emulate ggplot2 default color palette

The ggplot2 palette can be displayed in R >= 4.0.0 using the command scales::show_col(palette.colors(palette = "ggplot2")). We should ignore the 1st color (black). Also if n>=5, the colors do not match the result of show_col(hue_pal()(5)).

Answer 1 It is just equally spaced hues around the color wheel. See Emulate ggplot2 default color palette.
gg_color_hue <- function(n) {
  hues = seq(15, 375, length = n + 1)
  hcl(h = hues, l = 65, c = 100)[1:n]
}

n = 4
cols = gg_color_hue(n)

dev.new(width = 4, height = 4)
plot(1:n, pch = 16, cex = 2, col = cols)

Answer 2 (better, it shows the color values in HEX).
It should be read from left to right and then top to bottom. scales package
library(scales)
show_col(hue_pal()(4)) # ("#F8766D", "#7CAE00", "#00BFC4", "#C77CFF")
                       # (Salmon, Christi, Iris Blue, Heliotrope)
show_col(hue_pal()(3)) # ("#F8766D", "#00BA38", "#619CFF")
                       # (Salmon, Dark Pastel Green, Cornflower Blue)
show_col(hue_pal()(2)) # ("#F8766D", "#00BFC4") = (salmon, iris blue)
# see https://www.htmlcsscolor.com/ for color names

See also the last example in ggsurv() where the KM plots have 4 strata. The colors can be obtained by scales::hue_pal()(4) with hue_pal()'s default arguments.

R has a function called colorName() to convert a hex code to color name; see roloc package on CRAN.

How to change the default color palette in geom_XXX

transform scales

Class variables

• "Set1" is a good choice. See RColorBrewer::display.brewer.all()
• For ordinal variable, brewer.pal(n, "Spectral") is good. But the middle color is too light. So I modify the middle color
cols <- brewer.pal(5, "Spectral")
cols[3] <- "#D4C683" # middle of "#FDAE61" and "#ABDDA4"

Red, Green, Blue alternatives

• Red: "maroon"

Heatmap for single channel

How to Make a Heatmap of Customers in R, source code on github. geom_tile() and geom_text() were used.

Heatmap in ggplot2 from https://r-charts.com/.
# White <----> Blue
RColorBrewer::display.brewer.pal(n = 8, name = "Blues")

Heatmap for dual channels

library(RColorBrewer)
# Red <----> Blue
display.brewer.pal(n = 8, name = 'RdBu')
# Hexadecimal color specification
brewer.pal(n = 8, name = "RdBu")

# brewer_pal() comes from the scales package
plot(1:8, col=scales::brewer_pal(palette = "RdBu")(8), pch=20, cex=4)
# Blue <----> Red
plot(1:8, col=rev(scales::brewer_pal(palette = "RdBu")(8)), pch=20, cex=4)

Don't rely on color to explain the data

Don't use very bright or low-contrast colors, accessibility

Create your own scale_fill_FOO and scale_color_FOO

Themes and background for ggplot2

Background

• Export plot in .png with transparent background in base R plot.
x = c(1, 2, 3)
op <- par(bg=NA)
plot (x)
dev.copy(png,'myplot.png')
dev.off()
par(op)

• Transparent background with ggplot2
library(ggplot2)
data("airquality")

p <- ggplot(airquality, aes(Solar.R, Temp)) +
  geom_point() +
  geom_smooth() +
  # set transparency
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.background = element_rect(fill = "transparent",colour = NA),
    plot.background = element_rect(fill = "transparent",colour = NA)
  )
p
ggsave("airquality.png", p, bg = "transparent")

• ggplot2 theme background color and grids
ggplot() + geom_bar(aes(x = x, fill = y)) +
  theme(panel.background=element_rect(fill='purple')) +
  theme(plot.background=element_blank())

ggplot() + geom_bar(aes(x = x, fill = y)) +
  theme(panel.background=element_blank()) +
  theme(plot.background=element_blank()) # minimal background like base R
# the grid lines are not gone; they are white so it is the same as the background

ggplot() + geom_bar(aes(x = x, fill = y)) +
  theme(panel.background=element_blank()) +
  theme(plot.background=element_blank()) +
  theme(panel.grid.major.y = element_line(color="grey"))
# draw grid line on y-axis only

ggplot() + geom_bar() + theme_bw()      # very similar to theme_light()
                                        # have grid lines
ggplot() + geom_bar() + theme_classic() # similar to base R graphic
                                        # no borders on top and right
ggplot() + geom_bar() + theme_minimal() # no edge
ggplot() + geom_bar() + theme_void()    # no grid, no edge
ggplot() + geom_bar() + theme_dark()

ggthemr

ggthemr package

Font size

For example to make the subtitle font size smaller
my_ggp + theme(plot.subtitle = element_text(size = 8))
# Default font size seems to be 11 for title/subtitle

Remove x and y axis titles

Rotate x-axis labels, change colors

Counter-clockwise
theme(axis.text.x = element_text(angle = 90, size=5, hjust=1))

Add axis on top or right hand side

• Specify a secondary axis, sec_axis(). This new function was added in ggplot2 2.2.0.
• Create secondary x-axis in ggplot2.
dup_axis(name, breaks, labels). Note that ggplot2 uses breaks while base R plot uses at. See R → Include labels on the top axis/margin: axis().
# Bottom x-axis is the quantiles and the top x-axis is the original values
Fn <- ecdf(mtcars$mpg)
mtcars %>% dplyr::mutate(quantile = Fn(mpg)) %>%
ggplot(aes(x= quantile, y= disp)) +
geom_point() +
scale_x_continuous(name = "quantile of mpg",
breaks=c(.25, .5, .75, 1.0),
labels = c("0.25", "0.50", "0.75", "1.00"),
sec.axis = dup_axis(name = "mpg",
breaks = c(.25, .5, .75, 1.0),
labels = quantile(mtcars$mpg, c(.25, .5, .75, 1.0))))

• How to add line at top panel border of ggplot2
mtcars %>%
  ggplot(aes(x= mpg, y= disp)) +
  geom_point() +
  annotate(geom = 'segment', y = Inf, yend = Inf, color = 'green',
           x = -Inf, xend = Inf, size = 4)

• ggplot2: Secondary Y axis
• Dual Y axis with R and ggplot2

Remove labels

ggthemes package

ggplot() + geom_bar() +
  theme_solarized() # sun color in the background

theme_excel()
theme_wsj()
theme_economist()
theme_fivethirtyeight()

rsthemes

thematic

Common plots

Scatterplot

Handling overlapping points (slides) and the ebook Fundamentals of Data Visualization by Claus O. Wilke.

Scatterplot with histograms

aes(color)

groups

Bubble Chart

Ellipse

ggside: scatterplot + marginal density plot

ggExtra: scatterplot + marginal histogram/density

Line plots

Ridgeline plots, mountain diagram

Histogram

A histogram is a special case of a bar plot. Instead of drawing each unique value as a bar, a histogram groups nearby data points into bins.
ggplot(data = txhousing, aes(x = median)) +
  geom_histogram()
# adding 'boundary = 0' (formerly 'origin = 0') if we don't expect negative values.
# adding 'bins=10' to adjust the number of bins
# adding 'binwidth=10' to adjust the bin width

Histogram vs barplot from deeply trivial.

Boxplot

Be careful: adding scale_y_continuous(expand = c(0,0), limits = c(0,1)) to the code will change the boxplot if some data are outside the range (0, 1), because those values are dropped before the box statistics are computed. The console gives a warning message in this case.

Base R method

dim(df) # 112436 x 2
mycol <- c("#F8766D", "#7CAE00", "#00BFC4", "#C77CFF")
# mycol defines colors of 4 levels in df$Method (a factor)
boxplot(df$value ~ df$Method, col = mycol, xlab="Method")


Color fill/scale_fill_XXX

n <- 100
k <- 12
set.seed(1234)
cond <- factor(rep(LETTERS[1:k], each=n))
rating <- rnorm(n*k)
dat <- data.frame(cond = cond, rating = rating)

p <- ggplot(dat, aes(x=cond, y=rating, fill=cond)) +
geom_boxplot()

p + scale_fill_hue() + labs(title="hue default") # Same as only p
p + scale_fill_hue(l=40, c=35) + labs(title="hue options")
p + scale_fill_brewer(palette="Dark2") + labs(title="Dark2")
p + colorspace::scale_fill_discrete_qualitative(palette = "Dark 3") + labs(title="Dark 3")
p + scale_fill_brewer(palette="Accent") + labs(title="Accent")
p + scale_fill_brewer(palette="Pastel1") + labs(title="Pastel1")
p + scale_fill_brewer(palette="Set1") + labs(title="Set1")
p + scale_fill_brewer(palette="Spectral") + labs(title ="Spectral")
p + scale_fill_brewer(palette="Paired") + labs(title="Paired")
# cbbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
# p + scale_fill_manual(values=cbbPalette)


ColorBrewer palettes RColorBrewer::display.brewer.all() to display all brewer palettes.

Reference from ggplot2. scale_fill_binned, scale_fill_brewer, scale_fill_continuous, scale_fill_date, scale_fill_datetime, scale_fill_discrete, scale_fill_distiller, scale_fill_gradient, scale_fill_gradient2, scale_fill_gradientn, scale_fill_grey, scale_fill_hue, scale_fill_identity, scale_fill_manual, scale_fill_ordinal, scale_fill_steps, scale_fill_steps2, scale_fill_stepsn, scale_fill_viridis_b, scale_fill_viridis_c, scale_fill_viridis_d

Jittering - plot the data on top of the boxplot

• What is a boxplot
• Quick look
# Only 1 variable
ggplot(data.frame(Wi), aes(y = Wi)) +
geom_boxplot()

# Two variables, one of them is a factor
ggplot() + geom_jitter(mapping = aes(x, y))

# Box plot
ggplot() + geom_boxplot(mapping = aes(x, y))
• geom_jitter()
• geom_jitter can affect both X and Y values.
tibble(x=1:4, y=1:4) %>% ggplot(aes(x, y)) + geom_jitter()

• https://stackoverflow.com/a/17560113
• How to make scatterplot with geom_jitter plot reproducible?
set.seed(1); data %>%
ggplot() +
geom_jitter(aes(T.categ, sex, colour = status))

• Boxplot with jittered data points in ggplot2
• # df2 is n x 2
ggplot(df2, aes(x=nboot, y=boot)) +
geom_boxplot(outlier.shape=NA) + #avoid plotting outliers twice
geom_jitter(aes(color=nboot), position=position_jitter(width=.2, height=0, seed=1)) +
labs(title="", y = "", x = "nboot")

If we omit the outlier.shape=NA option in geom_boxplot(), we will get the following plot where some outliers will appear twice. (Another option is outlier.color = NA; see extra point at boxplot with jittered points (ggplot2)).

• Base plot approach Batch effects and confounders
• Another base plot approach. boxplot() + stripchart(). See Stripchart in R, How to Create a Strip Chart in R. Consider adding outline = FALSE to boxplot() to avoid drawing the outliers twice once stripchart() has been added.
ylim <- range(df$estimate, na.rm = TRUE)
boxplot(estimate~type, data=df, xlab=NULL, ylab=NULL, ylim=ylim, outline=FALSE)
set.seed(1)
stripchart(estimate~type, data=df, method = "jitter", pch=19,
           col=c("salmon", "orange", "yellowgreen", "green"),
           vertical=TRUE, add=TRUE)

Groups of boxplots

• How to Make Grouped Boxplot with Jittered Data Points in ggplot2. Use the color parameter in ggplot(aes()).
• Boxplot With Jittered Points in R
• How To Make Grouped Boxplots with ggplot2?, A review of Longitudinal Data Analysis in R. Use the fill parameter such as
mydata %>%
  ggplot(aes(x=Factor1, y=Response, fill=factor(Factor2))) +
  geom_boxplot()

• Another method is to use ggpubr::ggboxplot(). Papers TumorPurity.
ggboxplot(df, "dose", "len",
          fill = "dose", palette = c("#00AFBB", "#E7B800", "#FC4E07"),
          add.params=list(size=0.1),
          notch=TRUE, add = "jitter", outlier.shape = NA, shape=16,
          size = 1/.pt, x.text.angle = 30,
          ylab = "Silhouette Values", legend="right",
          ggtheme = theme_pubr(base_size = 8)) +
  theme(plot.title = element_text(size=8,hjust = 0.5),
        text = element_text(size=8),
        title = element_text(size=8),
        rect = element_rect(size = 0.75/.pt),
        line = element_line(size = 0.75/.pt),
        axis.text.x = element_text(size = 7),
        axis.line = element_line(colour = 'black', size = 0.75/.pt),
        legend.title = element_blank(),
        legend.position = c(0,1),
        legend.justification = c(0,1),
        legend.key.size = unit(4,"mm"))

p-values on top of boxplots

Violin plot and sina plot

geom_density: Kernel density plot

A panel of density plots

• Common xlim for all subplots
ggplot(data = mpg, aes(x = hwy)) +
  geom_density() +
  facet_wrap(~ class)

• Each subplot has its own xlim
ggplot(data = mpg, aes(x = hwy)) +
  geom_density() +
  facet_wrap(~ class, scales = "free_x")

Bivariate analysis with ggpair

GGally::ggpairs

barplot/bar plot

Ordered barplot and facet

• ?reorder. This, as relevel(), is a special case of simply calling factor(x, levels = levels(x)[....]).
R> bymedian <- with(InsectSprays, reorder(spray, count, median))
# bymedian will replace spray (a factor)
# The data is not changed except the order of levels (a factor)
# In this case, the order is determined by the median of count from each spray level
# from small to large.
R> InsectSprays[1:3, ]
  count spray
1    10     A
2     7     A
3    20     A
R> bymedian
 [1] A A A A A A A A A A A A B B B B B B B B B B B B C C C C C C C C C C C C D D D D D D D
[44] D D D D D E E E E E E E E E E E E F F F F F F F F F F F F
attr(,"scores")
   A    B    C    D    E    F
14.0 16.5  1.5  5.0  3.0 15.0
Levels: C E D A F B
R> InsectSprays$spray
[1] A A A A A A A A A A A A B B B B B B B B B B B B C C C C C C C C C C C C D D D D D D D
[44] D D D D D E E E E E E E E E E E E F F F F F F F F F F F F
Levels: A B C D E F
R> boxplot(count ~ bymedian, data = InsectSprays,
xlab = "Type of spray", ylab = "Insect count",
main = "InsectSprays data", varwidth = TRUE,
col = "lightgray")

Scatterplot

tibble(y=sample(6), x=letters[1:6]) %>%
ggplot(aes(reorder(x, -y), y)) + geom_point(size=4)

• Sorting the x-axis in bargraphs using ggplot2 or this one from Deeply Trivial. reorder(fac, value) was used.
ggplot(df, aes(x=reorder(x, -y), y=y)) + geom_bar(stat = 'identity')

df$order <- 1:nrow(df)
# Assume df$y is a continuous variable and df$fac is a character/factor variable
# and we want to show the factor levels in the order they appear in the data
# (not following R's order, even if the variable is of type "character" not "factor").
# We like to plot df$fac on the y-axis and df$y on the x-axis. Fortunately,
# ggplot2 will draw the barplot vertically or horizontally depending on the 2 variables' types.
# The reason for using "-order" is to make the 1st name appear on the top.
ggplot(df, aes(x=y, y=reorder(fac, -order))) + geom_col()

# desc() is from dplyr
ggplot(df, aes(x=reorder(x, desc(y)), y=y)) + geom_col()

• Predict #TidyTuesday giant pumpkin weights with workflowsets. fct_reorder()
• Reordering and facetting for ggplot2. tidytext::reorder_within() was used.
• Chapter2 of data.table cookbook. reorder(fac, value) was used.
• PCA and UMAP with tidymodels
• A simple example
dat <- structure(list(gene = c("CAPN9", "CSF3R", "HPN", "KCNA5", "MTMR7",
                               "NRG3", "SMTNL2", "TMPRSS6"),
                      coef = c(-1.238, -0.892, -0.224, -0.057, 0.133, 0.377, 0.436, 0.804)),
                 row.names = c("4976", "6467", "12355", "13373", "18143", "19010",
                               "23805", "25602"),
                 class = "data.frame")
# Base R plot
par(mar=c(4,6,4,1))
barplot(dat$coef, names = dat$gene, horiz = TRUE, las=1, main='base R', xlab = "Coefficients")
# ggplot2
dat %>% ggplot(aes(y=gene, x=coef)) +
  geom_col(fill = 'gray') +
  theme(axis.ticks.y = element_blank()) +
  theme(panel.background = element_blank(),
        axis.line.x = element_line(colour = 'black')) +
  labs(x="Coefficients", y = '', title = "ggplot2")

Proportion barplot

Back to back barplot

Pyramid Chart

Flip x and y axes

coord_flip()

Rotate x-axis labels

ggplot(mydf) +
  geom_col(aes(x = model, y=value, fill = method), position="dodge")+
  theme(axis.text.x = element_text(angle = 45, hjust=1, size= 8))

Starts at zero

scale_y_continuous(expand = c(0,0), limits = c(0, YourLimit))

Add patterns

Barplot with colors for a 2nd variable

By default, the barplots are stacked on top of each other.
Use geom_col(position = "dodge") if we want the barplots to be side-by-side.
df <- data.frame(group = c("A", "A", "B", "B", "C", "C"),
                 count = c(3, 4, 5, 6, 7, 8),
                 fill = c("red", "blue", "red", "blue", "red", "blue"))

ggplot(df, aes(x = group, y = count, fill = fill)) +
  geom_col(position = "dodge")

Barplot with color gradient

Barplot with only horizontal gridlines

Barplot with text at the end

Polygon and map plot

geom_step: Step function

Connect observations: geom_path(), geom_step()

Example: KM curves (without legend)
library(survival)
sf <- survfit(Surv(time, status) ~ x, data = aml)
sf
str(sf) # the first 10 time points form one stratum and the remaining 10 form the other

ggplot() +
  geom_step(aes(x=c(0, sf$time[1:10]), y=c(1, sf$surv[1:10])), col='red') +
  scale_x_continuous('Time', limits = c(0, 161)) +
  scale_y_continuous('Survival probability', limits = c(0, 1)) +
  geom_step(aes(x=c(0, sf$time[11:20]), y=c(1, sf$surv[11:20])), col='black')
# cf: plot(sf, col = c('red', 'black'), mark.time=FALSE)

Same example but with legend (see Construct a manual legend for a complicated plot)
cols <- c("NEW"="#f04546","STD"="#3591d1")
ggplot() +
  geom_step(aes(x=c(0, sf$time[1:10]), y=c(1, sf$surv[1:10]), col='NEW')) +
  scale_x_continuous('Time', limits = c(0, 161)) +
  scale_y_continuous('Survival probability', limits = c(0, 1)) +
  geom_step(aes(x=c(0, sf$time[11:20]), y=c(1, sf$surv[11:20]), col='STD')) +
  scale_colour_manual(name="Treatment", values = cols)

To control the line width, use the size parameter (linewidth in ggplot2 >= 3.4.0); e.g. geom_step(aes(x, y), size=.5). The default is .5; the defaults of a geom can be inspected with GeomStep$default_aes.

To allow different line types, use the linetype parameter. The first level is a solid line, the 2nd level is dashed, ... We can change the default line types by using the scale_linetype_manual() function. See Line Types in R: The Ultimate Guide for R Base Plot and GGPLOT.
Coefficients, intervals, errorbars

Comparing similarities / differences between groups

Special plots

Dot plot & forest plot

Lollipop plot

geom_segment() + geom_point()

ggpubr::ggdotchart()

Correlation Analysis Different

Bump plot: plot ranking over time

Gauge plots

Sankey diagrams

Horizon chart

Circos plots

Aesthetics

• We can create a new aesthetic name in aes(aesthetic = variable) function; for example, the "text2" below. In this case "text2" name will not be shown; only the original variable will be used.
library(plotly)
g <- ggplot(tail(iris), aes(Petal.Length, Sepal.Length, text2=Species)) + geom_point()
ggplotly(g, tooltip = c("Petal.Length", "text2"))

Aesthetics finder

aes_string()

group

GUI/Helper packages

ggedit & ggplotgui: interactive ggplot aesthetic and theme editor

esquisse (French, means 'sketch'): creating ggplot2 interactively

A 'shiny' gadget to create 'ggplot2' charts interactively with drag-and-drop to map your variables. You can quickly visualize your data accordingly to their type, export to 'PNG' or 'PowerPoint', and retrieve the code to reproduce the chart.

The interface introduces basic terms used in ggplot2:
• x, y,
• fill (useful for geom_bar, geom_rect, geom_boxplot, & geom_raster, not useful for scatterplot),
• color (edges for geom_bar, geom_line, geom_point),
• size,
• facet, split up your data by one or more variables and plot the subsets of data together.

It does not include all features in ggplot2. At the bottom of the interface,
• Labels & title & caption.
• Plot options. Palette, theme, legend position.
• Data. Remove subset of data.
• Export & code. Copy/save the R code. Export file as PNG or PowerPoint.
ggcharts

ggeasy

ggx

https://github.com/brandmaier/ggx Create ggplot in natural language

Interactive

plotly

ggiraph

ggiraph: Make 'ggplot2' Graphics Interactive

ggconf: Simpler Appearance Modification of 'ggplot2'

Plotting individual observations and group means

subplot

Adding/Inserting an image to ggplot2

See also ggbernie which uses a different way ggplot2::layer() and a self-defined geom (geometric object).

Easy way to mix/combine multiple graphs on the same page

annotation_custom

• predcurvePlot.R from TreatmentSelection. One issue is that the font size is large for the text & labels at the bottom. The 2nd issue is that the bottom part of the graph/annotation (marker value scale) can be truncated if the window size is too large. If the window is too small, the bottom part can overlap with the top part.
p <- p + theme(plot.margin = unit(c(1,1,4,1), "lines")) # hard coding
p <- p + annotation_custom() # axis for marker value scale
p <- p + annotation_custom() # label only

• Similar plot but without using base R graphic. One issue is the text is not below the scale (this can be fixed by par(mar) & mtext(text, side=1, line=4)) and the 2nd issue is the same as ggplot2's approach.
axis(1,at= breaks, label = round(quantile(x1, prob = breaks/100), 1),pos=-0.26) # hard coding

• Another common problem is that the plot saved by pdf() or png() can be truncated too. I have better luck with png() though.
grid

gridExtra

Force a regular plot object into a Grob for use in grid.arrange

gridGraphics package

make one panel blank/create a placeholder

# Method 1: Blank
ggplot() + theme_void()

# Method 2: Display N/A
ggplot() +
  theme_void() +
  geom_text(aes(0,0,label='N/A'))

Overall title

Remove vertical/horizontal grids but keep ticks

patchwork

Common legend

library(ggplot2)
library(patchwork)

p1 <- ggplot(df1, aes(x = x, y = y, colour = group)) +
  geom_point(position = position_jitter(w = 0.04, h = 0.02), size = 1.8)
p2 <- ggplot(df2, aes(x = x, y = y, colour = group)) +
  geom_point(position = position_jitter(w = 0.04, h = 0.02), size = 1.8)

# Method 1:
p1 + p2 + plot_layout(guides = "collect") + theme(legend.position = "bottom")
# one legend on the bottom
# Method 2:
p1 + p2 + plot_layout(guides = "collect")
# one legend on the RHS
# Method 3:
p1 + theme(legend.position="none") + p2
# legend (based on p2) is on the RHS
# Method 4:
p1 + p2 + theme(legend.position="none")
# legend (based on p1) is in the middle!!

Overall title

egg

• egg (ggarrange()): Extensions for 'ggplot2', to Align Plots, Plot insets, and Set Panel Sizes. Same author of gridExtra package. egg depends on gridExtra.

Common x or y labels

Base R plot vs ggplot2

• My summary

base R | ggplot2
plot(x, y, col) | geom_point(aes(x, y, color, shape))
xlim | scale_x_continuous(limits)
log="x" | scale_x_continuous(trans="log10")
xlab, mtext("Var", cex, line, adj, las, side) | scale_x_discrete(name="sample size"), labs(x), xlab()
main | labs(x, y, title, colour), ggtitle()
axis(2, labels) | scale_y_continuous(labels, breaks), scale_x_discrete(labels)
? | scale_color_discrete('new color title')
? | scale_shape_discrete('new shape title')
col | scale_color_manual(name, values = NamedVector)
pch, cex | geom_point(pch, size)
plot(mpg, disp, col=factor(cyl)); legend("topleft", legend = sort(unique(cyl)), col=1:3, pch=1) # discrete case | ggplot(mtcars, aes(mpg, disp, color = factor(cyl))) + geom_point() + labs(color = "Number of Cylinders")
text() | geom_text()
? | theme(title = element_text(size=8), legend.title = element_blank(), legend.position = "none", legend.key = element_blank(), plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(size = 8))
las in plot(), barplot(); text(x, y, labs, srt=45) | theme(axis.text.x = element_text(angle = 90))
matplot() | geom_line() + geom_point()
plot(type = 'l'), points() | geom_line() + geom_point()
barplot() | geom_bar()
par(mfrow) | facet_grid()

labs for x and y axes

x and y labels

https://stackoverflow.com/questions/10438752/adding-x-and-y-axis-labels-in-ggplot2 or the Labels part of the cheatsheet

You can set the labels with xlab() and ylab(), or make them part of the scale_*_*() call.
labs(x = "sample size", y = "ngenes (glmnet)")

scale_x_discrete(name="sample size")
scale_y_continuous(name="ngenes (glmnet)", limits=c(100, 500))

Change tick mark labels

name-value pairs

See several examples (color, fill, size, ...) from opioid prescribing habits in texas.

Prevent sorting of x labels

The idea is to set the levels of the x variable.
junk # n x 2 table
colnames(junk) <- c("gset", "boot")
junk$gset <- factor(junk$gset, levels = as.character(junk$gset))
ggplot(data = junk, aes(x = gset, y = boot, group = 1)) +
geom_line() +
theme(axis.text.x=element_text(color = "black", angle=30, vjust=.8, hjust=0.8))


Legends

Legend title

• labs() function
p <- ggplot(df, aes(x, y)) + geom_point(aes(colour = z))
p + labs(x = "X axis", y = "Y axis", colour = "Colour\nlegend")
# The 'colour' argument of labs() sets the legend title

p <- ggplot(df) + geom_col(aes(x=x, y=y, fill=cat), position = "dodge")
p + labs(x = "X", y = "Y", fill = "Category")
# The 'fill' argument of labs() sets the legend title

• scale_colour_manual()
scale_colour_manual("Treatment", values = c("black", "red"))

• scale_color_discrete() and scale_shape_discrete(). See Combine colors and shapes in legend.
df <- data.frame(x = 1:3, y = 1:3, z = c("a", "b", "c"))
ggplot(df, aes(x, y)) + geom_point(aes(shape = z, colour = z), size=5) +
scale_color_discrete('new title') + scale_shape_discrete('new title')


Layout: move the legend from right to top/bottom of the plot or inside the plot or hide it

gg + theme(legend.position = "top")

# Useful in the boxplot case
gg + theme(legend.position="none")

gg + theme(legend.position = c(0.87, 0.25))

# Customize the edge color and background color
gapminder %>%
ggplot(aes(gdpPercap,lifeExp, color=continent)) +
geom_point() +
scale_x_log10()+
theme(legend.position = c(0.87, 0.25),
legend.background = element_rect(fill = "white", color = "black"))
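
In recent ggplot2 releases the numeric form of legend.position shown above is being phased out. A hedged sketch of the newer spelling, assuming ggplot2 >= 3.5.0 and using mtcars instead of gapminder so that it runs without extra packages:

```r
library(ggplot2)

# Assumes ggplot2 >= 3.5.0: the coordinates move to legend.position.inside
p <- ggplot(mtcars, aes(wt, mpg, color = factor(cyl))) +
  geom_point() +
  theme(legend.position = "inside",
        legend.position.inside = c(0.87, 0.75),  # relative x, y inside the panel
        legend.background = element_rect(fill = "white", color = "black"))
```

On older versions, keep legend.position = c(x, y) as in the example above.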


Guide functions for finer control (legend, axis, color scales)

• https://ggplot2-book.org/scales.html#guide-functions The guide functions, guide_colourbar() and guide_legend(), offer additional control over the fine details of the legend.
• guide_legend() allows the modification of legends for scales, including fill, color, and shape. This function can be used in scale_fill_manual(), scale_fill_continuous(), ... functions.
scale_fill_manual(values=c("orange", "blue"),
guide=guide_legend(title = "My Legend Title",
nrow=1,  # multiple items in one row
label.position = "top", # move the texts on top of the color key
keywidth=2.5)) # increase the color key width


The problem with the default setting is it leaves a lot of white space above and below the legend. To change the position of the entire legend to the bottom of the plot, we use theme().

theme(legend.position = 'bottom')

• guides()
• Legend. For example, to remove the legend title:
ggplot(mtcars, aes(x = mpg, y = disp, color = factor(cyl))) +
geom_point() +
guides(color = guide_legend(title = NULL))

• Axis. For example, to change the angle of the x-axis labels (either the theme() call or the guides() call alone is sufficient):
ggplot(mtcars, aes(x = mpg, y = disp)) +
geom_point() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
guides(x = guide_axis(angle = 45))

• Color scales. For example, to change the number of bins used to draw the color bar (nbin controls how finely the bar is drawn, not the number of breaks):
ggplot(mtcars, aes(x = mpg, y = disp, color = hp)) +
geom_point() +
guides(color = guide_colorbar(nbin = 10))
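
As a hedged illustration of the difference between bins and breaks: the labelled breaks on a color bar are set on the scale, while nbin only affects how the bar is drawn. The break values below are arbitrary choices for the hp variable.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = mpg, y = disp, color = hp)) +
  geom_point() +
  # breaks: where tick labels appear on the color bar
  scale_color_continuous(breaks = c(100, 200, 300)) +
  # nbin: how many colored slices are used to draw the bar itself
  guides(color = guide_colorbar(nbin = 10))
p
```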


Legend symbol background

ggplot() + geom_point(aes(x, y, color, size)) +
theme(legend.key = element_blank())
# remove the symbol background in legend


Legend size

data <- data.frame(x = 1:5, y = 1:5, label = c("A", "B", "C", "D", "E"))
ggplot(data, aes(x, y, color = as.factor(label))) +
geom_point() +
labs(title = "Legend Size Example with Theme Modification",
color = "Label") +
theme(
legend.text = element_text(size = 12),
legend.title = element_text(size = 14)
)
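
The theme settings above change the legend text only. A small sketch of two other adjustments that may be useful, reusing the same toy data: override.aes enlarges the symbols drawn in the legend, and legend.key.size enlarges the box around each key. The sizes are arbitrary.

```r
library(ggplot2)

data <- data.frame(x = 1:5, y = 1:5, label = c("A", "B", "C", "D", "E"))

p <- ggplot(data, aes(x, y, color = label)) +
  geom_point() +
  # enlarge the points drawn in the legend only (plot points are unchanged)
  guides(color = guide_legend(override.aes = list(size = 4))) +
  # enlarge the box around each legend key
  theme(legend.key.size = unit(1, "cm"))
p
```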


ggtitle()

Centered title

See the Legends part of the cheatsheet.

ggtitle("MY TITLE") +
theme(plot.title = element_text(hjust = 0.5))


Subtitle

ggtitle("My title",
subtitle = "My subtitle")


Aspect ratio

?coord_fixed

p <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
p + coord_fixed() # plot is compressed horizontally
p  # fill up plot region
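
coord_fixed() also takes a ratio argument: one unit on the y-axis is drawn ratio times as long as one unit on the x-axis. As a sketch, a ratio of (x range) / (y range) should give a roughly square panel even when the two variables are on very different scales.

```r
library(ggplot2)

# coord_fixed(ratio = r): one y unit is drawn r times as long as one x unit.
# For a square panel, r = (x range) / (y range).
r <- diff(range(mtcars$mpg)) / diff(range(mtcars$wt))

p_square <- ggplot(mtcars, aes(mpg, wt)) +
  geom_point() +
  coord_fixed(ratio = r)
p_square
```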


Time series plot

set.seed(45)
nc <- 9
df <- data.frame(x=rep(1:5, nc), val=sample(1:100, 5*nc),
variable=rep(paste0("category", 1:nc), each=5))
# plot
# http://colorbrewer2.org/#type=qualitative&scheme=Paired&n=9
ggplot(data = df, aes(x=x, y=val)) +
geom_line(aes(colour=variable)) +
scale_colour_manual(values=c("#a6cee3", "#1f78b4", "#b2df8a", "#33a02c", "#fb9a99", "#e31a1c", "#fdbf6f", "#ff7f00", "#cab2d6"))


Versus the old-fashioned way (base R)

dat <- matrix(runif(40,1,20),ncol=4) # make data
matplot(dat, type = c("b"),pch=1,col = 1:4) #plot
legend("topleft", legend = 1:4, col=1:4, pch=1) # optional legend
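
For comparison, the same kind of matrix can be drawn with ggplot2 once it is reshaped to long format. This is a sketch using base R only for the reshaping; the column names x, series and value are my own choice, and the seed is added so the example is reproducible.

```r
library(ggplot2)

set.seed(1)
dat <- matrix(runif(40, 1, 20), ncol = 4)  # 10 rows, 4 series

# Reshape the wide matrix to long format: one row per (x, series) pair
long <- data.frame(
  x      = rep(seq_len(nrow(dat)), times = ncol(dat)),
  series = factor(rep(seq_len(ncol(dat)), each = nrow(dat))),
  value  = as.vector(dat)   # column-major, so series 1 comes first
)

p <- ggplot(long, aes(x, value, colour = series)) +
  geom_line() +
  geom_point(shape = 1)
p
```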

geom_point()

See Scatterplot.

df <- data.frame(x=1:3, y=1:3, color=c("red", "green", "blue"))
# Use I() to set aes values to the identity of a value from your data table,
# i.e. the values are used as-is instead of being mapped through a scale
ggplot(df, aes(x,y, color=I(color))) + geom_point(size=10) # no color legend
# VS
ggplot(df, aes(x,y, color=color)) + geom_point(size=10) # color is like a class label


geom_bar(), geom_col(), stat_count()

• geom_bar: Counts the number of cases at each x position and makes the height of the bar proportional to the count (or sum of weights if supplied)
• geom_col: Leaves the data as is and makes the height of the bar proportional to the value in the data
Function, default statistic, and example:

• geom_bar(): default statistic is stat_count().
df2 <- data.frame(cat = c("A", "A", "A", "B", "B",
"B", "B", "B", "C", "C", "C", "C", "C", "C"))
ggplot(df2, aes(x = cat)) + geom_bar()
# Same as
# barplot(table(df2$cat))

• geom_col(): default statistic is stat_identity().
df <- data.frame(group = c("A", "B", "C"), count = c(3, 5, 6))
ggplot(df, aes(x = group, y = count)) + geom_col()
# Same as
# barplot(df$count, names.arg = df$group)

geom_col(position = 'dodge') # same as geom_bar(stat = 'identity', position = 'dodge')


geom_bar() can not specify the y-axis. To specify y-axis, use geom_col().

ggplot() + geom_col(mapping = aes(x, y))


Add colors to the plot

df <- data.frame(group = c("A", "B", "C"),
count = c(3, 5, 6),
fill = c("red", "green", "blue"))
ggplot(df, aes(x = group, y = count, fill = fill)) + geom_col()


Add numbers to the plot

Ordered barplot and reorder()

stat_function()

stat_summary()

stat_smooth(), geom_smooth()

ggplot(data = mtcars, aes(x = wt, y = mpg)) +
geom_point() +
stat_smooth(method = "glm", formula = "y ~ x",
method.args = list(family = poisson(link = "log")),
se = FALSE, color = "red") +
labs(x = "Weight", y = "Miles per gallon")


To control the smoothness, use the "span" parameter. To disable the confidence interval, use "se = F".

geom_smooth(method = 'loess', se = FALSE, span = 0.3)


geom_area()

Square shaped plot

ggplot() + theme(aspect.ratio=1) # do not adjust xlim, ylim

xylim <- range(c(x, y))
ggplot() + coord_fixed(xlim=xylim, ylim=xylim)


geom_line()

See also aes(..., group, ...).

• Connect Paired Points with Lines in Scatterplot
• Use geom_line() to create a square bracket to annotate the plot
• Interaction plot

geom_segment()

Line segments, arrows and curves. See an example in geom_errorbar section below. Cf annotate("segment", ...)
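
As a small sketch of geom_segment() next to annotate("segment", ...): the endpoint coordinates below are hypothetical and chosen only to fall inside the mtcars scatterplot. geom_segment() draws one segment per row of a data frame, while annotate() draws a single segment from literal values.

```r
library(ggplot2)

# Hypothetical segment endpoints, for illustration only
seg <- data.frame(x = c(2, 3), y = c(15, 30), xend = c(4, 5), yend = c(25, 20))

p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  # one straight segment per row of 'seg', with an arrow head at the end
  geom_segment(data = seg,
               aes(x = x, y = y, xend = xend, yend = yend),
               arrow = arrow(length = unit(0.3, "cm")),
               colour = "blue") +
  # annotate() draws a single segment without needing a data frame
  annotate("segment", x = 2.5, xend = 4, y = 12, yend = 12, colour = "red")
p
```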
geom_errorbar(): error bars

set.seed(301)
x <- rnorm(10)
SE <- rnorm(10)
y <- 1:10

par(mfrow=c(2,1))
par(mar=c(0,4,4,4))
xlim <- c(-4, 4)
plot(x[1:5], 1:5, xlim=xlim, ylim=c(0+.1,6-.1), yaxs="i", xaxt = "n", ylab = "", pch = 16, las=1)
mtext("group 1", 4, las = 1, adj = 0, line = 1) # las=text rotation, adj=alignment, line=spacing
par(mar=c(5,4,0,4))
plot(x[6:10], 6:10, xlim=xlim, ylim=c(5+.1,11-.1), yaxs="i", ylab ="", pch = 16, las=1, xlab="")
arrows(x[6:10]-SE[6:10], 6:10, x[6:10]+SE[6:10], 6:10, code=3, angle=90, length=0)
mtext("group 2", 4, las = 1, adj = 0, line = 1)


• Forest plot example using geom_errorbarh()

geom_rect(), geom_bar()

Note that we can use scale_fill_manual() to change the 'fill' colors (scheme/palette). The 'fill' parameter in geom_rect() is only used to define the discrete variable.

ggplot(data=) +
geom_bar(aes(x=, fill=)) +
scale_fill_manual(values = c("orange", "blue"))


geom_raster() and geom_tile()

Waterfall plot

geom_linerange

Circle

Circle in ggplot2

ggplot(data.frame(x = 0, y = 0), aes(x, y)) +
geom_point(size = 25, pch = 1)


Annotation

Add a horizontal/vertical line

geom_hline(yintercept=1000)
geom_vline(xintercept=99)


text annotations, annotate() and geom_text(): ggrepel package

• https://ggplot2-book.org/annotations.html
annotate("text", label="Toyota", x=3, y=100)
annotate("segment", x = 2.5, xend = 4, y = 15, yend = 25, colour = "blue", size = 2)
geom_text(aes(x, y, label), data, size, vjust, hjust, nudge_x)

• Text annotations in ggplot2
p + geom_text(aes(x = -115, y = 25, label = "Map of the United States"), stat = "unique")
p + geom_label(aes(x = -115, y = 25, label = "Map of the United States"), stat = "unique") # include border around the text

• Use the nudge_y parameter to avoid the overlap of the point and the text such as
ggplot() + geom_point() +
geom_text(aes(x, y, label), color='red', data, nudge_y=1)

• What do hjust and vjust do when making a plot using ggplot? 0 means left-justified 1 means right-justified.
This matters when the text spans multiple lines. By default, the text is center-justified.
• Volcano plots, EnhancedVolcano package
• Visualization of Volcano Plots in R
• AI
library(ggplot2)
library(ggrepel)

set.seed(123)
data <- data.frame(
gene = paste("Gene", 1:1000, sep = "_"),
log2FoldChange = rnorm(1000),
pvalue = runif(1000)
)
data$pvalue[1:20] <- runif(20, 0, .001)
data$padj <- p.adjust(data$pvalue, method = "BH") # Adjusted p-values

significant_genes <- subset(data, padj < 0.05 & abs(log2FoldChange) > 1)

ggplot(data, aes(x = log2FoldChange, y = -log10(padj))) +
geom_point(aes(color = padj < 0.05 & abs(log2FoldChange) > 1), alpha = 0.5) +
scale_color_manual(values = c("grey", "red")) +
theme_minimal() +
labs(title = "Volcano Plot", x = "Log2 Fold Change", y = "-Log10 Adjusted P-Value") +
geom_label_repel(
data = significant_genes,
aes(label = gene),
size=3,
box.padding = 0.25,     # default
point.padding = 1e-06,  # default
max.overlaps = 10       # default
)


Text wrap

p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Solution 1: does not work with Chinese characters
wrapper <- function(x, ...) paste(strwrap(x, ...), collapse = "\n")
# The label
my_label <- "Some arbitrarily larger text"
# and finally your plot with the label
p + annotate("text", x = 4, y = 25, label = wrapper(my_label, width = 5))

# Solution 2: does not work with Chinese characters
library(RGraphics)
library(ggplot2)
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
grob1 <-  splitTextGrob("Some arbitrarily larger text")
p + annotation_custom(grob = grob1,  xmin = 3, xmax = 4, ymin = 25, ymax = 25)

# Solution 3: stringr::str_wrap()
my_label <- "太極者無極而生。陰陽之母也。動之則分。靜之則合。無過不及。隨曲就伸。人剛我柔謂之走。我順人背謂之黏。"
p <- ggplot() + geom_point() + xlim(0, 400) + ylim(0, 300) # 400x300 e-paper
p + annotate("text", x = 0, y = 200, hjust=0, size=5,
label = stringr::str_wrap(my_label, width =30)) +
theme_bw () +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
axis.title = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank())


Other geoms

geomtextpath

geomtextpath: create curved text in ggplot2

Save the plots -- ggsave()

ggsave(). Note that the svglite package is required to save svg files; see R Graphics Cookbook. The svglite package provides more standards-compliant output.

By default, width and height are in inches no matter which output format we choose.

(3/24/2022) If I save the plot in the svg format using the RStudio GUI (Export -> Save as Image...) or the svg() function, the svg file can't be converted to a png file by ImageMagick. But if I save the plot with ggsave(), the svg file can be converted to a png file.

$ convert -resize 100% Rerrorbar.svg tmp.png
convert-im6.q16: non-conforming drawing primitive definition `path' @ error/draw.c/RenderMVGContent/4300.
$ convert -resize 100% Rerrorbar2.svg tmp.png # Works


(1/31/2022) For some reason, the legend text in svg files generated by ggsave() looks fine in browsers, but when I insert the file into ppt, the word "Sensitive" becomes "Sensitiv e". However, svg files generated by the svg() command look fine in browsers AND in ppt.

If we don't specify width/height, ggsave() saves the plot using the size of the current graphics device. That's why ggsave() reports the image size (in inches) after it runs. To get a fixed width/height, we need to specify them explicitly.

In my experience ggsave() is more convenient than png(): ggsave() sizes the plot in inches, so the text gets larger (in pixels) when we save a file at a higher resolution, whereas png() sized in pixels keeps the text at its 72 ppi size unless res is also raised.

...
ggsave("filename.png", object, width=8, height=4)
# vs
png("filename.png", width=1200, height=600)
...
dev.off()
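
A sketch of how png() can be made to behave like ggsave(): give the size in inches and set res, so that text scales with the resolution. The file names are temporary files created for the example, and the plot is an arbitrary mtcars scatterplot.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

f_ggsave <- tempfile(fileext = ".png")
f_png    <- tempfile(fileext = ".png")

# ggsave(): size in inches, resolution via dpi -> 8*300 x 4*300 pixels
ggsave(f_ggsave, p, width = 8, height = 4, dpi = 300)

# png(): give the size in inches and set 'res' to get the same scaling
png(f_png, width = 8, height = 4, units = "in", res = 300)
print(p)
dev.off()
```

Without units = "in" and res, png() interprets width/height as pixels and sizes text for 72 ppi, which is why text looks small in large png() outputs.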


We can specify dpi to increase the resolution if we use the png format (svg is not affected); see Chapter 14.5 Outputting to Bitmap (PNG/TIFF) Files from R Graphics Cookbook.

g1 <- ggplot(data = mydf)
g1
ggsave("myfile.png", g1, height = 7, width = 8, units = "in", dpi = 300)

I got the error "Error in loadNamespace(name) : there is no package called ‘svglite’". After I installed the package, everything worked fine.

ggsave("raw-output.bmp", p, width=4, height=3, dpi = 100)
# Will generate 4*100 x 3*100 pixel plot


Note:

• When saving to a "png" file, increasing dpi (from 72 to 300) will increase the font and point size (measured in pixels). dpi/ppi is not an inherent property of an image.
• If we don't specify any parameters and don't resize the graphics device, the "png" file created by ggsave() will contain many more pixels than the "svg" file (e.g. 1200 vs 360).
• How does ggsave() decide the width/height when an svg file is created in an Rmd file? A: 7x7 inches in my experiment. So the font/point size will look smaller compared to a 4x4 inch output.
• When I created a 4x4 inch (width x height) svg file on Linux, the file properties (right click) showed 360 x 360 pixels. macOS does not report this number, and I could not find it in the svg file itself.

Multiple pages in pdf

https://stackoverflow.com/a/53698682. The key is to save the plot in an object and use the print() function.

pdf("FileName", onefile = TRUE)
for(i in 1:I) {
p <- ggplot()
print(p)
}
dev.off()
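
A runnable version of the loop above might look like the following sketch. The choice of mtcars, the three y variables, and the temporary output file are assumptions made for the example.

```r
library(ggplot2)

vars <- c("mpg", "disp", "hp")       # one page per variable
out  <- tempfile(fileext = ".pdf")   # hypothetical output file

pdf(out, onefile = TRUE, width = 6, height = 4)
for (v in vars) {
  # .data[[v]] looks up the column whose name is stored in v
  p <- ggplot(mtcars, aes(x = wt, y = .data[[v]])) +
    geom_point() +
    labs(y = v)
  print(p)   # explicit print() is required inside a loop
}
dev.off()
```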

Other tips/FAQs

ggplot2 does not appear to work when inside a function

https://stackoverflow.com/a/17126172. Use print() or ggsave(). When you use these functions interactively at the command line, the result is automatically printed, but in source() or inside your own functions you will need an explicit print() statement.
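
A minimal sketch of the point above; plot_scatter() is a hypothetical helper, not part of ggplot2. Inside a function the plot object is only drawn if it is printed or saved explicitly.

```r
library(ggplot2)

# Hypothetical helper: draws to the current device, or saves to 'file'
plot_scatter <- function(data, file = NULL) {
  p <- ggplot(data, aes(wt, mpg)) + geom_point()
  if (is.null(file)) {
    print(p)                               # without print(), nothing is drawn
  } else {
    ggsave(file, p, width = 4, height = 3) # ggsave() takes the plot object
  }
  invisible(p)
}

f <- tempfile(fileext = ".pdf")
res <- plot_scatter(mtcars, file = f)
```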

ggstatsplot

ggstatsplot: ggplot2 Based Plots with Statistical Details

Some packages depend on ggplot2

dittoSeq from Bioconductor

Python

plotnine: A Grammar of Graphics for Python.

plotnine is an implementation of a grammar of graphics in Python, based on ggplot2. The grammar allows users to compose plots by explicitly mapping data to the visual objects that make up the plot.