Ggplot2: Difference between revisions

From 太極

== facet_wrap and facet_grid to create a panel of plots ==
* '''facet_wrap'''(, nrow=4, ncol=3) in ggplot2 provides a solution similar to par(mfrow=c(4, 3)) in base R.
* Another example [ Polls v results]
[ Multiclass predictive modeling for #TidyTuesday NBER papers]
<li>[ changing the facet_wrap labels using labeller in ggplot2]. The solution is to create a '''labeller''' function as a function of a variable x (or any other name as long as it's not the faceting variables' names) and then coerce to labeller with '''as_labeller'''.
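The labeller approach described above can be sketched as follows. This is a minimal illustration, not the linked answer's code: the function name add_unit and the use of mtcars are assumptions made for the example.

```r
library(ggplot2)

# Hypothetical labeller function; its argument is named x, not cyl
# (the faceting variable), as the note above recommends
add_unit <- function(x) paste(x, "cylinders")

p <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  # as_labeller() coerces the plain function into a labeller
  facet_wrap(~ cyl, labeller = as_labeller(add_unit))
```

Printing p should show one panel per value of cyl, with strip labels produced by add_unit().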
== lattice::xyplot ==
library(lattice)
df <- data.frame(x = rnorm(100), y = rnorm(100), group = sample(c("A", "B"), 100, replace = TRUE))
# Use the xyplot() function to create the plot
# with each group represented by a different color
# result is 1 plot only
# no annotation
xyplot(y ~ x, data = df, groups = group)
df <- data.frame(x = rnorm(100), y = rnorm(100),
                group = sample(c("A", "B"), 100, replace = TRUE),
                time = sample(c("T1", "T2"), 100, replace = TRUE))
# 2 plots grouped by time
# two colors (defined by group) are used in each plot
# no annotation
xyplot(y ~ x | time, groups = group, data = df)
For more complicated plots, we can use the '''panel''' parameter.

= Color palette =
* [ Ten simple rules to colorize biological data visualization]
* [ a MEGA thread about all the ways you can choose a palette] May 2021
* [ How to select Colors for Data Visualizations?]
== Top color palettes ==
* [ Top R Color Palettes to Know for Great Data Visualization]

== Display color palettes ==
<li>Use barplot()
pal <- c("#E41A1C", "#377EB8", "#4DAF4A", "#984EA3", "#FF7F00")
# pal <- sample(colors(), 10) # randomly pick 10 colors
barplot(rep(1, length(pal)), col = pal, space = 0,
        axes = FALSE, border = NA)
par()$usr
# [1] -0.20  5.20 -0.01  1.00

<li>Use heatmap()
pal <- c("#E41A1C", "#377EB8", "#4DAF4A", "#984EA3", "#FF7F00")
pal <- matrix(pal, nr=2) # acknowledge a nice warning message
#      [,1]      [,2]      [,3]   
# [1,] "#E41A1C" "#4DAF4A" "#FF7F00"
# [2,] "#377EB8" "#984EA3" "#E41A1C"
pal_matrix <- matrix(seq_along(pal), nr=nrow(pal), nc=ncol(pal))
heatmap(pal_matrix, col = pal, Rowv = NA, Colv = NA, scale = "none",
        ylab = "", xlab = "", main = "", margins = c(5, 5))
# 2 rows, 3 columns with labeling on two axes
par()$usr
# [1] 0 1 0 1

<li>Use image()
pal <- palette() # R 4.0 has a new default palette
                # The old colors are highly saturated and vary enormously
                # in terms of luminance
# [1] "black"  "#DF536B" "#61D04F" "#2297E6" "#28E2E5" "#CD0BBC" "#F5C710"
# [8] "gray62"
pal_matrix <- matrix(seq_along(pal), nr=1)
image(pal_matrix, col = pal, axes = FALSE)
# 8 rows, 1 column, but no labeling
# Starting from bottom, left.

par()$usr  # change with the data dim
text(0, (par()$usr[4]-par()$usr[3])/8*c(0:7),
    labels = pal)

<li>Use [ scales::show_col()]
== colors() ==
In R, colors() is a function that returns a character vector of color names available in R.
To obtain the hexadecimal codes for all colors obtained by colors()
rgb_values <- col2rgb(colors())
# Convert the RGB values to hexadecimal codes
hex_codes <- apply(rgb_values, 2,
                  function(x) rgb(x[1], x[2], x[3],
                  maxColorValue = 255))
# View the first few hexadecimal codes
head(hex_codes)
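The same conversion can be done without apply(). This is a small sketch that assumes only base R: rgb() accepts a matrix with one color per row, so the transposed col2rgb() output can be passed directly. The name hex_by_name is made up for this example.

```r
# col2rgb() returns a 3 x n matrix; rgb() wants one color per row
rgb_values <- col2rgb(colors())
hex_by_name <- rgb(t(rgb_values), maxColorValue = 255)

# Keep the color names so a code can be looked up by name
names(hex_by_name) <- colors()
hex_by_name[c("red", "white")]
```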
== palette() ==
* [ A New palette() for R 4.0]
* [ ?palette] and [ the dev version]
* [ 4 for 4.0.0 – Four Useful New Features in R 4.0.0]
* [ Improved color palettes in R]
== rainbow ==
* [ ?rainbow]
* The figures below compare the effects of the 's' and 'v' parameters. '''s (saturation)''' and '''v (value)''' control the color intensity and brightness, respectively. See also [ HSL and HSV] from wikipedia.
** '''Saturation (s)''': Determines how '''vivid''' or muted the colors are. A value of 1 (default) means fully saturated colors, while lower values reduce the intensity.
** '''Value (v)''': Controls the '''brightness'''. A value of 1 (default) results in full brightness, while lower values make the colors darker.
[[File:Rainbow default.png|250px]] [[File:Rainbow s05.png|250px]] [[File:Rainbow v05.png|250px]]
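A comparison like the three figures above could be produced along these lines. This is a sketch using base R only; the choice of n = 6 and the panel titles are assumptions for illustration.

```r
n <- 6
default_cols <- rainbow(n)           # s = 1, v = 1
muted_cols   <- rainbow(n, s = 0.5)  # lower saturation: paler colors
dark_cols    <- rainbow(n, v = 0.5)  # lower value: darker colors

# Draw the three palettes side by side
op <- par(mfrow = c(1, 3))
barplot(rep(1, n), col = default_cols, axes = FALSE, main = "default")
barplot(rep(1, n), col = muted_cols,   axes = FALSE, main = "s = 0.5")
barplot(rep(1, n), col = dark_cols,    axes = FALSE, main = "v = 0.5")
par(op)
```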
== Color blind ==
[ colorblindcheck]: Check Color Palettes for Problems with Color Vision Deficiency
== Color picker ==
> library(colourpicker)
> plotHelper(colours=5)
Listening on
== Color names, Complementary/Inverted colors ==
* [ ColorNameR] - A tool for transforming coordinates in a color space to common color names using data from the Royal Horticultural Society and the International Union for the Protection of New Varieties of Plants.  
* [ ColorHexa]

== colorspace package ==
** Why does it not have "Set 1"? 
** the "Dark 2" colors are not the same as in [ RColorBrewer].
== cols4all ==
* You can use '''cols4all''' palettes in ggplot2.
<syntaxhighlight lang='rsplus'>
library(cols4all)
c4a_gui() # launches a shiny interface (the R console is blocked while it is running)
c4a_types() # understand abbreviation
c4a_series() # 16 series like brewer, hcl, tableau, viridis, etc
c4a_overview() # how many palettes per series x types
c4a_palettes(type = "div", series = "hcl") # What palettes are available
# Give me the colors
c4a("hcl.purple_green", 11)
c4a("brewer.accent", 2)    # the 1st one on the website
# Plot the colors
c4a_plot("hcl.purple_green", 11, = TRUE)

== *paletteer package ==
* [ The paletteer package offers direct access to 1759 color palettes, from 50 different packages!]
* [ paletteer], [ paletteer_d()] function for getting discrete palette by package and name.
* Interactive and choose 'sort by length'
* [ Palettes sorted by type (Sequential/Diverging/Qualitative)]
* [ *More examples with a gallery]

<syntaxhighlight lang='rsplus'>
#67001FFF #B2182BFF #D6604DFF #F4A582FF #FDDBC7FF #F7F7F7FF  
                                   "versicolor" = "#5C88DAFF", 
                                   "virginica" = "#84BD00FF"))

== ggsci ==
== Pride palette ==
[ Show Pride on Your Plots]. [ gglgbtq] package
== unikn ==
* [ unikn]: Enabling corporate design elements in R (with colors and color-related functions). The curve plot is interesting.
* [ 12 ggplot extensions for snazzier R graphics]

== Colour related aesthetics: colour, fill and alpha ==
ggplot(aes(x, y)) +
For base R, we can use the '''alpha''' parameter [ rgb(,,,alpha)],
plot(x, y, col=rgb(0,0,0, alpha=.1))
polygon(df, col=adjustcolor(c("red", "blue"), alpha.f=.3))
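The two base R ways of adding transparency mentioned above can be tried with a small simulated example. This is a sketch under assumptions: the data and the names semi_black and semi_red are invented for illustration.

```r
set.seed(1)
x <- rnorm(2000)
y <- rnorm(2000)

# rgb() builds a color from components; alpha is on the 0-1 scale here
semi_black <- rgb(0, 0, 0, alpha = 0.1)
# adjustcolor() adds transparency to an existing named color
semi_red <- adjustcolor("red", alpha.f = 0.5)

plot(x, y, pch = 16, col = semi_black)              # dense regions look darker
points(x[1:50], y[1:50], pch = 16, col = semi_red)  # highlight a subset
```

Both helpers return hex strings with an extra two-digit alpha channel, so the result can be stored and reused like any other color.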

ggplot(mtcars, aes(x=hp, y=mpg)) +
   geom_point(aes(shape=factor(cyl), colour=factor(cyl))) +
   scale_shape_discrete("Cylinders") + # change the legend title from 'factor(cyl)' to 'Cylinders'
   scale_colour_discrete("Cylinders") # combine shape and colour in one legend; avoid another legend for colour
<li>[ GGPLOT Point Shapes Best Tips] </li>
<li>Simulated data
df <- data.frame(x = rnorm(100), y = rnorm(100),
                Treatment = rep(c("Before", "After"), each = 50),
                Response = rep(c("Sensitive", "Resistant"), each = 50),
                Subject = rep(1:50, times = 2))
ggplot(df, aes(x = x, y = y, shape = Treatment, color = Response)) +
  geom_point() +
  geom_line(aes(group = Subject), alpha = 0.5) +  # Add lines connecting the same subject
  scale_shape_manual(values = c(16, 17)) +  # You can choose different shapes
  scale_color_manual(values = c("blue", "red")) +  # You can choose different colors
  theme_minimal() +
  labs(title = "Scatterplot with Different Shapes and Colors",
      x = "X-axis label",
      y = "Y-axis label",
      shape = "Treatment",
      color = "Response")

* [ scales 1.2.0]

=== ggplot2::scale_* - axes/axis, legend ===
See the [ reference of all scale_* functions]. These functions modify the scales of aesthetics such as the x- and y-axes, color, size, etc.

Naming convention: <span style="color: red">'''scale_AestheticName_NameDataType'''</span> where 
* AestheticName can be '''x, y, color, fill, size, shape, ...'''
* NameDataType can be '''continuous, discrete''', '''manual''' or '''gradient'''.
* Table of common functions
{| class="wikitable"
! scale_AestheticName_NameDataType
|-
| scale_x_continuous<br />scale_x_discrete
|-
| scale_x_log10
|-
| scale_color_continuous, <br />scale_color_gradient<br />scale_color_discrete<br />scale_color_brewer<br />scale_color_manual<br />scale_color_paletteer_d
|-
| scale_shape_discrete
|-
| scale_fill_brewer, <br />scale_fill_continuous,<br />scale_fill_discrete, <br />scale_fill_gradient<br />scale_fill_grey, <br />scale_fill_hue<br />scale_fill_manual,<br />scale_colour_viridis_d
|}
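The naming convention can be seen by combining several scale functions in one plot. The sketch below uses mtcars; the axis names, breaks and colors are arbitrary choices for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = hp, y = mpg,
                        colour = factor(cyl), shape = factor(cyl))) +
  geom_point() +
  # scale_<aesthetic>_<type>: x aesthetic, continuous data
  scale_x_continuous(name = "Horsepower", breaks = seq(50, 350, by = 100)) +
  # y aesthetic on a log10 axis
  scale_y_log10(name = "Miles per gallon (log scale)") +
  # colour aesthetic, colors assigned manually per level
  scale_colour_manual(name = "Cylinders",
                      values = c("4" = "black", "6" = "red", "8" = "blue")) +
  # shape aesthetic, discrete data; same title so the legends merge
  scale_shape_discrete(name = "Cylinders")
```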

<li>See an example at [[#geom_linerange|geom_linerange]] where we have to specify the ''limits'' parameter in order to make "8" < "16" < "20"; otherwise it is 16 < 20 < 8.
Browse[2]> order(coordinates$chr)
[1] 3 4 1 2
Browse[2]> coordinates$chr
[1] "20" "8" "16" "16"
<li>Differences of scale_color_gradient() and scale_color_continuous()
* '''scale_color_gradient()''' maps a continuous variable to a two-color gradient. Its '''low''' and '''high''' arguments give the colors at the two ends of the scale, and the colors in between are interpolated automatically.
ggplot(data = diamonds, aes(x = carat, y = price, color = depth)) +
  geom_point() +
  scale_color_gradient(low = "blue", high = "red")
* '''scale_color_continuous()''' is the default color scale for continuous variables. Its '''type''' argument selects the underlying scale ("gradient" or "viridis"), so with type = "gradient" it behaves like scale_color_gradient(). Both functions accept '''name''', '''limits''' (the minimum and maximum of the scale), '''breaks''' (the values shown on the legend) and '''labels''' (the text displayed for each break).
ggplot(data = diamonds, aes(x = carat, y = price, color = depth)) +
    geom_point() +
    scale_color_continuous(name = "Depth",
                            limits = c(40, 80),
                            breaks = c(40, 60, 80),
                            labels = c("Shallow", "Moderate", "Deep"), # display on legend
                            type = "gradient")
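The ordering problem with chromosome labels noted above (character strings sort as 16 < 20 < 8) can be reproduced and fixed with the ''limits'' argument. This is a sketch: the data frame df_chr and its value column are invented for illustration.

```r
library(ggplot2)

chr <- c("20", "8", "16", "16")
sort(unique(chr))  # character sort, not numeric

# Desired order: sort numerically, then convert back to character
chr_levels <- as.character(sort(unique(as.numeric(chr))))

df_chr <- data.frame(chr = chr, value = c(3, 1, 2, 4))  # hypothetical data
p <- ggplot(df_chr, aes(x = chr, y = value)) +
  geom_point() +
  scale_x_discrete(limits = chr_levels)  # forces "8" < "16" < "20" on the axis
```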

=== Emulate ggplot2 default color palette ===
The above can be created by R >= 4.0.0 using the command '''scales::show_col(palette.colors(palette = "ggplot2"))'''. We should ignore the 1st color (black). Also if n>=5, the colors do not match with the result of '''show_col(hue_pal()(5))''' .
'''Answer 1''' It is just equally spaced hues around the color wheel.
[ Emulate ggplot2 default color palette]
<syntaxhighlight lang='rsplus'>
gg_color_hue <- function(n) {
R has a function called colorName() to convert a hex code to color name; see [ roloc] package on [ CRAN].
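The "equally spaced hues around the color wheel" idea can be sketched in base R. This is not necessarily the code from the linked answer: the function name hue_colors is hypothetical, and the starting hue of 15 degrees, chroma 100 and luminance 65 are assumed here because they are the documented defaults of scale_colour_hue().

```r
# n equally spaced hues on the HCL color wheel (hypothetical helper)
hue_colors <- function(n) {
  # n + 1 points from 15 to 375 degrees; the last coincides with the
  # first on the wheel, so it is dropped
  hues <- seq(15, 375, length.out = n + 1)
  hcl(h = hues, c = 100, l = 65)[seq_len(n)]
}

hue_colors(4)
```

For n = 4 this should reproduce the four hex codes used as mycol in the boxplot section of this page.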

=== How to change the default color palette in geom_XXX ===
<li>[ Simple custom colour palettes with R ggplot graphs]
<li>Change the color palette for all plots
<li>Create a Custom Theme
# A scale cannot be added to a theme with `+`; bundle them in a list and add it to each plot
custom_theme <- list(
  theme_minimal(),
  scale_fill_manual(values = c("red", "blue", "green", "purple")),
  scale_color_manual(values = c("red", "blue", "green", "purple"))
)
# Only the theme part can be set as the session default
theme_set(theme_minimal())
<li>[ ggthemr] package
<li>[ rcartocolor] package

<li>Change the color palette for the current plot only:
<li>Using scale_fill_manual() and scale_color_manual()
data <- data.frame(
  category = c("A", "B", "C", "D"),
  value = c(3, 5, 2, 8)
)
ggplot(data, aes(x = category, y = value, fill = category)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = c("red", "blue", "green", "purple"))
<li>Using scale_fill_brewer() and scale_color_brewer()
ggplot(data, aes(x = category, y = value, fill = category)) +
  geom_bar(stat = "identity") +
  scale_fill_brewer(palette = "Set3")
<li>Using scale_fill_viridis() and scale_color_viridis()
# scale_fill_viridis() comes from the viridis package
ggplot(data, aes(x = category, y = value, fill = category)) +
  geom_bar(stat = "identity") +
  scale_fill_viridis(discrete = TRUE)
<li>Using scale_fill_hue() and scale_color_hue()
ggplot(data, aes(x = category, y = value, fill = category)) +
  geom_bar(stat = "identity") +
  scale_fill_hue(h = c(0, 360), l = 65, c = 100)
<li>[ How to change the color in geom_point or lines in ggplot]
ggplot() +
  geom_point(data = data, aes(x = time, y = y, color = sample),size=4) +
  scale_color_manual(values = c("A" = "black", "B" = "red"))

ggplot(data = data, aes(x = time, y = y, color = sample)) +
  geom_point(size=4) +
  geom_line(aes(group = sample)) +
  scale_color_manual(values = c("A" = "black", "B" = "red"))

=== transform scales ===
[ How to make that crazy Fox News y axis chart with ggplot2 and scales]

== Class variables ==
<li>"Set1" is a good choice. See [ RColorBrewer::display.brewer.all()]
<li>For an ordinal variable, brewer.pal(n, "Spectral") is good, but the middle color is too light, so I modify the middle color.
cols <- brewer.pal(5, "Spectral")
cols[3] <- "#D4C683" # middle of "#FDAE61" and "#ABDDA4"

== Red, Green, Blue alternatives ==
* Red: "maroon"

== Heatmap for single channel ==
[ How to Make a Heatmap of Customers in R], [ source code] on github. geom_tile() and geom_text() were used. [ Heatmap in ggplot2] from

<syntaxhighlight lang='rsplus'>
# White <----> Blue
RColorBrewer::display.brewer.pal(n = 8, name = "Blues")
</syntaxhighlight>

== Heatmap for dual channels ==
<syntaxhighlight lang='rsplus'>
# brewer.pal() is from RColorBrewer; brewer_pal() is from scales
# Red <----> Blue
display.brewer.pal(n = 8, name = 'RdBu')
# Hexadecimal color specification
brewer.pal(n = 8, name = "RdBu")
plot(1:8, col=brewer_pal(palette = "RdBu")(8), pch=20, cex=4)
# Blue <----> Red
plot(1:8, col=rev(brewer_pal(palette = "RdBu")(8)), pch=20, cex=4)
</syntaxhighlight>

== Don't rely on color to explain the data ==
[ ggpattern]

== Don't use very bright or low-contrast colors, accessibility ==
* [ Color Contrast Accessibility Validator]
* [ Google Lighthouse]

== Create your own scale_fill_FOO and scale_color_FOO ==
[ Custom colour palettes for {ggplot2}]

= Themes and background for ggplot2 =
* [ Getting started with theme()] 2023/11/23
* [ ggplot2 Theme Elements Demonstration]

== Background ==
<li>[ Export plot in .png with transparent background] in base R plot.
x = c(1, 2, 3)
op <- par(bg=NA)
plot(x)

<li>[ Transparent background with ggplot2]
p <- ggplot(airquality, aes(Solar.R, Temp)) +
    geom_point() +
    geom_smooth() +
    # set transparency
    theme(
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.background = element_rect(fill = "transparent",colour = NA),
        plot.background = element_rect(fill = "transparent",colour = NA)
    )
ggsave("airquality.png", p, bg = "transparent")
<li>[ ggplot2 theme background color and grids]
ggplot() + geom_bar(aes(x=, fill=y)) +
          theme(panel.background=element_rect(fill='purple'))

ggplot() + geom_bar(aes(x=, fill=y)) +
          theme(panel.background=element_blank()) +
          theme(plot.background=element_blank()) # minimal background like base R
          # the grid lines are not gone; they are white so it is the same as the background

ggplot() + geom_bar(aes(x=, fill=y)) +
          theme(panel.background=element_blank()) +
          theme(plot.background=element_blank()) +
          theme(panel.grid.major.y = element_line(color="grey"))
          # draw grid line on y-axis only
ggplot() + geom_bar() +
          theme_bw()  # very similar to theme_light()
                      # have grid lines
ggplot() + geom_bar() +
          theme_classic() # similar to base R graphic
                      # no borders on top and right
ggplot() + geom_bar() +
          theme_minimal() # no edge

ggplot() + geom_bar() +
          theme_void() # no grid, no edge

== ggthemr ==
[ ggthemr] package
== Font size ==
* [ Change Font Size of ggplot2 Plot in R (5 Examples) | Axis Text, Main Title & Legend]
* [ What is the default font for ggplot2]
* [ Fonts] from Cookbook for R
For example, to make the subtitle font size smaller
my_ggp + theme(plot.subtitle = element_text(size = 8))
# The default base font size is 11; the subtitle uses it and the title is 1.2 times larger
== Remove x and y axis titles ==
[ ggplot2 title : main, axis and legend titles]
== Rotate x-axis labels, change colors ==
theme(axis.text.x = element_text(angle = 90, size=5, hjust=1))
[ customize ggplot2 axis labels with different colors]

== Add axis on top or right hand side ==
=== aes(color) ===
<li><span style="color: blue">Discrete colors</span>. [ ?scale_colour_brewer]. [ How to fix 'continuous value supplied to discrete scale' in with scale_color_brewer]. [ Change ggplot2 Color & Fill Using scale_brewer Functions & RColorBrewer Package in R]
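ggplot(mpg, aes(x = hwy, y = cty)) +
  geom_point(aes(color = class)) +
  scale_color_brewer(palette = "Set2") # geom_point() has no palette argument
ggplot(mpg, aes(x = displ, y = hwy, colour = manufacturer)) +
  geom_point() +
  scale_colour_brewer(palette = "Set3") # note: "Set3" provides at most 12 colors, while manufacturer has 15 levels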
<li><span style="color: blue">Continuous colors</span>. The default color scale is [ ?scale_colour_gradient] with prespecified 'low' and 'high' colors. [ ?scale_colour_continuous].
ggplot(mpg, aes(x = displ, y = hwy, color = cty)) +
  geom_point(size = 2) +
  scale_color_continuous("City Miles Per Gallon")
# scale_color_continuous("City MPG Rating", low = "springgreen3", high = "red")
<li>[ ggplot2 colors : How to change colors automatically and manually?] (mainly the scatterplot and box plots)
<li>[ Colour related aesthetics: colour, fill, and alpha]
<li>[ How to highlight data in ggplot2] </li>
=== groups ===
* [ How To Add Regression Line per Group to Scatterplot in ggplot2?] '''geom_smooth()'''
* Multiple fitted lines in one plot
[[File:Geom smooth ex.png|250px]]
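One fitted line per group, as in the figure above, can be obtained by mapping a grouping variable to colour before calling geom_smooth(). The sketch below uses mtcars with cyl as the group, which is an assumption for illustration and not the data behind the figure.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  # the colour mapping defines the groups, so one regression line
  # is fitted per cyl level
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
```

Moving the colour mapping into geom_point(aes(colour = ...)) only would instead give a single line fitted to all points.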

=== Bubble Chart ===
* [ ggplot2::stat_ellipse()]
* [ How can a data ellipse be superimposed on a ggplot2 scatterplot?]. Hint: use the [ ellipse] package.
=== ggside: scatterplot + marginal density plot ===
* [ ggside] package
=== ggextra: scatterplot + marginal histogram/density ===

== Line plots ==
* [ Elegant Visualization of Density Distribution in R Using Ridgeline]
* [ An example] from ''Scientific Reports''.
* [ CP 1919 / PSR B1919+21 Dataset]

== Histogram ==

[ Histogram vs barplot] from deeply trivial.
=== Multiple variables ===
* [ How can I plot two histograms together in R?]
* [ How to Plot Multiple Histograms in R]

== Boxplot ==

=== Base R method ===
<li>[ Box Plots - R Base Graphs]
# Use default color palette
colors <- palette()[1:6] # "black"  "#DF536B" "#61D04F" "#2297E6" "#28E2E5" "#CD0BBC"
# Boxplot with default colors
boxplot(count ~ spray, data = InsectSprays, col = colors)
<li>To add jittered points to the boxplot, we can use points() + jitter(); see [ this example]. However, we need to hide the outliers drawn by boxplot() by adding '''outline = FALSE'''.
boxplot(count ~ spray, data = InsectSprays, col = colors, outline = FALSE)
# par("usr")[1:2] confirms the locations of x-axis are 1, 2, 3, ...
points(jitter(as.integer(InsectSprays$spray) ), InsectSprays$count, pch=16)
# Another data set with a 4-level factor
dim(df) # 112436 x 2
mycol <- c("#F8766D", "#7CAE00", "#00BFC4", "#C77CFF")
# mycol defines colors of 4 levels in df$Method (a factor)
boxplot(df$value ~ df$Method, col = mycol, xlab="Method")
<li>We can follow [[R#reorder(),_levels()_and_boxplot()|this]] to use the reorder() function to reorder the groups on the x-axis by their group mean/median.
<li>If we like to rotate the boxplot by 90 degrees, we can add ''', horizontal = TRUE''' to boxplot() function.
InsectSprays$newFac <- with(InsectSprays, reorder(spray, count, FUN=median))
boxplot(count ~ newFac, data = InsectSprays, col = "lightgray", horizontal = TRUE, outline = FALSE)
set.seed(1); points(InsectSprays$count, jitter(as.integer(InsectSprays$newFac) ),  pch=16)
<li>Another base plot approach to create a jittered boxplot is to use boxplot() + stripchart(). See [ Stripchart in R], [ How to Create a Strip Chart in R]. Consider adding '''outline = FALSE''' to boxplot() so that outliers are not drawn twice once stripchart() has been added.
<syntaxhighlight lang='rsplus'>
ylim <- range(df$estimate, na.rm = TRUE)
boxplot(estimate~type, data=df, xlab=NULL, ylab=NULL, ylim=ylim, outline=F)
stripchart(estimate~type, data=df, method = "jitter",
pch=19, col=c("salmon", "orange", "yellowgreen", "green"),
vertical=TRUE, add=TRUE)

=== Color fill/scale_fill_XXX ===
<li>[ Boxplot with jittered data points in ggplot2] </li>
<syntaxhighlight lang='rsplus'>
# df2 is n x 2  
   labs(title="", y = "", x = "nboot")
If we omit the <span style="color: red">outlier.shape=NA</span> option in '''geom_boxplot()''', we will get the following plot where some outliers will appear twice. (Another option is '''outlier.color = NA'''; see [ extra point at boxplot with jittered points (ggplot2)]).

<li>Base plot approach 
[ Batch effects and confounders]

=== Groups of boxplots ===
<li>[ How to Make Grouped Boxplot with Jittered Data Points in ggplot2]. Use the '''color''' parameter in ggplot(aes()).
<li>[ Boxplot With Jittered Points in R]
<li>[ How To Make Grouped Boxplots with ggplot2?], [ A review of Longitudinal Data Analysis in R]. Use the '''fill''' parameter such as 
mydata %>%
<li>Another method is to use [ ggpubr::ggboxplot()]. Papers [ TumorPurity].
ggboxplot(df, "dose", "len",
           legend.key.size = unit(4,"mm"))
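The '''fill''' approach to grouped boxplots mentioned in this section can be sketched with the built-in ToothGrowth data. The choice of data set, the jitter width and the seed are assumptions for illustration.

```r
library(ggplot2)

p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len, fill = supp)) +
  # one box per dose x supp combination; hide outliers because the
  # raw points are drawn by the next layer
  geom_boxplot(outlier.shape = NA) +
  # jitter the points within each dodged box
  geom_point(position = position_jitterdodge(jitter.width = 0.2, seed = 1),
             alpha = 0.6)
```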

=== p-values on top of boxplots ===
<li>[ Add P-values and Significance Levels to ggplots]
* ggpubr::stat_compare_means()
:<syntaxhighlight lang='rsplus'>
my_comparisons <- list( c("6", "8"), c("4", "6"), c("4", "8") )
ggboxplot(mtcars, x = "cyl", y = "mpg",
          color = "cyl", add = "jitter", palette = "jco") +
    stat_compare_means(comparisons = my_comparisons)+ # method="t.test", default is "wilcox.test"
    stat_compare_means(label.y = 45) # y-axis loc of overall p-value
</syntaxhighlight>
<li>[ How to Perform Multiple Paired T-tests in R]
* ggpubr::stat_pvalue_manual()
<li>[ Add Significance Level and Stars to Plot in R]
* ggsignif::geom_signif()
:<syntaxhighlight lang='rsplus'>
ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot() +
  geom_signif(
    comparisons = list(
      c("4","6"), c("4","8")
    ),
    y_position = c(34, 35, 36)
  )
</syntaxhighlight>
<li>[ How to draw the boxplot with significant level?]
* ggsignif package or geom_line() function.
<li>Paper examples
* [ Fig 5A,B]
* [ Fig 2B]
<li>Manually do it - [ signibox] package (small).

== Violin plot and sina plot ==
<li>A violin plot is similar to a box plot, with the addition of a rotated kernel '''density plot''' on each side.
<li>[ geom_violin()]
<li>[ Violin plot with mean/median in ggplot2], [ stat_summary()]
<li>[ sina plot] from the [ ggforce] package.
<syntaxhighlight lang='rsplus'>

<li>[ An example]
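A violin plot with the group median marked, as mentioned in the list above, can be sketched with geom_violin() and stat_summary(). The ToothGrowth data and the point styling are assumptions for illustration.

```r
library(ggplot2)

p <- ggplot(ToothGrowth, aes(x = factor(dose), y = len)) +
  geom_violin(fill = "grey90") +
  # stat_summary() reduces each group to a single value (the median)
  stat_summary(fun = median, geom = "point", size = 2, colour = "red")
```

Replacing median with mean marks the group mean instead.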

== geom_density: Kernel density plot ==
ggplot(iris, aes(x = Sepal.Length, fill = Species, col = Species)) +
       geom_density(alpha = 0.4)
To draw two densities (black and red) from two columns of a data frame:
mydata <- data.frame(var1 = rnorm(100), var2 = rnorm(100, mean = 2))
# Create the plot
ggplot(data = mydata, aes(x = var1)) +
  geom_density() +
  geom_density(aes(x = var2), color = "red")
<li>As you can see the default colors are so terrible. A better choice is [[#ggokabeito|ggokabeito]] color scales. </li>
<li>[ Density plot + histogram]
<li>[ Your Lopsided Model is Out to Get You] & [ WVPlots] package
<li>Overlay histograms with density plots
=== A panel of density plots ===
<li>Common xlim for all subplots
ggplot(data = mpg, aes(x = hwy)) +
    geom_density() +
    facet_wrap(~ class)
<li>Each subplot has its own xlim
ggplot(data = mpg, aes(x = hwy)) +
    geom_density() +
    facet_wrap(~ class, scales = "free_x")

Line 832: Line 1,240:

== GGally::ggpairs ==
* [ How to Create and Interpret Pairs Plots in R]. [ pairs()]
* graphics::pairs()
** [ How to Create and Interpret Pairs Plots in R]. [ pairs()]
** [ Mastering Data Visualization with Pairs Plots in Base R]. Adding colors and regression lines.
* [ All vignettes] launched by GGally::vig_ggally()
* [ Kmeans Clustering of Penguins]
Line 945: Line 1,355:
=== Rotate x-axis labels ===
* [ How To Rotate x-axis Text Labels in ggplot2?]
* [ What do hjust and vjust do when making a plot using ggplot?] 0 means left-justified 1 means right-justified.
* [ What do hjust and vjust do when making a plot using ggplot?]  
** 0 means left-justified; 1 means right-justified.
** Left-justified means the left edge of the text is placed at the specified x-coordinate, so the text appears to the right of the point.
** Right-justified means the right edge of the text is placed at the specified x-coordinate, so the text appears to the left of the point.
** The default hjust/vjust is 0.5 (centered).
ggplot(mydf) + geom_col(aes(x = model, y=value, fill = method), position="dodge")+
   theme(axis.text.x = element_text(angle = 45, hjust=1))
   theme(axis.text.x = element_text(angle = 45, hjust=1, size= 8))

Line 957: Line 1,370:
scale_y_continuous(expand = c(0,0), limits = c(0, YourLimit))
* [ How does ggplot scale_continuous expand argument work?]

=== Add patterns ===
* [ ggpattern] package
* [ ggpartten填充柱状图] (Filling bar plots with ggpattern; in Chinese)
=== Barplot with colors for a 2nd variable ===
[ How to basic: bar plots]
By default, the bars are stacked on top of each other. Use '''geom_col(position = "dodge")''' to place them side by side.
df <- data.frame(group = c("A", "A", "B", "B", "C", "C"),
      count = c(3, 4, 5, 6, 7, 8),
      fill = c("red", "blue", "red", "blue", "red", "blue"))
ggplot(df, aes(x = group, y = count, fill = fill)) +
      geom_col(position = "dodge")
[ Base R approach].
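Note that mapping a column of color names to fill treats the names as category labels, so ggplot2 still draws its default palette. The following is a minimal sketch, assuming the intent is to use the names in the fill column literally; the object names p_default and p_literal are only for illustration.

```r
library(ggplot2)

df <- data.frame(group = c("A", "A", "B", "B", "C", "C"),
                 count = c(3, 4, 5, 6, 7, 8),
                 fill  = c("red", "blue", "red", "blue", "red", "blue"))

# "red"/"blue" are treated as two categories and get the default palette
p_default <- ggplot(df, aes(x = group, y = count, fill = fill)) +
  geom_col(position = "dodge")

# scale_fill_identity() uses the values in 'fill' as the actual colors;
# guide = "legend" keeps the legend, which identity scales hide by default
p_literal <- p_default + scale_fill_identity(guide = "legend")

# print(p_default); print(p_literal)  # draw the two versions
```

If the colors should differ from the values stored in the data, scale_fill_manual(values = ...) is the usual alternative.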

=== Barplot with color gradient ===
Line 970: Line 1,401:

=== Barplot with only horizontal gridlines ===
[[File:Geom bar3.png|250px]]
[[File:Geom bar3.png|250px]] [[File:Geom bar4.png|250px]]

=== Barplot with text at the end ===
Line 976: Line 1,407:
* [ A Quick How-to on Labelling Bar Graphs in ggplot2]
* [ How to label a barplot bar with positive and negative bars with ggplot2] (Looks good but 2012)
* [ plitting a stacked bar plot simple]
* Examples from publications
** Draw a panel of barplots with common labels?

[[File:Geom bar1.png|250px]] [[File:Geom bar2.png|250px]]
== Waterfall plot ==
* [ Waterfall charts in ggplot2 with waterfalls package]
* [ ggplot2: Waterfall Charts] geom_rect()
* [ Waterfall Charts in Oncology Trials - Ride the Wave]. '''Drug response'''
** Collected data are compared with the data taken at '''baseline''' to determine whether the drug has any activity. Each patient is also assigned to a category based on overall response.
** Y-axis = % change from baseline in the '''tumor size''' for each patient
** We want to create this plot by grouping patients by their overall response category (e.g., 'Early Death' or 'Complete Response') and filling their bars with different colors so that the groups are easy to identify.
* [ Understanding Waterfall Plots]
* A waterfall plot for drug BYL719 and color it based on the mutation status of the CDK13 gene, see [ Xeva] vignette.

== Polygon and map plot ==
* Base R method. ?polygon.

== geom_step: Step function ==
Line 1,036: Line 1,460:

= Special plots =
* [ 5 Extremely Useful Plots For Data Scientists That You Never Knew Existed].
** Chord Diagram
** Sunburst Chart
** Hexbin Plot
** Sankey Diagram
** Stream Graph/ Theme River
== Dot plot & forest plot ==
* Wikipedia
Line 1,132: Line 1,563:
== Aesthetics finder ==, [ video]

== aes_string() ==
Line 1,275: Line 1,709:
=== Overall title ===
[ multiple ggplots overall title]
=== Remove vertical/horizontal grids but keep ticks ===
[ removeGrid()]

== patchwork ==
Line 1,292: Line 1,729:

# Method 1:
p1 + p2 + theme(legend.position = "bottom") + plot_layout(guides = "collect")
p1 + p2 + plot_layout(guides = "collect") + theme(legend.position = "bottom")  
                                           # two legends on the RHS
                                           # one legend on the bottom
# Method 2:
p1 + p2 + plot_layout(guides = "collect") # two legends on the RHS
p1 + p2 + plot_layout(guides = "collect") # one legend on the RHS
# Method 3:
p1 + theme(legend.position="none") + p2  # legend (based on p2) is on the RHS
Line 1,312: Line 1,749:
=== Common x or y labels ===
* [ how to add common x and y labels to a grid of plots]. Another solution is on the egg package's [ vignette].
= Base R plot vs ggplot2 =
* My summary
:{| class="wikitable"
|- style="background-color:#ffffc7;"
! base-R
! ggplot2
|-
| plot(x, y, col)
| geom_point(aes(x, y, color, shape))
|-
| xlim
| scale_x_continuous(limits)
|-
| log="x"
| scale_x_continuous(trans="log10")
|-
| xlab<br />mtext("Var", cex, line, adj, las, side)
| scale_x_discrete(name="sample size")<br />labs(x)<br />xlab()
|-
| main
| labs(x, y, title, colour)<br />ggtitle()
|-
| axis(2, labels)
| scale_y_continuous(labels, breaks)<br />scale_x_discrete(labels)
|-
| ?
| scale_color_discrete('new color title')
|-
| ?
| scale_shape_discrete('new shape title')
|-
| col
| scale_color_manual(name, <br />  values = NamedVector)
|-
| pch, cex
| geom_point(pch, size)
|-
| plot(mpg, disp, col=factor(cyl))<br />legend("topleft", <br />    legend = sort(unique(cyl)), <br />    col=1:3, pch=1)<br /># discrete case
| ggplot(mtcars, <br />    aes(mpg, disp, color = factor(cyl))) +<br />    geom_point() +<br />    labs(color = "Number of Cylinders")
|-
| text()
| geom_text()
|-
| ?
| theme(title = element_text(size=8),<br />  legend.title = element_blank(),<br />  legend.position = "none", <br />  legend.key = element_blank(),<br />  plot.title = element_text(hjust = 0.5),<br />  plot.subtitle = element_text(size = 8))
|-
| las in plot(), barplot()<br />text(x, y, labs, srt=45)
| theme(axis.text.x = element_text(angle = 90))
|-
| matplot()
| geom_line() + geom_point()
|-
| plot(type = 'l'), points()
| geom_line() + geom_point()
|-
| barplot()
| geom_bar()
|-
| par(mfrow)
| facet_grid()
|}
* [ Comparing ggplot2 and R Base Graphics]

= labs for x and y axes =
Line 1,331: Line 1,832:
== name-value pairs ==
See several examples (color, fill, size, ...) from [ opioid prescribing habits in texas].
= Footnote =
[ Add Footnote to ggplot2]

= Prevent sorting of x labels =
Line 1,353: Line 1,857:
p <- ggplot(df, aes(x, y)) + geom_point(aes(colour = z))
p + labs(x = "X axis", y = "Y axis", colour = "Colour\nlegend")
      # Use color to represent the legend title
p <- ggplot(df) + geom_col(aes(x=x, y=y, fill=cat), position = "dodge")
p + labs(x = "X", y = "Y", fill = "Category")
      # Use fill to represent the legend title
Line 1,368: Line 1,877:
== Remove NA factor level from color legend ==
Use '''na.translate = F''' in scale_color_XXX(). See [ ggplot: remove NA factor level in legend]
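A minimal sketch of the na.translate option, using made-up data (the data frame and the object name p are assumptions for illustration). With na.translate = FALSE the NA level gets no legend entry, and rows whose color is NA are dropped from the plot with a warning.

```r
library(ggplot2)

# Toy data with a missing group label
df <- data.frame(x = 1:4, y = 1:4,
                 g = factor(c("a", "b", NA, "a")))

p <- ggplot(df, aes(x, y, color = g)) +
  geom_point(size = 3) +
  scale_color_discrete(na.translate = FALSE)  # no NA entry in the legend

# print(p)  # draws 3 points; the NA row is removed with a warning
```

To keep the NA points but control their color, leave na.translate at its default and set na.value in the scale instead.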

== Layout: move the legend from right to top/bottom of the plot or inside the plot or hide it ==
Line 1,376: Line 1,888:
gg + theme(legend.position="none")

gg + theme(legend.position = c(0.87, 0.25))
gg + theme(legend.position = c(0.87, 0.25)) +
    guides(colour = guide_legend(nrow = 1))

# Customize the edge color and background color
Line 1,387: Line 1,900:

== Guide functions for finer control ==
== Guide functions for finer control (legend, axis, color scales) ==
<li> The guide functions, guide_colourbar() and guide_legend(), offer additional control over the fine details of the legend.
[ guide_legend()] allows the modification of legends for scales, including fill, color, and shape.
<li>[ guide_legend()] allows the modification of legends for scales, including fill, color, and shape. This function can be used in scale_fill_manual(), scale_fill_continuous(), ... functions.
This function can be used in scale_fill_manual(), scale_fill_continuous(), ... functions.
scale_fill_manual(values=c("orange", "blue"),  
Line 1,406: Line 1,916:
theme(legend.position = 'bottom')
<li>[ guides()]
* Legend. For example, to remove the legend title:
ggplot(mtcars, aes(x = mpg, y = disp, color = factor(cyl))) +
  geom_point() +
  guides(color = guide_legend(title = NULL))
* Axis. For example, to change the angle of the x-axis labels:
ggplot(mtcars, aes(x = mpg, y = disp)) +
  geom_point() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  guides(x = guide_axis(angle = 45))
* Color scales. For example, to change the number of bins used to draw the colorbar (a smaller nbin gives a coarser-looking gradient; it does not change the breaks):
ggplot(mtcars, aes(x = mpg, y = disp, color = hp)) +
  geom_point() +
  guides(color = guide_colorbar(nbin = 10))

== Legend symbol background ==
Line 1,419: Line 1,950:
== Legend size ==
[ How to Change Legend Size in ggplot2 (With Examples)]
data <- data.frame(x = 1:5, y = 1:5, label = c("A", "B", "C", "D", "E"))
ggplot(data, aes(x, y, color = as.factor(label))) +
  geom_point() +
  labs(title = "Legend Size Example with Theme Modification",
      color = "Label") +
  theme(
    legend.text = element_text(size = 12),
    legend.title = element_text(size = 14)
  )

= ggtitle() =
Line 1,476: Line 2,018:

= geom_point() =
See [[Ggplot2#Scatterplot|Scatterplot]].
df <- data.frame(x=1:3, y=1:3, color=c("red", "green", "blue"))
# Use I() to set aes values to the identity of a value from your data table
ggplot(df, aes(x,y, color=I(color))) + geom_point(size=10)
ggplot(df, aes(x,y, color=I(color))) + geom_point(size=10) # no color legend
# VS
ggplot(df, aes(x,y, color=color)) + geom_point(size=10) # color is like a class label
== groups ==
* [ How To Add Regression Line per Group to Scatterplot in ggplot2?] '''geom_smooth()'''

= geom_bar(), geom_col(), stat_count() =
* geom_bar: Counts the number of cases at each x position and makes the height of the bar proportional to the count (or sum of weights if supplied)
* geom_col: Leaves the data as is and makes the height of the bar proportional to the value in the data
{| class="wikitable"
! Function !! Default Statistic !! Purpose
|-
| geom_bar() || stat_count() || <pre>
df2 <- data.frame(cat = c("A", "A", "A", "B", "B",
  "B", "B", "B", "C", "C", "C", "C", "C", "C"))
ggplot(df2, aes(x = cat)) + geom_bar()
# Same as
# barplot(table(df2$cat))
</pre>
|-
| geom_col() || stat_identity() || <pre>
df <- data.frame(group = c("A", "B", "C"),
                 count = c(3, 5, 6))
ggplot(df, aes(x = group, y = count)) + geom_col()
# Same as
# barplot(df$count, names.arg = df$group)
</pre>
|}

Line 1,498: Line 2,061:
ggplot() + geom_col(mapping = aes(x, y))
== Add colors to the plot ==
df <- data.frame(group = c("A", "B", "C"),
                count = c(3, 5, 6),
                fill = c("red", "green", "blue"))
ggplot(df, aes(x = group, y = count, fill = fill)) +
  geom_col()

== Add numbers to the plot ==
[ An example]
== Simple example ==
Original [[File:Geom bar simple.png|200px]] 
fct_reorder() [[File:Geom bar reorder.png|200px]].

== Ordered barplot and reorder() ==
Line 1,512: Line 2,089:
= stat_summary() =
= stat_smooth(), geom_smooth() =
[ ?geom_smooth, ?stat_smooth]
ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  stat_smooth(method = "glm", formula = "y ~ x",
              method.args = list(family = poisson(link = "log")),
              se = FALSE, color = "red") +
  labs(x = "Weight", y = "Miles per gallon")
To control the smoothness of a loess fit, use the "span" parameter (smaller values give a wigglier curve). To disable the confidence band, use "se = FALSE".
geom_smooth(method = 'loess', se = FALSE, span = 0.3)
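As a rough illustration of the span parameter, the sketch below overlays two loess fits on mtcars. The data set, the two span values and the colors are arbitrary choices for the example.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # small span: follows local structure closely (wigglier)
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE,
              span = 0.3, color = "red") +
  # large span: uses most of the data for each local fit (smoother)
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE,
              span = 0.9, color = "blue")

# print(p)  # draw the plot
```

The span argument only applies to loess; other methods such as "lm" or "glm" ignore it.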
== geom_ribbon ==
* Useful for adding confidence interval. [ geom_ribbon()] Ribbons and area plots.
* [ Shadowing your ggplot2 lines. Forecasting confidence interval in R use case]
* Example
df <- data.frame(
  X = seq(0, 100, by = 5),  # Pathologist estimate
  Y = seq(0, 100, by = 5) + rnorm(21, 0, 5)  # XXX prediction
)
# Run only one of the two choices below; the second overwrites the first
# Choice 1: Calculate the lower and upper bounds of the confidence interval
df$lower_bound <- 0.863 * df$X  # 13.7% below X
df$upper_bound <- 1.137 * df$X  # 13.7% above X
# Choice 2: Constant width for the confidence band
c <- 13.7
df$lower_bound <- df$X - c
df$upper_bound <- df$X + c
# Plotting
ggplot(df, aes(x = X, y = Y)) +
  geom_point() +
  geom_ribbon(aes(ymin = lower_bound, ymax = upper_bound), fill = "blue", alpha = 0.2) +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  labs(x = "Pathologist Estimate", y = "XXX Prediction")

= geom_area() =
Line 1,533: Line 2,153:
== Use geom_line() to create a square bracket to annotate the plot ==
[ Barchart with Significance Tests]
== Interaction plot ==
[[T-test#Randomized_block_design|Randomized block design]]

= geom_segment() =
Line 1,541: Line 2,164:
= geom_errorbar(): error bars =
<li>[ Plotting means and error bars (ggplot2)] from Cookbook for R.
<li>[ GGPlot Error Bars] using geom_errorbar() and geom_segment()
<br />
Line 1,548: Line 2,172:
* Can ggplot2 do this?
* [ plotCI() from the plotrix package or geom_errorbar() from ggplot2 package]
* [ Vertical error bars]
* [ Horizontal error bars]
Line 1,574: Line 2,197:

* Forest plot example using geom_errorbarh()

= geom_rect(), geom_bar() =
Line 1,590: Line 2,216:
* [ Rectangles]. This is useful for creating heatmaps, e.g., [ DoHeatmap()] & [ an example] in Seurat.
* [ Wordle Words and Expected Value]
== Waterfall plot ==
* A waterfall chart is a type of chart that represents how an '''initial value''' is affected by a series of intermediate positive or negative values.
* [ Understanding Waterfall Plots]
* [ Waterfall charts in ggplot2 with waterfalls package]
* [ ggplot2: Waterfall Charts] geom_rect()
* [ Waterfall Charts in Oncology Trials - Ride the Wave]. '''Drug response'''
** Collected data are compared with the data taken at '''baseline''' to determine whether the drug has any activity. Each patient is also assigned to a category based on overall response.
** Y-axis = % change from baseline in the '''tumor size''' for each patient
** We want to create this plot by grouping patients by their overall response category (e.g., 'Early Death' or 'Complete Response') and filling their bars with different colors so that the groups are easy to identify.
* A waterfall plot for drug BYL719 and color it based on the mutation status of the CDK13 gene, see [ Xeva] vignette.

= geom_linerange =
Line 1,605: Line 2,242:
= Annotation =

== geom_hline(), geom_vline() ==
== Add a horizontal/vertical line ==
[ geom_hline(), geom_vline()]
Line 1,656: Line 2,294:
<li>[ Volcano plots], [ EnhancedVolcano] package </li>
<li>[ Visualization of Volcano Plots in R]
library(ggplot2)
library(ggrepel)
data <- data.frame(
    gene = paste("Gene", 1:1000, sep = "_"),
    log2FoldChange = rnorm(1000),
    pvalue = runif(1000)
)
data$pvalue[1:20] <- runif(20, 0, .001)
data$padj <- p.adjust(data$pvalue, method = "BH") # Adjusted p-values
significant_genes <- subset(data, padj < 0.05 & abs(log2FoldChange) > 1)
ggplot(data, aes(x = log2FoldChange, y = -log10(padj))) +
    geom_point(aes(color = padj < 0.05 & abs(log2FoldChange) > 1), alpha = 0.5) +
    scale_color_manual(values = c("black", "red"), na.translate = F) +
    theme_minimal() +
    labs(title = "Volcano Plot", x = "Log2 Fold Change", y = "-Log10 Adjusted P-Value") +
    geom_text_repel(           # assumed: the geom name was lost; the defaults below are ggrepel's
        data = significant_genes,
        aes(label = gene),
        box.padding = 0.25,    # default
        point.padding = 1e-06,  # default
        max.overlaps = 10      # default
    )

Line 1,745: Line 2,415:

We can specify dpi to increase the resolution if we use the '''png''' format ('''svg''' is not affected). For example,
We can specify dpi to increase the resolution if we use the '''png''' format ('''svg''' is not affected); see Chapter 14.5 [ Outputting to Bitmap (PNG/TIFF) Files] from R Graphics Cookbook.
<syntaxhighlight lang='rsplus'>
<syntaxhighlight lang='rsplus'>
g1 <- ggplot(data = mydf)  
ggsave("myfile.png", g1, height = 7, width = 8, units = "in", dpi = 500)
ggsave("myfile.png", g1, height = 7, width = 8, units = "in", dpi = 300)
I got an error -  Error in loadNamespace(name) : there is no package called ‘svglite’. After I install the package, everything works fine.
Line 1,774: Line 2,444:

= graphics::smoothScatter =
= graphics::smoothScatter: scatter plots with lots of points =
[ smoothScatter with ggplot2]
* [ ?smoothScatter]
* [ Smooth scatter plot in R]
* [ smoothScatter with ggplot2]
* [ An example] from DeMixT. As we can see, we can use '''lines()''' or '''abline()''' to add lines.

= Other tips/FAQs =
Line 1,782: Line 2,455:
== Ten Simple Rules for Better Figures ==
[ Ten Simple Rules for Better Figures]
== Five ways to improve your chart axes ==
[ Five ways to improve your chart axes]
== Beyond Bar and Line Graphs ==
[ Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm]

== Recreating the Storytelling with Data look with ggplot ==
Line 1,798: Line 2,477:

= Animation and gganimate =
* [ Animating Changes in Football Kits using R]: rvest, tidyverse, xml2, purrr & magick
* [ Animated Directional Chord Diagrams] tweenr & magick
<li>[ Animating Changes in Football Kits using R]: rvest, tidyverse, xml2, purrr & magick
* [ x-mas tRees with gganimate, ggplot, plotly and friends]
<li>[ Animated Directional Chord Diagrams] tweenr & magick
* [ Create animation in R]: learn by examples (gganimate)
<li>[ x-mas tRees with gganimate, ggplot, plotly and friends]
* [ The USMS ePostal Over the Last 20+ Years] (gganimate and bar charts)
<li>[ Create animation in R]: learn by examples (gganimate)
* [ R tip: Animations in R] from IDG TECHtalk
<li>[ The USMS ePostal Over the Last 20+ Years] (gganimate and bar charts)
<li>[ R tip: Animations in R] from IDG TECHtalk
<li>A moving super mario. See [ gganimate (with a spooky twist)] </br>

= ggstatsplot =

Revision as of 13:21, 11 October 2024



The Grammar of Graphics

  • Data: Raw data that we'd like to visualize
  • Geometries: Shapes that we use to visualize data
  • Aesthetics: Properties of geometries (size, color, etc.)
  • Scales: Mappings from data values to aesthetic values

Scatterplot aesthetics

geom_point(). The available aesthetics depend on the geom.

  • x, y
  • shape
  • color
  • size. It is not always necessary to put 'size' inside aes(). See an example at Legend layout.
  • alpha
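library(ggplot2)
library(tibble)    # tibble()
library(magrittr)  # %>%
x1 <- rbinom(100, 1, .5) - .5
x2 <- c(rnorm(50, 3, .8)*.1, rnorm(50, 8, .8)*.1)
x3 <- x1*x2*2
# x=1:100, y=x1, x2, x3
tibble(x=1:length(x1), T=x1, S=x2, I=x3) %>% 
  tidyr::pivot_longer(-x) %>% 
  ggplot(aes(x=x, y=value)) + 
  geom_point(aes(color = name))  # assumed geom; one color per series, as in matplot() below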

# Cf
matplot(1:length(x1), cbind(x1, x2, x3), pch=16, 
        col=c('cornflowerblue', 'springgreen3', 'salmon'))
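The remark about 'size' and aes() can be illustrated with a short sketch. The mtcars columns and the object names p_mapped and p_fixed are assumptions chosen only for this example.

```r
library(ggplot2)

# size mapped to a variable: goes inside aes() and produces a legend
p_mapped <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point(aes(size = hp))

# size set to a constant: goes outside aes(), no legend is drawn
p_fixed <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point(size = 3)

# print(p_mapped); print(p_fixed)  # draw the two versions
```

The same rule applies to color, shape and alpha: inside aes() they are mapped from data, outside they are fixed values.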

Online tutorials


> library(ggplot2)
Need help? Try Stackoverflow:


Some examples

Examples from 'R for Data Science' book - Aesthetic mappings

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy))
  # the 'mapping' is the 1st argument for all geom_* functions, so we can safely skip it.
# template
ggplot(data = <DATA>) + 
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))

# add another variable through color, size, alpha or shape
ggplot(data = mpg) + 
  geom_point(aes(x = displ, y = hwy, color = class))

ggplot(data = mpg) + 
  geom_point(aes(x = displ, y = hwy, size = class))

ggplot(data = mpg) + 
  geom_point(aes(x = displ, y = hwy, alpha = class))

ggplot(data = mpg) + 
  geom_point(aes(x = displ, y = hwy, shape = class))

ggplot(data = mpg) + 
  geom_point(aes(x = displ, y = hwy), color = "blue")

# add another variable through facets
ggplot(data = mpg) + 
  geom_point(aes(x = displ, y = hwy)) + 
  facet_wrap(~ class, nrow = 2)

# add another 2 variables through facets
ggplot(data = mpg) + 
  geom_point(aes(x = displ, y = hwy)) + 
  facet_grid(drv ~ cyl)

Examples from 'R for Data Science' book - Geometric objects, lines and smoothers

How to Add a Regression Line to a ggplot?

# Points
ggplot(data = mpg) + 
  geom_point(aes(x = displ, y = hwy)) # we can add color to aes()

# Line plot
ggplot() +
  geom_line(aes(x, y))  # we can add color to aes()

# Smoothed
# 'linewidth' controls the line width ('size' was used for lines before ggplot2 3.4.0)
ggplot(data = mpg) + 
  geom_smooth(aes(x = displ, y = hwy), linewidth = 1) 

# Points + smoother, add transparency to points, remove se
# We add transparency to make the smoothed line stand out
#                    and the points less prominent
# We move aes to the 'mapping' argument of ggplot()
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point(alpha=1/10) +
  geom_smooth(se = FALSE)

# Colored points + smoother
ggplot(data = mpg, aes(x = displ, y = hwy)) + 
  geom_point(aes(color = class)) + 
  geom_smooth()

Examples from 'R for Data Science' book - Transformation, bar plot

# y axis = counts
# bar plot
ggplot(data = diamonds) + 
  geom_bar(aes(x = cut))
# Or
ggplot(data = diamonds) + 
  stat_count(aes(x = cut))

# y axis = proportion
ggplot(data = diamonds) + 
  geom_bar(aes(x = cut, y = after_stat(prop), group = 1))  # after_stat(prop) replaces the older ..prop.. notation

# bar plot with 2 variables
ggplot(data = diamonds) + 
  geom_bar(aes(x = cut, fill = clarity))

facet_wrap and facet_grid to create a panel of plots

  • facet_grid() can be defined without any data. For example
    mylayout <- list(ggplot2::facet_grid(cat_y ~ cat_x))
    mytheme <- c(mylayout, 
                 list(ggplot2::theme_bw(), ggplot2::ylim(NA, 1)))
    # we haven't defined cat_y, cat_x variables
    ggplot() + geom_line() + 
      mytheme
  • Multiclass predictive modeling for #TidyTuesday NBER papers
  • changing the facet_wrap labels using labeller in ggplot2. The solution is to create a labeller function as a function of a variable x (or any other name as long as it's not the faceting variables' names) and then coerce to labeller with as_labeller.
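The labeller approach described in the last item can be sketched as follows. The display names in drv_names and the helper add_prefix are hypothetical; as_labeller() accepts either a named character vector (a lookup table) or a function of the label values.

```r
library(ggplot2)

# Hypothetical display names for the levels of mpg$drv (the faceting variable)
drv_names <- c("4" = "4-wheel drive",
               f   = "Front-wheel drive",
               r   = "Rear-wheel drive")

# Option 1: a named character vector used as a lookup table
p1 <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  facet_wrap(~ drv, labeller = as_labeller(drv_names))

# Option 2: a function of x (not named after the faceting variable)
add_prefix <- function(x) paste("Drive:", x)
p2 <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  facet_wrap(~ drv, labeller = as_labeller(add_prefix))

# print(p1); print(p2)  # draw the two versions
```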


library(lattice)
df <- data.frame(x = rnorm(100), y = rnorm(100), group = sample(c("A", "B"), 100, replace = TRUE))

# Use the xyplot() function to create the plot
# with each group represented by a different color
# result is 1 plot only
# no annotation
xyplot(y ~ x, data = df, groups = group)
df <- data.frame(x = rnorm(100), y = rnorm(100), 
                 group = sample(c("A", "B"), 100, replace = TRUE), 
                 time = sample(c("T1", "T2"), 100, replace = TRUE))

# 2 plots grouped by time
# two colors (defined by group) were used in each plot 
# no annotation
xyplot(y ~ x | time, groups = group, data = df)

For more complicated plots, we can use the panel parameter.
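A minimal sketch of a custom panel function, using simulated data like the examples above. The choice of a regression line and a dashed reference line is only an illustration of what can be layered inside each panel.

```r
library(lattice)

set.seed(1)
df <- data.frame(x = rnorm(100), y = rnorm(100),
                 time = sample(c("T1", "T2"), 100, replace = TRUE))

p <- xyplot(y ~ x | time, data = df,
            panel = function(x, y, ...) {
              panel.xyplot(x, y, ...)            # the points
              panel.lmline(x, y, col = "red")    # per-panel regression line
              panel.abline(h = 0, lty = 2)       # dashed reference line at y = 0
            })

# print(p)  # draw the two panels
```

The panel function is called once per conditioning level (here T1 and T2) with that panel's x and y values.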

Color palette

Top color palettes

Display color palettes

  • Use barplot()
    pal <- c("#E41A1C", "#377EB8", "#4DAF4A", "#984EA3", "#FF7F00")
    # pal <- sample(colors(), 10) # randomly pick 10 colors 
    barplot(rep(1, length(pal)), col = pal, space = 0, 
            axes = FALSE, border = NA)
    # [1] -0.20  5.20 -0.01  1.00


  • Use heatmap()
    pal <- c("#E41A1C", "#377EB8", "#4DAF4A", "#984EA3", "#FF7F00")
    pal <- matrix(pal, nr=2) # acknowledge a nice warning message
    #      [,1]      [,2]      [,3]     
    # [1,] "#E41A1C" "#4DAF4A" "#FF7F00"
    # [2,] "#377EB8" "#984EA3" "#E41A1C"
    pal_matrix <- matrix(seq_along(pal), nr=nrow(pal), nc=ncol(pal))
    heatmap(pal_matrix, col = pal, Rowv = NA, Colv = NA, scale = "none", 
             ylab = "", xlab = "", main = "", margins = c(5, 5))
    # 2 rows, 3 columns with labeling on two axes
    # [1] 0 1 0 1


  • Use image()
    pal <- palette() # R 4.0 has a new default palette
                     # The old colors are highly saturated and vary enormously
                     # in terms of luminance
    # [1] "black"   "#DF536B" "#61D04F" "#2297E6" "#28E2E5" "#CD0BBC" "#F5C710"
    # [8] "gray62"
    pal_matrix <- matrix(seq_along(pal), nr=1)
    image(pal_matrix, col = pal, axes = FALSE)
    # 8 rows, 1 column, but no labeling
    # Starting from bottom, left.
    par()$usr  # change with the data dim
    text(0, (par()$usr[4]-par()$usr[3])/8*c(0:7), 
         labels = pal)


  • Use scales::show_col()



In R, colors() is a function that returns a character vector of color names available in R.

To obtain the hexadecimal codes for all colors obtained by colors()

rgb_values <- col2rgb(colors())

# Convert the RGB values to hexadecimal codes
hex_codes <- apply(rgb_values, 2, 
                   function(x) rgb(x[1], x[2], x[3], 
                   maxColorValue = 255))

# View the first few hexadecimal codes
head(hex_codes)



  • ?rainbow
  • The images below compare the effects of the 's' and 'v' parameters. s (saturation) and v (value) control the color intensity and brightness, respectively. See also HSL and HSV from wikipedia.
    • Saturation (s): Determines how vivid or muted the colors are. A value of 1 (default) means fully saturated colors, while lower values reduce the intensity.
    • Value (v): Controls the brightness. A value of 1 (default) results in full brightness, while lower values make the colors darker.
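The effect of 's' and 'v' can be reproduced with base R alone. This is a small sketch; the number of colors and the value 0.5 are arbitrary choices for the comparison.

```r
n <- 8
pal_default <- rainbow(n)           # s = 1, v = 1
pal_muted   <- rainbow(n, s = 0.5)  # lower saturation: paler colors
pal_dark    <- rainbow(n, v = 0.5)  # lower value: darker colors

# Draw the three palettes as stacked strips
op <- par(mfrow = c(3, 1), mar = c(1, 1, 2, 1))
barplot(rep(1, n), col = pal_default, space = 0, axes = FALSE,
        border = NA, main = "rainbow(8)")
barplot(rep(1, n), col = pal_muted, space = 0, axes = FALSE,
        border = NA, main = "rainbow(8, s = 0.5)")
barplot(rep(1, n), col = pal_dark, space = 0, axes = FALSE,
        border = NA, main = "rainbow(8, v = 0.5)")
par(op)
```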

Rainbow default.png Rainbow s05.png Rainbow v05.png

Color blind

colorblindcheck: Check Color Palettes for Problems with Color Vision Deficiency

Color picker

> library(colourpicker)
> plotHelper(colours=5)

Listening on

Color names, Complementary/Inverted colors

colorspace package


c4a_gui() # launches a Shiny interface (the R console is blocked while it runs)

c4a_types() # understand abbreviation

c4a_series() # 16 series like brewer, hcl, tableau, viridis, etc

c4a_overview() # how many palettes per series x types

c4a_palettes(type = "div", series = "hcl") # What palettes are available

# Give me the colors
c4a("hcl.purple_green", 11)
c4a("brewer.accent", 2)    # the 1st one on the website

# Plot the colors
c4a_plot("hcl.purple_green", 11, = TRUE)

paletteer package

#67001FFF #B2182BFF #D6604DFF #F4A582FF #FDDBC7FF #F7F7F7FF 
#D1E5F0FF #92C5DEFF #4393C3FF #2166ACFF #053061FF 

#CC0C00FF #5C88DAFF #84BD00FF #FFCD00FF #7C878EFF #00B5E2FF #00AF66FF 

ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
      geom_point() +
# the next is the same as above
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
     geom_point() +
     scale_color_manual(values = c("setosa" = "#CC0C00FF", 
                                   "versicolor" = "#5C88DAFF", 
                                   "virginica" = "#84BD00FF"))



ggokabeito: Colorblind-friendly, qualitative 'Okabe-Ito' Scales for ggplot2 and ggraph. It seems to support at most 9 classes/colors and gives an error if there are more classes (e.g. Error: Insufficient values in manual scale. 15 needed but only 9 provided.)

# Bad
ggplot(mpg, aes(hwy, color = class, fill = class)) +
     geom_density(alpha = .8)

# Bad (single color)
ggplot(mpg, aes(hwy, color = class, fill = class)) +
     geom_density(alpha = .8) +
     scale_fill_brewer(name = "Class") +
     scale_color_brewer(name = "Class")

# Bad
ggplot(mpg, aes(hwy, color = class, fill = class)) +
     geom_density(alpha = .8) +
     scale_fill_brewer(name = "Class", palette ="Set1") +
     scale_color_brewer(name = "Class", palette ="Set1")

# Nice
ggplot(mpg, aes(hwy, color = class, fill = class)) +
     geom_density(alpha = .8) +
     scale_fill_okabe_ito(name = "Class") +
     scale_color_okabe_ito(name = "Class")

Pride palette

Show Pride on Your Plots. gglgbtq package


Colour related aesthetics: colour, fill and alpha

Scatterplot with large number of points: alpha

smoothScatter with ggplot2

# df is assumed to be a data frame with columns x and y
ggplot(df, aes(x, y)) +
  geom_point(alpha = 0.1)

For base R, we can use the alpha parameter rgb(,,,alpha),

plot(x, y, col=rgb(0,0,0, alpha=.1))
polygon(df, col=adjustcolor(c("red", "blue"), alpha.f=.3))

Combine colors and shapes in legend

  • In order for legends to be merged, they must have the same name.
    df <- data.frame(x = 1:3, y = 1:3, z = c("a", "b", "c"))
    ggplot(df, aes(x, y)) + geom_point(aes(shape = z, colour = z), size=4)
  • How to Work with Scales in a ggplot2 in R. This solution is better since it allows us to change the legend title. Just make sure the titles passed to both scale_* functions are the same.
    ggplot(mtcars, aes(x=hp, y=mpg)) +
       geom_point(aes(shape=factor(cyl), colour=factor(cyl))) +
       scale_shape_discrete("Cylinders") + # change the legend title from 'factor(cyl)' to 'Cylinders'
       scale_colour_discrete("Cylinders")  # combine shape and colour in one legend; avoid another legend for colour
  • GGPLOT Point Shapes Best Tips
  • Simulated data
    df <- data.frame(x = rnorm(100), y = rnorm(100),
                     Treatment = rep(c("Before", "After"), each = 50),
                     Response = rep(c("Sensitive", "Resistant"), each = 50),
                     Subject = rep(1:50, times = 2))
    ggplot(df, aes(x = x, y = y, shape = Treatment, color = Response)) +
      geom_point() +
      geom_line(aes(group = Subject), alpha = 0.5) +  # Add lines connecting the same subject
      scale_shape_manual(values = c(16, 17)) +  # You can choose different shapes
      scale_color_manual(values = c("blue", "red")) +  # You can choose different colors
      theme_minimal() +
      labs(title = "Scatterplot with Different Shapes and Colors",
           x = "X-axis label",
           y = "Y-axis label",
           shape = "Treatment",
           color = "Response")

ggplot2::scale functions and scales packages

  • Scales control the mapping from data to aesthetics. They take your data and turn it into something that you can see, like size, colour, position or shape.
  • Scales also provide the tools that let you read the plot: the axes and legends.
  • scales 1.2.0

ggplot2::scale_* - axes/axis, legend and reference of all scale_* functions. Modifies the scales of the axes, such as the x- and y-axes, color, size, etc.

Naming convention: scale_AestheticName_NameDataType where

  • AestheticName can be x, y, color, fill, size, shape, ...
  • NameDataType can be continuous, discrete, manual or gradient.
  • Table of common functions


  • See Figure 12.1: Axis and legend components on the book ggplot2: Elegant Graphics for Data Analysis
    # Set x-axis label
    scale_x_discrete("Car type")   # or a shortcut xlab() or labs()
    # Set legend title
    scale_colour_discrete("Drive\ntrain")    # or a shortcut labs()
    # Change the default color
    # Change the axis scale
    # Change breaks and their labels
    scale_x_continuous(breaks = c(2000, 4000), labels = c("2k", "4k"))
    # Relabel the breaks in a categorical scale
    scale_y_discrete(labels = c(a = "apple", b = "banana", c = "carrot"))
  • See an example at geom_linerange where we have to specify the limits parameter in order to make "8" < "16" < "20"; otherwise it is 16 < 20 < 8.
    Browse[2]> order(coordinates$chr)
    [1] 3 4 1 2
    Browse[2]> coordinates$chr 
    [1] "20" "8"  "16" "16"
  • Differences of scale_color_gradient() and scale_color_continuous()
    • scale_color_gradient() (more common than scale_color_continuous) maps a continuous variable to a two-color gradient. Its low and high arguments specify the colors for the minimum and maximum values of the variable, and the colors in between are interpolated automatically.
    ggplot(data = diamonds, aes(x = carat, y = price, color = depth)) +
      geom_point() +
      scale_color_gradient(low = "blue", high = "red")
    • scale_color_continuous() is the default color scale for continuous variables. With type = "gradient" (the default unless options(ggplot2.continuous.colour) is set) it passes its arguments on to scale_color_gradient(); type = "viridis" gives scale_color_viridis_c(). The limits argument sets the minimum and maximum values for the variable, breaks specifies where the legend ticks occur, and labels sets the text shown at those ticks. All three are optional and work in scale_color_gradient() too.
    ggplot(data = diamonds, aes(x = carat, y = price, color = depth)) +
         geom_point() +
         scale_color_continuous(name = "Depth", 
                                limits = c(40, 80), 
                                breaks = c(40, 60, 80),
                                labels = c("Shallow", "Moderate", "Deep"), # display on legend
                                type = "gradient")
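As a quick check of how the two functions relate, the sketch below builds the same plot with each scale and compares the colors that ggplot2 actually assigns to the points. The small data frame is made up; layer_data() is the ggplot2 helper that returns the computed aesthetics of a layer.

```r
library(ggplot2)

# Made-up data with a continuous variable z mapped to colour
df <- data.frame(x = 1:5, y = 1:5, z = c(43, 50, 60, 70, 79))
base <- ggplot(df, aes(x, y, colour = z)) + geom_point()

p_cont <- base + scale_color_continuous(type = "gradient", limits = c(40, 80))
p_grad <- base + scale_color_gradient(limits = c(40, 80))

# Hex colours computed for each point under the two scales
cols_cont <- layer_data(p_cont)$colour
cols_grad <- layer_data(p_grad)$colour
identical(cols_cont, cols_grad)
```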

ylim and xlim in ggplot2 in axes or the Zooming part of the cheatsheet

Use one of the following

  • + scale_x_continuous(limits = c(-5000, 5000))
  • + coord_cartesian(xlim = c(-5000, 5000))
  • + xlim(-5000, 5000)
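The three options are not equivalent: limits set through a scale (including the xlim()/ylim() shortcuts) discard out-of-range data before any statistics are computed, while coord_cartesian() only zooms. A minimal sketch on the y-axis with made-up data, using a boxplot median to make the difference visible:

```r
library(ggplot2)

# Nine small values and one large value
df <- data.frame(g = "a", y = c(1:9, 100))
p  <- ggplot(df, aes(g, y)) + geom_boxplot()

# Scale limits drop y = 100 (with a warning) before the median is computed
med_scale <- suppressWarnings(
  layer_data(p + scale_y_continuous(limits = c(0, 10)))$middle
)

# coord_cartesian() keeps all ten values and only changes the visible window
med_coord <- layer_data(p + coord_cartesian(ylim = c(0, 10)))$middle

c(scale = med_scale, coord = med_coord)
```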

Emulate ggplot2 default color palette


The above can be created by R >= 4.0.0 using the command scales::show_col(palette.colors(palette = "ggplot2")). We should ignore the 1st color (black). Also if n>=5, the colors do not match with the result of show_col(hue_pal()(5)) .

Answer 1 It is just equally spaced hues around the color wheel. Emulate ggplot2 default color palette

gg_color_hue <- function(n) {
  hues = seq(15, 375, length = n + 1)
  hcl(h = hues, l = 65, c = 100)[1:n]
}

n = 4
cols = gg_color_hue(n)

dev.new(width = 4, height = 4)
plot(1:n, pch = 16, cex = 2, col = cols)

Answer 2 (better, it shows the color values in HEX). It should be read from left to right and then top to bottom.

scales package

show_col(hue_pal()(4)) # ("#F8766D", "#7CAE00", "#00BFC4", "#C77CFF")
                       # (Salmon, Christi, Iris Blue, Heliotrope)
show_col(hue_pal()(3)) # ("#F8766D", "#00BA38", "#619CFF")
                       # (Salmon, Dark Pastel Green, Cornflower Blue)
show_col(hue_pal()(2)) # ("#F8766D", "#00BFC4") = (salmon, iris blue) 
           # see for color names

See also the last example in ggsurv() where the KM plots have 4 strata. The colors can be obtained by scales::hue_pal()(4) with hue_pal()'s default arguments.

R has a function called colorName() to convert a hex code to color name; see roloc package on CRAN.

How to change the default color palette in geom_XXX

  • Simple custom colour palettes with R ggplot graphs
  • Change the color palette for all plots
    • Create a Custom Theme
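      # A theme cannot hold scales (theme_minimal() + scale_fill_manual() is an error),
      # so bundle the theme and the scales in a list
      custom_theme <- list(
        theme_minimal(),
        scale_fill_manual(values = c("red", "blue", "green", "purple")),
        scale_color_manual(values = c("red", "blue", "green", "purple")))
      # Add the list to any plot: p + custom_theme
      # theme_set() accepts a theme only; the default discrete colors can be
      # set with options(ggplot2.discrete.fill = , ggplot2.discrete.colour = )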
    • ggthemr package
    • rcartocolor package
  • Change the color palette for the current plot only:
    • Using scale_fill_manual() and scale_color_manual()
      data <- data.frame(
        category = c("A", "B", "C", "D"),
        value = c(3, 5, 2, 8)
      )
      ggplot(data, aes(x = category, y = value, fill = category)) +
        geom_bar(stat = "identity") +
        scale_fill_manual(values = c("red", "blue", "green", "purple"))
    • Using scale_fill_brewer() and scale_color_brewer()
      ggplot(data, aes(x = category, y = value, fill = category)) +
        geom_bar(stat = "identity") +
        scale_fill_brewer(palette = "Set3")
    • Using scale_fill_viridis() and scale_color_viridis()
      ggplot(data, aes(x = category, y = value, fill = category)) +
        geom_bar(stat = "identity") +
        scale_fill_viridis(discrete = TRUE)  # from the viridis package; ggplot2 itself has scale_fill_viridis_d()
    • Using scale_fill_hue() and scale_color_hue()
      ggplot(data, aes(x = category, y = value, fill = category)) +
        geom_bar(stat = "identity") +
        scale_fill_hue(h = c(0, 360), l = 65, c = 100)
  • How to change the color in geom_point or lines in ggplot
    ggplot() + 
      geom_point(data = data, aes(x = time, y = y, color = sample),size=4) +
      scale_color_manual(values = c("A" = "black", "B" = "red"))
    ggplot(data = data, aes(x = time, y = y, color = sample)) + 
      geom_point(size=4) + 
      geom_line(aes(group = sample)) + 
      scale_color_manual(values = c("A" = "black", "B" = "red"))

transform scales

How to make that crazy Fox News y axis chart with ggplot2 and scales

Class variables

  • "Set1" is a good choice. See RColorBrewer::display.brewer.all()
  • For an ordinal variable, brewer.pal(n, "Spectral") is good, but the middle color is too light, so I replace it:
    cols <- brewer.pal(5, "Spectral")
    cols[3] <- "#D4C683" # middle of "#FDAE61" and "#ABDDA4"
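Rather than working out the replacement color by hand, the midpoint of the two neighbouring colors can be interpolated with colorRampPalette() from grDevices. A small sketch under the assumption that a plain RGB midpoint is what is wanted:

```r
# The two neighbours of the middle colour in the 5-class Spectral palette
neighbours <- c("#FDAE61", "#ABDDA4")

# Interpolate three colours and keep the one halfway between the neighbours
middle <- colorRampPalette(neighbours)(3)[2]
middle
```

The result can then be assigned to cols[3] in the same way as the hand-picked value.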

Red, Green, Blue alternatives

  • Red: "maroon"

Heatmap for single channel

How to Make a Heatmap of Customers in R, source code on github. geom_tile() and geom_text() were used. Heatmap in ggplot2 from

# White <----> Blue
RColorBrewer::display.brewer.pal(n = 8, name = "Blues")

Heatmap for dual channels

# Red <----> Blue
display.brewer.pal(n = 8, name = 'RdBu')
# Hexadecimal color specification 
brewer.pal(n = 8, name = "RdBu")

plot(1:8, col=brewer_pal(palette = "RdBu")(8), pch=20, cex=4)

# Blue <----> Red
plot(1:8, col=rev(brewer_pal(palette = "RdBu")(8)), pch=20, cex=4)


Don't rely on color to explain the data


Don't use very bright or low-contrast colors, accessibility

Create your own scale_fill_FOO and scale_color_FOO

Custom colour palettes for {ggplot2}

Themes and background for ggplot2


  • Export plot in .png with transparent background in base R plot.
    x = c(1, 2, 3)
    op <- par(bg=NA)
    plot (x)
  • Transparent background with ggplot2
    p <- ggplot(airquality, aes(Solar.R, Temp)) +
         geom_point() +
         geom_smooth() +
         # set transparency
         theme(
            panel.grid.major = element_blank(), 
            panel.grid.minor = element_blank(),
            panel.background = element_rect(fill = "transparent",colour = NA),
            plot.background = element_rect(fill = "transparent",colour = NA)
         )
    ggsave("airquality.png", p, bg = "transparent")
  • ggplot2 theme background color and grids
    ggplot() + geom_bar(aes(x=, fill=y)) +
               theme(panel.background=element_rect(fill='purple'))
    ggplot() + geom_bar(aes(x=, fill=y)) + 
               theme(panel.background=element_blank()) + 
               theme(plot.background=element_blank()) # minimal background like base R
               # the grid lines are not gone; they are white so it is the same as the background
    ggplot() + geom_bar(aes(x=, fill=y)) + 
               theme(panel.background=element_blank()) + 
               theme(plot.background=element_blank()) +
               theme(panel.grid.major.y = element_line(color="grey"))
               # draw grid line on y-axis only
    ggplot() + geom_bar() +
               theme_bw()  # very similar to theme_light()
                           # have grid lines
    ggplot() + geom_bar() +
               theme_classic() # similar to base R graphic
                           # no borders on top and right
    ggplot() + geom_bar() +
               theme_minimal() # no edge
    ggplot() + geom_bar() +
               theme_void() # no grid, no edge
    ggplot() + geom_bar() +


ggthemr package

Font size

For example to make the subtitle font size smaller

my_ggp + theme(plot.subtitle = element_text(size = 8)) 
# Default font size seems to be 11 for title/subtitle

Remove x and y axis titles

ggplot2 title : main, axis and legend titles

Rotate x-axis labels, change colors


theme(axis.text.x = element_text(angle = 90, size=5, hjust=1))

customize ggplot2 axis labels with different colors

Add axis on top or right hand side

Remove labels

Plotting with ggplot: : adding titles and axis names

ggthemes package

ggplot() + geom_bar() +
           theme_solarized()   # sun color in the background





thematic, Top R tips and news from RStudio Global 2021

Common plots


Handling overlapping points (slides) and the ebook Fundamentals of Data Visualization by Claus O. Wilke.

Scatterplot with histograms



Geom smooth ex.png

Bubble Chart


ggside: scatterplot + marginal density plot

ggextra: scatterplot + marginal histogram/density

Line plots

Ridgeline plots, mountain diagram


A histogram is a special case of a bar plot. Instead of drawing each unique value as a bar, a histogram groups nearby data points into bins.

ggplot(data = txhousing, aes(x = median)) +
  geom_histogram()  # adding 'boundary = 0' (formerly 'origin = 0') if we don't expect negative values.
                    # adding 'bins=10' to adjust the number of bins
                    # adding 'binwidth=10' to adjust the bin width
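The effect of bins and binwidth can be checked without looking at the picture, because layer_data() returns one row per bin. A sketch using the same txhousing data; the bin width of 25000 is an arbitrary value chosen to suit the scale of the median price:

```r
library(ggplot2)

p <- ggplot(txhousing, aes(x = median))

# NAs in `median` trigger a warning, which is suppressed here
by_bins  <- suppressWarnings(layer_data(p + geom_histogram(bins = 10)))
by_width <- suppressWarnings(layer_data(p + geom_histogram(binwidth = 25000)))

n_bins <- nrow(by_bins)                       # number of bars drawn
widths <- by_width$xmax - by_width$xmin       # width of every bar
n_bins
unique(round(widths))
```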

Histogram vs barplot from deeply trivial.

Multiple variables


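Be careful: adding scale_y_continuous(expand = c(0,0), limits = c(0,1)) to the code changes the boxplot if some data fall outside the range (0, 1), because scale limits remove out-of-range values before the boxplot statistics are computed. The console gives a warning message in this case. Use coord_cartesian(ylim = c(0, 1)) to zoom without dropping data.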

Base R method

  • Box Plots - R Base Graphs
    # Use default color palette
    colors <- palette()[1:6] # "black"   "#DF536B" "#61D04F" "#2297E6" "#28E2E5" "#CD0BBC"
    # Boxplot with default colors
    boxplot(count ~ spray, data = InsectSprays, col = colors)
  • If we would like to add jittered points to the boxplot, we can use points() + jitter(); see this example. However, we need to hide the outliers drawn by boxplot() by adding outline = FALSE
    boxplot(count ~ spray, data = InsectSprays, col = colors, outline = FALSE)
    # par("usr")[1:2] confirms the locations of x-axis are 1, 2, 3, ...
    points(jitter(as.integer(InsectSprays$spray) ), InsectSprays$count, pch=16)
  • We can follow this to use the reorder() function to reorder the groups on the x-axis by their group mean/median.
  • If we like to rotate the boxplot by 90 degrees, we can add , horizontal = TRUE to boxplot() function.
    InsectSprays$newFac <- with(InsectSprays, reorder(spray, count, FUN=median))
    boxplot(count ~ newFac, data = InsectSprays, col = "lightgray", horizontal = TRUE, outline = FALSE)
    set.seed(1); points(InsectSprays$count, jitter(as.integer(InsectSprays$newFac) ),  pch=16)
  • Another base plot approach to create a jittered boxplot is to use boxplot() + stripchart(). See Stripchart in R, How to Create a Strip Chart in R. Consider adding outline = FALSE to boxplot() so that outliers are not drawn twice once stripchart() has been added.
    ylim <- range(df$estimate, na.rm = TRUE)
    boxplot(estimate~type, data=df, xlab=NULL, ylab=NULL, ylim=ylim, outline=F)
    stripchart(estimate~type, data=df, method = "jitter",
    		pch=19, col=c("salmon", "orange", "yellowgreen", "green"),
    		vertical=TRUE, add=TRUE)

Color fill/scale_fill_XXX

n <- 100
k <- 12
cond <- factor(rep(LETTERS[1:k], each=n))
rating <- rnorm(n*k)
dat <- data.frame(cond = cond, rating = rating)

p <- ggplot(dat, aes(x=cond, y=rating, fill=cond)) + 
  geom_boxplot()

p + scale_fill_hue() + labs(title="hue default") # Same as only p 
p + scale_fill_hue(l=40, c=35) + labs(title="hue options")
p + scale_fill_brewer(palette="Dark2") + labs(title="Dark2")
p + colorspace::scale_fill_discrete_qualitative(palette = "Dark 3") + labs(title="Dark 3")
p + scale_fill_brewer(palette="Accent") + labs(title="Accent")
p + scale_fill_brewer(palette="Pastel1") + labs(title="Pastel1")
p + scale_fill_brewer(palette="Set1") + labs(title="Set1")
p + scale_fill_brewer(palette="Spectral") + labs(title ="Spectral") 
p + scale_fill_brewer(palette="Paired") + labs(title="Paired")
# cbbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
# p + scale_fill_manual(values=cbbPalette)


ColorBrewer palettes RColorBrewer::display.brewer.all() to display all brewer palettes.

Reference from ggplot2. scale_fill_binned, scale_fill_brewer, scale_fill_continuous, scale_fill_date, scale_fill_datetime, scale_fill_discrete, scale_fill_distiller, scale_fill_gradient, scale_fill_gradient2, scale_fill_gradientn, scale_fill_grey, scale_fill_hue, scale_fill_identity, scale_fill_manual, scale_fill_ordinal, scale_fill_steps, scale_fill_steps2, scale_fill_stepsn, scale_fill_viridis_b, scale_fill_viridis_c, scale_fill_viridis_d

Jittering - plot the data on top of the boxplot

Groups of boxplots

  • How to Make Grouped Boxplot with Jittered Data Points in ggplot2. Use the color parameter in ggplot(aes()).
  • Boxplot With Jittered Points in R
  • How To Make Grouped Boxplots with ggplot2?, A review of Longitudinal Data Analysis in R. Use the fill parameter such as
    mydata %>%
      ggplot(aes(x=Factor1, y=Response, fill=factor(Factor2))) +   
      geom_boxplot()
  • Another method is to use ggpubr::ggboxplot(). Papers TumorPurity.
    ggboxplot(df, "dose", "len",
               fill = "dose", palette = c("#00AFBB", "#E7B800", "#FC4E07"), add.params=list(size=0.1),
               notch=T, add = "jitter", outlier.shape = NA, shape=16,
               size = 1/.pt, x.text.angle = 30, 
               ylab = "Silhouette Values", legend="right",
               ggtheme = theme_pubr(base_size = 8)) +
         theme(plot.title = element_text(size=8,hjust = 0.5), 
               text = element_text(size=8), 
               title = element_text(size=8),
               rect = element_rect(size = 0.75/.pt),
               line = element_line(size = 0.75/.pt),
               axis.text.x = element_text(size = 7),
               axis.line = element_line(colour = 'black', size = 0.75/.pt),
               legend.title = element_blank(),
               legend.position = c(0,1), 
               legend.justification = c(0,1),
               legend.key.size = unit(4,"mm"))

p-values on top of boxplots

Violin plot and sina plot

geom_density: Kernel density plot

A panel of density plots

  • Common xlim for all subplots
    ggplot(data = mpg, aes(x = hwy)) +
         geom_density() +
         facet_wrap(~ class)
  • Each subplot has its own xlim
    ggplot(data = mpg, aes(x = hwy)) +
         geom_density() +
         facet_wrap(~ class, scales = "free_x")

Bivariate analysis with ggpair

Correlation in R: Pearson & Spearman with Matrix Example


barplot/bar plot

Ordered barplot and facet

  • ?reorder. This, as relevel(), is a special case of simply calling factor(x, levels = levels(x)[....]).
    R> bymedian <- with(InsectSprays, reorder(spray, count, median))
    # bymedian will replace spray (a factor) 
    # The data is not changed except the order of levels (a factor) 
    # In this case, the order is determined by the median of count from each spray level
    #   from small to large.
    R> InsectSprays[1:3, ]
      count spray
    1    10     A
    2     7     A
    3    20     A
    R> bymedian
     [1] A A A A A A A A A A A A B B B B B B B B B B B B C C C C C C C C C C C C D D D D D D D
    [44] D D D D D E E E E E E E E E E E E F F F F F F F F F F F F
    attr(,"scores")
       A    B    C    D    E    F 
    14.0 16.5  1.5  5.0  3.0 15.0 
    Levels: C E D A F B
    R> InsectSprays$spray
     [1] A A A A A A A A A A A A B B B B B B B B B B B B C C C C C C C C C C C C D D D D D D D
    [44] D D D D D E E E E E E E E E E E E F F F F F F F F F F F F
    Levels: A B C D E F
    R> boxplot(count ~ bymedian, data = InsectSprays,
             xlab = "Type of spray", ylab = "Insect count",
             main = "InsectSprays data", varwidth = TRUE,
             col = "lightgray")


    tibble(y=sample(6), x=letters[1:6]) %>% 
      ggplot(aes(reorder(x, -y), y)) + geom_point(size=4)
  • Sorting the x-axis in bargraphs using ggplot2 or this one from Deeply Trivial. reorder(fac, value) was used.
    ggplot(df, aes(x=reorder(x, -y), y=y)) + geom_bar(stat = 'identity')
    df$order <- 1:nrow(df)
    # Assume df$y is a continuous variable and df$fac is a character/factor variable
    #   and we want to show factor according to the way they appear in the data
    #   (not following R's order even the variable is of type "character" not "factor")
    # We like to plot df$fac on the y-axis and df$y on x-axis. Fortunately,
    #   ggplot2 will draw barplot vertically or horizontally depending the 2 variables' types
    # The reason of using "-order" is to make the 1st name appears on the top
    ggplot(df, aes(x=y, y=reorder(fac, -order))) + geom_col()
    ggplot(df, aes(x=reorder(x, desc(y)), y=y)) + geom_col()
  • Predict #TidyTuesday giant pumpkin weights with workflowsets. fct_reorder()
  • Reordering and facetting for ggplot2. tidytext::reorder_within() was used.
  • Chapter2 of data.table cookbook. reorder(fac, value) was used.
  • PCA and UMAP with tidymodels
  • A simple example
    dat <- structure(list(gene = c("CAPN9", "CSF3R", "HPN", "KCNA5", "MTMR7", 
    "NRG3", "SMTNL2", "TMPRSS6"), coef = c(-1.238, -0.892, -0.224, 
    -0.057, 0.133, 0.377, 0.436, 0.804)), row.names = c("4976", "6467", 
    "12355", "13373", "18143", "19010", "23805", "25602"), class = "data.frame")
    # Base R plot
    barplot(dat$coef, names = dat$gene, horiz = T, las=1,
            main='base R', xlab = "Coefficients")
    # GGplot2
    dat %>% ggplot(aes(y=gene, x=coef)) + geom_col(fill = 'gray') + 
        theme(axis.ticks.y = element_blank()) + 
        theme(panel.background = element_blank(), 
              axis.line.x = element_line(colour = 'black')) +
        labs(x="Coefficients", y = '', title = "ggplot2")

    Barplot base.png, Barplot ggplot2.png

Proportion barplot

Back to back barplot

Pyramid Chart


Flip x and y axes


Rotate x-axis labels

ggplot(mydf) + geom_col(aes(x = model, y=value, fill = method), position="dodge")+
  theme(axis.text.x = element_text(angle = 45, hjust=1, size= 8))

Starts at zero

Starting bars and histograms at zero in ggplot2

scale_y_continuous(expand = c(0,0), limits = c(0, YourLimit))

Add patterns

Barplot with colors for a 2nd variable

How to basic: bar plots

By default, the barplots are stacked on top of each other. Use geom_col(position = "dodge") if we want the barplots to be side-by-side.

df <- data.frame(group = c("A", "A", "B", "B", "C", "C"), 
      count = c(3, 4, 5, 6, 7, 8), 
      fill = c("red", "blue", "red", "blue", "red", "blue"))
ggplot(df, aes(x = group, y = count, fill = fill)) + 
      geom_col(position = "dodge")


Base R approach.

Barplot with color gradient


Barplot with only horizontal gridlines

Geom bar3.png Geom bar4.png

Barplot with text at the end

Geom bar1.png Geom bar2.png

Polygon and map plot


geom_step: Step function

Connect observations: geom_path(), geom_step()

Example: KM curves (without legend)

library(survival)
sf <- survfit(Surv(time, status) ~ x, data = aml)
str(sf) # the first 10 time points form one stratum and the remaining 10 form the other
ggplot() + 
  geom_step(aes(x=c(0, sf$time[1:10]), y=c(1, sf$surv[1:10])), 
            col='red') + 
  scale_x_continuous('Time', limits = c(0, 161)) + 
  scale_y_continuous('Survival probability', limits = c(0, 1)) +
  geom_step(aes(x=c(0, sf$time[11:20]), y=c(1, sf$surv[11:20])), 
            col='black')
# cf:  plot(sf, col = c('red', 'black'), mark.time=FALSE)

Same example but with legend (see Construct a manual legend for a complicated plot)

cols <- c("NEW"="#f04546","STD"="#3591d1")
ggplot() + 
  geom_step(aes(x=c(0, sf$time[1:10]), y=c(1, sf$surv[1:10]), col='NEW')) +
  scale_x_continuous('Time', limits = c(0, 161)) + 
  scale_y_continuous('Survival probability', limits = c(0, 1)) +
  geom_step(aes(x=c(0, sf$time[11:20]), y=c(1, sf$surv[11:20]), col='STD')) + 
  scale_colour_manual(name="Treatment", values = cols)

To control the line width, use the size parameter (renamed linewidth in ggplot2 >= 3.4.0); e.g. geom_step(aes(x, y), size=.5). The default is .5, which can be looked up in GeomStep$default_aes.

To allow different line types, use the linetype parameter. The first level is solid line, the 2nd level is dashed, ... We can change the default line types by using the scale_linetype_manual() function. See Line Types in R: The Ultimate Guide for R Base Plot and GGPLOT.
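A minimal sketch of scale_linetype_manual() on step curves. The data frame below is made up for illustration (it is not the aml fit above), and the two line types are arbitrary choices:

```r
library(ggplot2)

# Made-up step data for two arms
df <- data.frame(
  time = rep(c(0, 5, 10, 20), times = 2),
  surv = c(1, 0.8, 0.5, 0.3, 1, 0.9, 0.7, 0.6),
  arm  = rep(c("NEW", "STD"), each = 4)
)

p <- ggplot(df, aes(time, surv, linetype = arm)) +
  geom_step() +
  # a named vector ties each level to a line type regardless of level order
  scale_linetype_manual(name = "Treatment",
                        values = c(NEW = "solid", STD = "dotdash"))
p
```

Using a named vector avoids surprises when the factor levels are reordered later.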

Coefficients, intervals, errorbars

Comparing similarities / differences between groups

comparing similarities / differences between groups

Special plots

Dot plot & forest plot

Lollipop plot

geom_segment() + geom_point()

ggpubr:: ggdotchart()

Correlation Analysis Different

Bump plot: plot ranking over time

Gauge plots

Sankey diagrams

Horizon chart

Circos plots


  • We can create a new aesthetic name in aes(aesthetic = variable) function; for example, the "text2" below. In this case "text2" name will not be shown; only the original variable will be used.
    g <- ggplot(tail(iris), aes(Petal.Length, Sepal.Length, text2=Species)) + geom_point()
    ggplotly(g, tooltip = c("Petal.Length", "text2"))

Aesthetics finder, video



GUI/Helper packages

ggedit & ggplotgui – interactive ggplot aesthetic and theme editor

esquisse (French, means 'sketch'): creating ggplot2 interactively

A 'shiny' gadget to create 'ggplot2' charts interactively with drag-and-drop to map your variables. You can quickly visualize your data accordingly to their type, export to 'PNG' or 'PowerPoint', and retrieve the code to reproduce the chart.

The interface introduces basic terms used in ggplot2:

  • x, y,
  • fill (useful for geom_bar, geom_rect, geom_boxplot, & geom_raster, not useful for scatterplot),
  • color (edges for geom_bar, geom_line, geom_point),
  • size,
  • facet, split up your data by one or more variables and plot the subsets of data together.

It does not include all features in ggplot2. At the bottom of the interface,

  • Labels & title & caption.
  • Plot options. Palette, theme, legend position.
  • Data. Remove subset of data.
  • Export & code. Copy/save the R code. Export file as PNG or PowerPoint.



ggx Create ggplot in natural language



R web → plotly


ggiraph: Make 'ggplot2' Graphics Interactive

ggconf: Simpler Appearance Modification of 'ggplot2'

Plotting individual observations and group means


Adding/Inserting an image to ggplot2

Inserting an image to ggplot2: See annotation_custom.

See also ggbernie which uses a different way ggplot2::layer() and a self-defined geom (geometric object).

Easy way to mix/combine multiple graphs on the same page


  • predcurvePlot.R from TreatmentSelection. One issue is the font size is large for the text & labels at the bottom. The 2nd issue is the bottom part of the graph/annotation (marker value scale) can be truncated if the window size is too large. If the window is too small, the bottom part can overlap with the top part.
    p <- p + theme(plot.margin = unit(c(1,1,4,1), "lines"))  # hard coding
    p <- p + annotation_custom() # axis for marker value scale
    p <- p + annotation_custom() # label only
    • Similar plot but without using base R graphic. One issue is the text is not below the scale (this can be fixed by par(mar) & mtext(text, side=1, line=4)) and the 2nd issue is the same as ggplot2's approach.
      axis(1,at= breaks, label = round(quantile(x1, prob = breaks/100), 1),pos=-0.26) # hard coding
    • Another common problem is that the plot saved by pdf() or png() can be truncated too. I have better luck with png() though.



Force a regular plot object into a Grob for use in grid.arrange

gridGraphics package

make one panel blank/create a placeholder

# Method 1: Blank
ggplot() + theme_void()
# Method 2: Display N/A
ggplot() +
    theme_void() +
    geom_text(aes(0, 0, label = "N/A"))

Overall title

multiple ggplots overall title

Remove vertical/horizontal grids but keep ticks



Common legend

Add a common Legend for combined ggplots


p1 <- ggplot(df1, aes(x = x, y = y, colour = group)) + 
  geom_point(position = position_jitter(w = 0.04, h = 0.02), size = 1.8)
p2 <- ggplot(df2, aes(x = x, y = y, colour = group)) + 
  geom_point(position = position_jitter(w = 0.04, h = 0.02), size = 1.8)

# Combining plots with + (and plot_layout()) requires the patchwork package
# Method 1:
p1 + p2 + plot_layout(guides = "collect") + theme(legend.position = "bottom") 
                                          # one legend on the bottom
# Method 2:
p1 + p2 + plot_layout(guides = "collect") # one legend on the RHS
# Method 3:
p1 + theme(legend.position="none") + p2  # legend (based on p2) is on the RHS
# Method 4:
p1 + p2 + theme(legend.position="none")  # legend (based on p1) is in the middle!!

Overall title

Common Main Title for Multiple Plots in Base R & ggplot2 (2 Examples)


Common x or y labels

Base R plot vs ggplot2

  • My summary
base-R | ggplot2
plot(x, y, col) | geom_point(aes(x, y, color, shape))
xlim | scale_x_continuous(limits)
log="x" | scale_x_continuous(trans="log10")
mtext("Var", cex, line, adj, las, side) | scale_x_discrete(name="sample size")
main | labs(x, y, title, colour)
axis(2, labels) | scale_y_continuous(labels, breaks)
? | scale_color_discrete('new color title')
? | scale_shape_discrete('new shape title')
col | scale_color_manual(name, values = NamedVector)
pch, cex | geom_point(pch, size)
plot(mpg, disp, col=factor(cyl)); legend(..., legend = sort(unique(cyl)), col=1:3, pch=1) | ggplot(..., aes(mpg, disp, color = factor(cyl))) + geom_point() + labs(color = "Number of Cylinders")  # discrete case
text() | geom_text()
? | theme(title = element_text(size=8), legend.title = element_blank(), legend.position = "none", legend.key = element_blank(), plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(size = 8))
las in plot(), barplot(); text(x, y, labs, srt=45) | theme(axis.text.x = element_text(angle = 90))
matplot() | geom_line() + geom_point()
plot(type = 'l'), points() | geom_line() + geom_point()
barplot() | geom_bar()
par(mfrow) | facet_grid()

labs for x and y axes

x and y labels or the Labels part of the cheatsheet

You can set the labels with xlab() and ylab(), or make it part of the scale_*.* call.

labs(x = "sample size", y = "ngenes (glmnet)")

scale_x_discrete(name="sample size")
scale_y_continuous(name="ngenes (glmnet)", limits=c(100, 500))

Change tick mark labels

ggplot2 axis ticks : A guide to customize tick marks and labels

name-value pairs

See several examples (color, fill, size, ...) from opioid prescribing habits in texas.


Add Footnote to ggplot2

Prevent sorting of x labels

See Change the order of a discrete x scale.

The idea is to set the levels of x variable.

junk   # n x 2 table
colnames(junk) <- c("gset", "boot")
junk$gset <- factor(junk$gset, levels = as.character(junk$gset))
ggplot(data = junk, aes(x = gset, y = boot, group = 1)) + 
  geom_line() + 
  theme(axis.text.x=element_text(color = "black", angle=30, vjust=.8, hjust=0.8))
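The levels = as.character(junk$gset) idiom above assumes every label occurs once. When labels repeat, levels = unique(x) keeps the order of first appearance. A small sketch with made-up labels:

```r
x <- c("g10", "g2", "g1", "g2")        # made-up labels with a repeat

default_levels <- levels(factor(x))    # sorted: "g1" "g10" "g2"
f <- factor(x, levels = unique(x))     # order of first appearance
levels(f)                              # "g10" "g2" "g1"
```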


Legend title

  • labs() function
    p <- ggplot(df, aes(x, y)) + geom_point(aes(colour = z))
    p + labs(x = "X axis", y = "Y axis", colour = "Colour\nlegend")
           # Use color to represent the legend title
    p <- ggplot(df) + geom_col(aes(x=x, y=y, fill=cat), position = "dodge") 
    p + labs(x = "X", y = "Y", fill = "Category")
           # Use fill to represent the legend title
  • scale_colour_manual()
    scale_colour_manual("Treatment", values = c("black", "red"))
  • scale_color_discrete() and scale_shape_discrete(). See Combine colors and shapes in legend.
    df <- data.frame(x = 1:3, y = 1:3, z = c("a", "b", "c"))
    ggplot(df, aes(x, y)) + geom_point(aes(shape = z, colour = z), size=5) + 
      scale_color_discrete('new title') + scale_shape_discrete('new title')

Remove NA factor level from color legend

Use na.translate = F in scale_color_XXX(). See ggplot: remove NA factor level in legend
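A minimal sketch of the idea with made-up data; scale_colour_manual() is used here, but other discrete colour scales should accept the same argument:

```r
library(ggplot2)

# Made-up data with one missing group label
df <- data.frame(x = 1:4, y = 1:4, grp = c("a", "b", NA, "a"))

# na.translate = FALSE: NA is not mapped to a colour and gets no legend key
sc <- scale_colour_manual(values = c(a = "red", b = "blue"),
                          na.translate = FALSE)

p <- ggplot(df, aes(x, y, colour = grp)) +
  geom_point(size = 3) +
  sc
p
```

Note that the point whose group is NA is then not drawn either; filter the data beforehand if that is not intended.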

Layout: move the legend from right to top/bottom of the plot or inside the plot or hide it

gg + theme(legend.position = "top")

# Useful in the boxplot case
gg + theme(legend.position="none")

gg + theme(legend.position = c(0.87, 0.25)) +
     guides(colour = guide_legend(nrow = 1))

# Customize the edge color and background color
gapminder %>%
  ggplot(aes(gdpPercap,lifeExp, color=continent)) +
  geom_point() +
  theme(legend.position = c(0.87, 0.25),
        legend.background = element_rect(fill = "white", color = "black"))

Guide functions for finer control (legend, axis, color scales)

  • The guide functions, guide_colourbar() and guide_legend(), offer additional control over the fine details of the legend.
  • guide_legend() allows the modification of legends for scales, including fill, color, and shape. This function can be used in scale_fill_manual(), scale_fill_continuous(), ... functions.
    scale_fill_manual(values=c("orange", "blue"), 
                      guide=guide_legend(title = "My Legend Title",
                                         nrow=1,  # multiple items in one row
                                         label.position = "top", # move the texts on top of the color key
                                         keywidth=2.5)) # increase the color key width

    The problem with the default setting is it leaves a lot of white space above and below the legend. To change the position of the entire legend to the bottom of the plot, we use theme().

    theme(legend.position = 'bottom')
  • guides()
    • Legend. For example, to remove the legend title:
    ggplot(mtcars, aes(x = mpg, y = disp, color = factor(cyl))) +
      geom_point() +
      guides(color = guide_legend(title = NULL))
    • Axis. For example, to change the angle of the x-axis labels:
    ggplot(mtcars, aes(x = mpg, y = disp)) +
      geom_point() +
      theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
      guides(x = guide_axis(angle = 45))
    • Color scales. For example, to change the number of bins used to draw the color bar:
    ggplot(mtcars, aes(x = mpg, y = disp, color = hp)) +
      geom_point() +
      guides(color = guide_colorbar(nbin = 10))
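
The guide controls how the color bar is drawn, while the labelled breaks belong to the scale. A minimal sketch of setting both, assuming the mtcars data; the break values 100, 200 and 300 are arbitrary choices for illustration:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = mpg, y = disp, color = hp)) +
  geom_point() +
  # the labelled breaks are a property of the scale
  scale_color_continuous(breaks = c(100, 200, 300)) +
  # nbin is the number of bins used to draw the bar itself
  guides(color = guide_colorbar(nbin = 10))
p
```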

Legend symbol background

ggplot(mtcars, aes(wt, mpg, color = factor(cyl), size = hp)) +
  geom_point() +
  theme(legend.key = element_blank())  # remove the symbol background in the legend

Construct a manual legend for a complicated plot

Legend size

How to Change Legend Size in ggplot2 (With Examples)

data <- data.frame(x = 1:5, y = 1:5, label = c("A", "B", "C", "D", "E"))
ggplot(data, aes(x, y, color = as.factor(label))) +
  geom_point() +
  labs(title = "Legend Size Example with Theme Modification",
       color = "Label") +
  theme(legend.text = element_text(size = 12),
        legend.title = element_text(size = 14))


Centered title

See the Legends part of the cheatsheet.

ggtitle("MY TITLE") +
  theme(plot.title = element_text(hjust = 0.5))


ggtitle("My title",
        subtitle = "My subtitle")


Aspect ratio


p <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
p + coord_fixed() # plot is compressed horizontally
p  # fill up plot region

Time series plot

Multiple lines plot

nc <- 9
df <- data.frame(x=rep(1:5, nc), val=sample(1:100, 5*nc), 
                   variable=rep(paste0("category", 1:nc), each=5))
# plot
ggplot(data = df, aes(x=x, y=val)) + 
    geom_line(aes(colour=variable)) + 
    scale_colour_manual(values=c("#a6cee3", "#1f78b4", "#b2df8a", "#33a02c", "#fb9a99", "#e31a1c", "#fdbf6f", "#ff7f00", "#cab2d6"))

Versus the old-fashioned way (base R matplot)

dat <- matrix(runif(40,1,20),ncol=4) # make data
matplot(dat, type = c("b"),pch=1,col = 1:4) #plot
legend("topleft", legend = 1:4, col=1:4, pch=1) # optional legend


Calendar plot in R using ggplot2

Github style calendar plot


See Scatterplot.

df <- data.frame(x=1:3, y=1:3, color=c("red", "green", "blue"))
# Use I() to set aes values to the identity of a value from your data table
ggplot(df, aes(x,y, color=I(color))) + geom_point(size=10) # no color legend
# VS
ggplot(df, aes(x,y, color=color)) + geom_point(size=10) # color is like a class label

geom_bar(), geom_col(), stat_count()

  • geom_bar: Counts the number of cases at each x position and makes the height of the bar proportional to the count (or sum of weights if supplied)
  • geom_col: Leaves the data as is and makes the height of the bar proportional to the value in the data
Function     Default statistic   Purpose
geom_bar()   stat_count()        Counts the cases at each x position
geom_col()   stat_identity()     Uses the values in the data as bar heights

# geom_bar() example
df2 <- data.frame(cat = c("A", "A", "A", "B", "B", 
   "B", "B", "B", "C", "C", "C", "C", "C", "C"))
ggplot(df2, aes(x = cat)) + geom_bar()
# Same as
# barplot(table(df2$cat))

# geom_col() example
df <- data.frame(group = c("A", "B", "C"), 
                 count = c(3, 5, 6))
ggplot(df, aes(x = group, y = count)) + geom_col()
# Same as
# barplot(df$count, names.arg = df$group)
geom_col(position = 'dodge')  # same as 
geom_bar(stat = 'identity', position = 'dodge')

With its default stat_count(), geom_bar() computes the bar heights itself and does not accept a y aesthetic. To supply the heights yourself, use geom_col().

ggplot() + geom_col(mapping = aes(x, y))
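
The two data sets above describe the same counts, so both functions draw bars of the same height. A quick check, using layer_data() to inspect the values that ggplot2 computes (the object names df_raw, df_sum, p_bar and p_col are made up for this sketch):

```r
library(ggplot2)

# Raw observations: geom_bar() counts the rows per category via stat_count()
df_raw <- data.frame(cat = rep(c("A", "B", "C"), times = c(3, 5, 6)))
p_bar <- ggplot(df_raw, aes(x = cat)) + geom_bar()

# Pre-computed counts: geom_col() plots the values as they are (stat_identity())
df_sum <- data.frame(group = c("A", "B", "C"), count = c(3, 5, 6))
p_col <- ggplot(df_sum, aes(x = group, y = count)) + geom_col()

# Bar heights behind each plot
layer_data(p_bar)$count  # 3 5 6
layer_data(p_col)$y      # 3 5 6
```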

Add colors to the plot

df <- data.frame(group = c("A", "B", "C"), 
                 count = c(3, 5, 6), 
                 fill = c("red", "green", "blue"))
ggplot(df, aes(x = group, y = count, fill = fill)) + 
  geom_col()  # fill is mapped as a discrete variable, so the color names act as class labels
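
In the call above the fill column is treated as a class label, so ggplot2 chooses its own palette. A sketch of one way to use the color names literally, with scale_fill_identity():

```r
library(ggplot2)

df <- data.frame(group = c("A", "B", "C"),
                 count = c(3, 5, 6),
                 fill = c("red", "green", "blue"))

# scale_fill_identity() uses the values of `fill` as the colors themselves
p <- ggplot(df, aes(x = group, y = count, fill = fill)) +
  geom_col() +
  scale_fill_identity()
p

# the fill values that reach the geom are the literal color names
unique(layer_data(p)$fill)
```

Note that scale_fill_identity() draws no legend by default.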

Add numbers to the plot

An example

Simple example

Original: [Figure: Geom bar simple.png]

fct_reorder(): [Figure: Geom bar reorder.png]

Ordered barplot and reorder()

Ordered barplot and facet



stat_smooth(), geom_smooth()

?geom_smooth, ?stat_smooth

ggplot(data = mtcars, aes(x = wt, y = mpg)) + 
  geom_point() +
  stat_smooth(method = "glm", formula = "y ~ x", 
              method.args = list(family = poisson(link = "log")), 
              se = FALSE, color = "red") +
  labs(x = "Weight", y = "Miles per gallon")

To control the smoothness of a loess fit, use the "span" parameter (smaller values give a wigglier curve). To hide the confidence band, use "se = FALSE".

geom_smooth(method = 'loess', se = FALSE, span = 0.3)
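
A sketch comparing two span values on the mtcars data; the values 0.5 and 1 are arbitrary and only meant to show the direction of the effect:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # smaller span: the curve follows the points more closely
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE,
              span = 0.5, color = "red") +
  # larger span: smoother curve
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE,
              span = 1, color = "blue")
p
```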


df <- data.frame(
  X = seq(0, 100, by = 5),  # Pathologist estimate
  Y = seq(0, 100, by = 5) + rnorm(21, 0, 5)  # XXX prediction
)

# Choice 1: Calculate the lower and upper bounds of the confidence interval
df$lower_bound <- 0.863 * df$X  # 13.7% below X
df$upper_bound <- 1.137 * df$X  # 13.7% above X

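# Choice 2: Constant width for the confidence band (overwrites the bounds from Choice 1)
half_width <- 13.7  # avoid naming a variable "c", which masks the c() function name
df$lower_bound <- df$X - half_width
df$upper_bound <- df$X + half_width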

# Plotting
ggplot(df, aes(x = X, y = Y)) +
  geom_point() + 
  geom_ribbon(aes(ymin = lower_bound, ymax = upper_bound), fill = "blue", alpha = 0.2) + 
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  labs(x = "Pathologist Estimate", y = "XXX Prediction")

geom_area()

The Pfizer-Biontech Vaccine May Be A Lot More Effective Than You Think

Square shaped plot

ggplot() + theme(aspect.ratio=1) # do not adjust xlim, ylim

xylim <- range(c(x, y))
ggplot() + coord_fixed(xlim=xylim, ylim=xylim) 
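
The two approaches above can be compared on simulated data. This is a sketch; the data frame d and its columns are made up for illustration:

```r
library(ggplot2)

set.seed(1)
d <- data.frame(x = rnorm(50), y = rnorm(50, sd = 3))

# Square panel; each axis keeps its own range
p1 <- ggplot(d, aes(x, y)) + geom_point() + theme(aspect.ratio = 1)

# Square panel with the same range and the same unit length on both axes
xylim <- range(c(d$x, d$y))
p2 <- ggplot(d, aes(x, y)) + geom_point() +
  coord_fixed(xlim = xylim, ylim = xylim)

p1
p2
```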


See also aes(..., group, ...).

Connect Paired Points with Lines in Scatterplot

Use geom_line() to create a square bracket to annotate the plot

Barchart with Significance Tests

Interaction plot

Randomized block design


Line segments, arrows and curves. See an example in geom_errorbar section below.

Cf annotate("segment", ...)

geom_errorbar(): error bars

x <- rnorm(10)
SE <- rnorm(10)
y <- 1:10

xlim <- c(-4, 4)
plot(x[1:5], 1:5, xlim=xlim, ylim=c(0+.1,6-.1), yaxs="i", xaxt = "n", ylab = "", pch = 16, las=1)
mtext("group 1", 4, las = 1, adj = 0, line = 1) # las=text rotation, adj=alignment, line=spacing
plot(x[6:10], 6:10, xlim=xlim, ylim=c(5+.1,11-.1), yaxs="i", ylab ="", pch = 16, las=1, xlab="")
arrows(x[6:10]-SE[6:10], 6:10, x[6:10]+SE[6:10], 6:10, code=3, angle=90, length=0)
mtext("group 2", 4, las = 1, adj = 0, line = 1)
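
For comparison, a ggplot2 sketch of a similar plot using geom_errorbar(). The data frame and its column names (id, est, se, group) are made up for illustration, and the standard errors are simulated as positive numbers:

```r
library(ggplot2)

set.seed(1)
df <- data.frame(id    = factor(1:10),
                 est   = rnorm(10),
                 se    = runif(10, 0.2, 1),  # positive standard errors
                 group = rep(c("group 1", "group 2"), each = 5))

# Vertical error bars of +/- one standard error, one panel per group
p <- ggplot(df, aes(x = id, y = est)) +
  geom_point() +
  geom_errorbar(aes(ymin = est - se, ymax = est + se), width = 0.2) +
  facet_wrap(~ group, scales = "free_x")
p
```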


  • Forest plot example using geom_errorbarh()


geom_rect(), geom_bar()

Note that we can use scale_fill_manual() to change the 'fill' colors (scheme/palette). The 'fill' aesthetic in geom_rect() or geom_bar() only names the discrete variable that defines the groups; the colors themselves come from the fill scale.

ggplot(data=) +
  geom_bar(aes(x=, fill=)) +
  scale_fill_manual(values = c("orange", "blue"))
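
A runnable version of the template above, assuming the mtcars data with cyl on the x-axis and the two-level variable am as the fill (both choices are for illustration only):

```r
library(ggplot2)

p <- ggplot(data = mtcars) +
  geom_bar(aes(x = factor(cyl), fill = factor(am))) +
  scale_fill_manual(values = c("orange", "blue"))  # one color per level of am
p
```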

geom_raster() and geom_tile()

Waterfall plot



Circle in ggplot2

ggplot(data.frame(x = 0, y = 0), aes(x, y)) + geom_point(size = 25, pch = 1)


Add a horizontal/vertical line

geom_hline(), geom_vline()
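
A minimal sketch of both functions on the mtcars data; the line positions (the mean of mpg and wt = 3) are arbitrary:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  # horizontal line at the mean of mpg
  geom_hline(yintercept = mean(mtcars$mpg), linetype = "dashed") +
  # vertical line at wt = 3
  geom_vline(xintercept = 3, color = "red")
p
```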


text annotations, annotate() and geom_text(): ggrepel package

    annotate("text", label="Toyota", x=3, y=100)
    annotate("segment", x = 2.5, xend = 4, y = 15, yend = 25, colour = "blue", size = 2)
    geom_text(aes(x, y, label), data, size, vjust, hjust, nudge_x)
  • Text annotations in ggplot2
    p + geom_text(aes(x = -115, y = 25,
                      label = "Map of the United States"),
                  stat = "unique")
    p + geom_label(aes(x = -115, y = 25,
                       label = "Map of the United States"),
                  stat = "unique") # include border around the text
  • Use the nudge_y parameter to avoid the overlap of the point and the text such as
    ggplot() + geom_point() +
               geom_text(aes(x, y, label), color='red', data, nudge_y=1)
  • What do hjust and vjust do when making a plot using ggplot? For hjust, 0 means left-justified and 1 means right-justified; for vjust, 0 means bottom-justified and 1 means top-justified. This matters when the text has multiple lines. By default, text is center-justified (0.5).
  • Volcano plots, EnhancedVolcano package
  • Visualization of Volcano Plots in R
  • AI
    data <- data.frame(
        gene = paste("Gene", 1:1000, sep = "_"),
        log2FoldChange = rnorm(1000),
        pvalue = runif(1000)
    )
    data$pvalue[1:20] <- runif(20, 0, .001)  # plant a few small p-values
    data$padj <- p.adjust(data$pvalue, method = "BH") # Adjusted p-values
    significant_genes <- subset(data, padj < 0.05 & abs(log2FoldChange) > 1)
    ggplot(data, aes(x = log2FoldChange, y = -log10(padj))) +
        geom_point(aes(color = padj < 0.05 & abs(log2FoldChange) > 1), alpha = 0.5) +
        scale_color_manual(values = c("black", "red"), na.translate = F) +
        theme_minimal() +
        labs(title = "Volcano Plot", x = "Log2 Fold Change", y = "-Log10 Adjusted P-Value") +
        # assumed: the labels are drawn with ggrepel::geom_text_repel()
        # (geom_label_repel() accepts the same arguments)
        ggrepel::geom_text_repel(
            data = significant_genes,
            aes(label = gene),
            box.padding = 0.25,     # default
            point.padding = 1e-06,  # default
            max.overlaps = 10       # default
        )

Text wrap

ggplot2 is there an easy way to wrap annotation text?

p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Solution 1: strwrap(). Does not work with Chinese characters
wrapper <- function(x, ...) paste(strwrap(x, ...), collapse = "\n")
# The label
my_label <- "Some arbitrarily larger text"
# and finally your plot with the label
p + annotate("text", x = 4, y = 25, label = wrapper(my_label, width = 5))

# Solution 2: splitTextGrob(). Does not work with Chinese characters
library(RGraphics)  # provides splitTextGrob()
p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
grob1 <-  splitTextGrob("Some arbitrarily larger text")
p + annotation_custom(grob = grob1,  xmin = 3, xmax = 4, ymin = 25, ymax = 25) 

# Solution 3: stringr::str_wrap()
my_label <- "太極者無極而生。陰陽之母也。動之則分。靜之則合。無過不及。隨曲就伸。人剛我柔謂之走。我順人背謂之黏。"
p <- ggplot() + geom_point() + xlim(0, 400) + ylim(0, 300) # 400x300 e-paper
p + annotate("text", x = 0, y = 200, hjust=0, size=5,
             label = stringr::str_wrap(my_label, width =30)) +
    theme_bw() + 
    theme(panel.grid.major = element_blank(), 
          panel.grid.minor = element_blank(), 
          panel.border = element_blank(),
          axis.title = element_blank(), 
          axis.text = element_blank(),
          axis.ticks = element_blank()) 


ggtext: Improved text rendering support for ggplot2

ggforce - Annotate areas with ellipses


Other geoms

Exploring other {ggplot2} geoms


geomtextpath- Create curved text in ggplot2

Build your own geom

Fonts, icons

Lines of best fit

Lines of best fit

Save the plots -- ggsave()

ggsave(). Note that the svglite package is required to save svg files; see R Graphics Cookbook. The svglite package provides more standards-compliant output.

By default, width and height are in inches, regardless of the output format.

(3/24/2022) If I save the plot in the svg format using the RStudio GUI (Export -> Save as Image...) or the svg() function, ImageMagick cannot convert the svg file to a png file. But if I save the plot with ggsave(), the svg file can be converted to a png file.

$ convert -resize 100% Rerrorbar.svg tmp.png
convert-im6.q16: non-conforming drawing primitive definition `path' @ error/draw.c/RenderMVGContent/4300.
$ convert -resize 100% Rerrorbar2.svg tmp.png # Works

(1/31/2022) For some reason, the legend text in svg files generated by ggsave() looks fine in browsers, but when I insert the file into ppt, the word "Sensitive" becomes "Sensitiv e". However, svg files generated by the svg() function look fine in browsers AND in ppt.

If width and height are not specified, ggsave() uses the size of the current graphics device. That is why ggsave() reports the image size (in inches) after it runs. To get a fixed width/height, specify them explicitly.

My experience is ggsave() is better than png() because ggsave() makes the text larger when we save a file with a higher resolution.

ggsave("filename.png", object, width=8, height=4)
# vs
png("filename.png", width=1200, height=600)

We can specify dpi to increase the resolution if we use the png format (svg is not affected); see Chapter 14.5 Outputting to Bitmap (PNG/TIFF) Files from R Graphics Cookbook.

g1 <- ggplot(data = mydf) 
ggsave("myfile.png", g1, height = 7, width = 8, units = "in", dpi = 300)

I got an error: Error in loadNamespace(name) : there is no package called ‘svglite’. After I installed the package, everything worked fine.

ggsave("raw-output.bmp", p, width=4, height=3, dpi = 100)
# Will generate 4*100 x 3*100 pixel plot


  • When saving to a "png" file, increasing dpi (from 72 to 300) will increase the font & point size. dpi/ppi is not an inherent property of an image.
  • If we don't specify any parameters and don't resize the graphics device, the "png" file created by ggsave() will contain many more pixels than the "svg" file (e.g. 1200 vs 360).
  • How does ggsave() decide the width/height when an svg file is created from an Rmd file? A: 7x7 inches in my experiment, so the font/point size will be smaller compared to a 4x4 inch output.
  • When I created a 4x4 inch (width x height) svg file in Linux, the file properties (right click) show 360 x 360 pixels. But macOS does not report this number, and I cannot find it in the svg file either.

Multiple pages in pdf

The key is to save the plot in an object and use the print() function.

pdf("FileName", onefile = TRUE)
for(i in 1:I) {
  p <- ggplot()
  print(p)  # each print() starts a new page
}
dev.off()

graphics::smoothScatter: scatter plots with lots of points

Other tips/FAQs

Tips and tricks for working with images and figures in R Markdown documents

Ten Simple Rules for Better Figures

Ten Simple Rules for Better Figures

Five ways to improve your chart axes

Five ways to improve your chart axes

Beyond Bar and Line Graphs

Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm

Recreating the Storytelling with Data look with ggplot

Recreating the Storytelling with Data look with ggplot

ggplot2 does not appear to work when inside a function

Use print() or ggsave(). When ggplot2 functions are used interactively at the command line, the result is printed automatically, but inside source() or your own functions you need an explicit print() statement.
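
A minimal sketch of the explicit print() inside a function. The helper name scatter_plot() is hypothetical, and pdf(NULL) is used only so that the sketch also runs without a display:

```r
library(ggplot2)

# Hypothetical helper that draws a scatter plot of wt against mpg
scatter_plot <- function(data) {
  p <- ggplot(data, aes(x = wt, y = mpg)) + geom_point()
  print(p)      # a bare `p` on this line would be evaluated but draw nothing
  invisible(p)  # return the plot so that it can still be modified or saved
}

pdf(NULL)  # null device: nothing is written to disk
p <- scatter_plot(mtcars)
dev.off()
```

Returning the plot invisibly keeps the function usable with ggsave(), for example ggsave("scatter.png", p), where the file name is again only an illustration.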


Add your brand to ggplot graph

You Need to Start Branding Your Graphs. Here's How, with ggplot!

Animation and gganimate


ggstatsplot: ggplot2 Based Plots with Statistical Details

Write your own ggplot2 function: rlang

Some packages depend on ggplot2

dittoSeq from Bioconductor



plotnine: A Grammar of Graphics for Python.

plotnine is an implementation of a grammar of graphics in Python, based on ggplot2. The grammar allows users to compose plots by explicitly mapping data to the visual objects that make up the plot.

The Hitchhiker’s Guide to Plotnine