# to read files faster
library(readr)
# to munge data
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# to make dplyr even better
library(magrittr)
# to tidy data
library(tidyr)
##
## Attaching package: 'tidyr'
##
## The following object is masked from 'package:magrittr':
##
## extract
# to plot
library(ggplot2)
# to make ggplot look even better
library(ggthemes)
# to get even more out of ggplot2
library(cowplot)
##
## Attaching package: 'cowplot'
##
## The following object is masked from 'package:ggplot2':
##
## ggsave
R is a language for statistical computing and graphics. In contrast to MATLAB or Mathematica, it is free and open source, and an ecosystem of diverse packages is growing rapidly around it. Some packages are designed to make working with R itself more pleasant (e.g. dplyr and tidyr), some improve R’s plotting capabilities (e.g. ggplot2), and others address the specific needs of countless fields such as finance, astronomy, linguistics, … and of course biology. Many biological packages, especially those that deal with high-throughput genomic data, are associated with Bioconductor (https://www.bioconductor.org/). People working with structures should check out Bio3D (http://thegrantlab.org/bio3d/index.php).
Before we get started, I would like to introduce some basic navigation commands:
Three more commands that come in very handy are:
The basic data structure in R is the vector. Vectors come in two forms: atomic vectors and lists. The difference is that an atomic vector contains elements of one type only, whereas a list can mix types.
atomic_vector_1 <- c(0, 1, 10.1, 10**3)
atomic_vector_1
## [1] 0.0 1.0 10.1 1000.0
atomic_vector_2 <- c("hello", "world")
atomic_vector_2
## [1] "hello" "world"
list_1 <- list("a bit of this", "...", 2+2, atomic_vector_1, atomic_vector_2)
list_1
## [[1]]
## [1] "a bit of this"
##
## [[2]]
## [1] "..."
##
## [[3]]
## [1] 4
##
## [[4]]
## [1] 0.0 1.0 10.1 1000.0
##
## [[5]]
## [1] "hello" "world"
list_2 <- list(letters[1:8], 1:8, 42, "Life", "The Universe", "And Everything")
list_2
## [[1]]
## [1] "a" "b" "c" "d" "e" "f" "g" "h"
##
## [[2]]
## [1] 1 2 3 4 5 6 7 8
##
## [[3]]
## [1] 42
##
## [[4]]
## [1] "Life"
##
## [[5]]
## [1] "The Universe"
##
## [[6]]
## [1] "And Everything"
Atomic vectors are created using c() and lists are created using list(). When different types of data are mixed in an atomic vector, they are coerced to a common type:
atomic_vector_3 <- c(42, "Life", "The Universe", "And Everything")
atomic_vector_3
## [1] "42" "Life" "The Universe" "And Everything"
In the example above, 42 is stored as the string "42".
Vectors have three properties: type, length, and attributes.
typeof(list_1)
## [1] "list"
length(list_1)
## [1] 5
Attributes are used to store metadata:
attr(atomic_vector_3, "my_attribute") <- "Hitchhiker's guide"
str(atomic_vector_3)
## atomic [1:4] 42 Life The Universe And Everything
## - attr(*, "my_attribute")= chr "Hitchhiker's guide"
logical_vector <- c(TRUE, FALSE, FALSE, TRUE, TRUE)
# checking the type
typeof(logical_vector)
## [1] "logical"
str(logical_vector)
## logi [1:5] TRUE FALSE FALSE TRUE TRUE
# it's also possible to verify the type by:
is.logical(logical_vector)
## [1] TRUE
# the type can also be changed
x <- as.numeric(logical_vector)
str(x)
## num [1:5] 1 0 0 1 1
Note that coercion from less flexible to more flexible types (logical < integer < double < character) usually works. Going the other way may produce a warning, missing values (NA), or other undesired results.
For example:
y <- c("six", "seven")
as.numeric(y)
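Running the example above does not stop with an error. In base R, strings that cannot be parsed as numbers are turned into NA and a warning is emitted. A minimal sketch (the names y_num and z_num are only used for illustration here):

```r
y <- c("six", "seven")

# as.numeric() cannot parse these strings: it returns NA for each of them
# and emits a warning, which is silenced here with suppressWarnings()
y_num <- suppressWarnings(as.numeric(y))
y_num
is.na(y_num)

# strings that look like numbers are converted without complaint
z_num <- as.numeric(c("6", "7.5"))
z_num
```

Checking the result with is.na() is a simple way to catch values that were silently lost during coercion.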
Changing a logical vector to numeric turns TRUEs into 1s and FALSEs into 0s. This allows for quick analysis of logical data:
x
## [1] 1 0 0 1 1
# counting all TRUEs
sum(x)
## [1] 3
# proportion of TRUEs
mean(x)
## [1] 0.6
It’s important to remember that many operations will coerce the type automatically.
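A few cases of automatic coercion can be sketched as follows. The variable names n_true and mixed are made up for this illustration:

```r
logical_vector <- c(TRUE, FALSE, FALSE, TRUE, TRUE)

# sum() coerces the logical values to numbers on the fly,
# so no explicit as.numeric() is needed
n_true <- sum(logical_vector)
n_true

# arithmetic on a logical value yields a number
typeof(TRUE + 1L)
typeof(TRUE + 1)

# c() coerces all elements to the most flexible type present
mixed <- c(logical_vector[1], 1.5, "a")
typeof(mixed)
mixed
```

Calling typeof() after such an operation is a quick way to see which type R has chosen.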
As in pandas, the main workhorse is the dataframe. (For this tutorial we ignore matrices, arrays, and factors.) A dataframe is a list of equal-length vectors. Let’s have a look at the frog tongue adhesion data to illustrate this.
whole_frog_file <- read_csv("frog_tongue_adhesion.csv")
## Warning: 93 parsing failures.
## row col expected actual
## 1 -- 2 columns 5 columns
## 2 -- 2 columns 1 columns
## 3 -- 2 columns 1 columns
## 4 -- 2 columns 1 columns
## 5 -- 2 columns 1 columns
## ... ... ......... .........
## .See problems(...) for more details.
# read in the csv file and skip the comments
frog <- read_csv("frog_tongue_adhesion.csv", skip = 14)
head(frog)
## Source: local data frame [6 x 15]
##
## date ID trial number impact force (mN) impact time (ms)
## (date) (chr) (int) (int) (int)
## 1 2013-02-26 I 3 1205 46
## 2 2013-02-26 I 4 2527 44
## 3 2013-03-01 I 1 1745 34
## 4 2013-03-01 I 2 1556 41
## 5 2013-03-01 I 3 493 36
## 6 2013-03-01 I 4 2276 31
## Variables not shown: impact force / body weight (dbl), adhesive force (mN)
## (int), time frog pulls on target (ms) (int), adhesive force / body
## weight (dbl), adhesive impulse (N-s) (dbl), total contact area (mm2)
## (int), contact area without mucus (mm2) (int), contact area with mucus /
## contact area without mucus (dbl), contact pressure (Pa) (int), adhesive
## strength (Pa) (int)
Let’s have a look at some of the dataframe’s properties:
str(frog)
## Classes 'tbl_df', 'tbl' and 'data.frame': 80 obs. of 15 variables:
## $ date : Date, format: "2013-02-26" "2013-02-26" ...
## $ ID : chr "I" "I" "I" "I" ...
## $ trial number : int 3 4 1 2 3 4 1 2 3 4 ...
## $ impact force (mN) : int 1205 2527 1745 1556 493 2276 556 1928 2641 1897 ...
## $ impact time (ms) : int 46 44 34 41 36 31 43 46 50 41 ...
## $ impact force / body weight : num 1.95 4.08 2.82 2.51 0.8 3.68 0.9 3.11 4.27 3.06 ...
## $ adhesive force (mN) : int -785 -983 -850 -455 -974 -592 -512 -804 -690 -462 ...
## $ time frog pulls on target (ms) : int 884 248 211 1025 499 969 835 508 491 839 ...
## $ adhesive force / body weight : num 1.27 1.59 1.37 0.74 1.57 0.96 0.83 1.3 1.12 0.75 ...
## $ adhesive impulse (N-s) : num -0.29 -0.181 -0.157 -0.17 -0.423 -0.176 -0.285 -0.285 -0.239 -0.328 ...
## $ total contact area (mm2) : int 387 101 83 330 245 341 359 246 269 266 ...
## $ contact area without mucus (mm2) : int 70 94 79 158 216 106 110 178 224 176 ...
## $ contact area with mucus / contact area without mucus: num 0.82 0.07 0.05 0.52 0.12 0.69 0.69 0.28 0.17 0.34 ...
## $ contact pressure (Pa) : int 3117 24923 21020 4718 2012 6676 1550 7832 9824 7122 ...
## $ adhesive strength (Pa) : int -2030 -9695 -10239 -1381 -3975 -1737 -1427 -3266 -2568 -1733 ...
# column and row names
colnames(frog)
## [1] "date"
## [2] "ID"
## [3] "trial number"
## [4] "impact force (mN)"
## [5] "impact time (ms)"
## [6] "impact force / body weight"
## [7] "adhesive force (mN)"
## [8] "time frog pulls on target (ms)"
## [9] "adhesive force / body weight"
## [10] "adhesive impulse (N-s)"
## [11] "total contact area (mm2)"
## [12] "contact area without mucus (mm2)"
## [13] "contact area with mucus / contact area without mucus"
## [14] "contact pressure (Pa)"
## [15] "adhesive strength (Pa)"
# or
names(frog)
## [1] "date"
## [2] "ID"
## [3] "trial number"
## [4] "impact force (mN)"
## [5] "impact time (ms)"
## [6] "impact force / body weight"
## [7] "adhesive force (mN)"
## [8] "time frog pulls on target (ms)"
## [9] "adhesive force / body weight"
## [10] "adhesive impulse (N-s)"
## [11] "total contact area (mm2)"
## [12] "contact area without mucus (mm2)"
## [13] "contact area with mucus / contact area without mucus"
## [14] "contact pressure (Pa)"
## [15] "adhesive strength (Pa)"
# and
rownames(frog)
## [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14"
## [15] "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28"
## [29] "29" "30" "31" "32" "33" "34" "35" "36" "37" "38" "39" "40" "41" "42"
## [43] "43" "44" "45" "46" "47" "48" "49" "50" "51" "52" "53" "54" "55" "56"
## [57] "57" "58" "59" "60" "61" "62" "63" "64" "65" "66" "67" "68" "69" "70"
## [71] "71" "72" "73" "74" "75" "76" "77" "78" "79" "80"
# Since there are no row names the row numbers are displayed. Note, the first
# line is 1, not 0!
# number of columns
length(frog)
## [1] 15
# or
ncol(frog)
## [1] 15
# number of rows
nrow(frog)
## [1] 80
# summary of data
summary(frog)
## date ID trial number impact force (mN)
## Min. :2013-02-26 Length:80 Min. :1.0 Min. : 22.0
## 1st Qu.:2013-03-18 Class :character 1st Qu.:1.0 1st Qu.: 456.0
## Median :2013-05-04 Mode :character Median :2.0 Median : 601.0
## Mean :2013-04-30 Mean :2.4 Mean : 801.7
## 3rd Qu.:2013-06-15 3rd Qu.:3.0 3rd Qu.:1005.0
## Max. :2013-06-26 Max. :5.0 Max. :2641.0
## impact time (ms) impact force / body weight adhesive force (mN)
## Min. : 6.00 Min. :0.170 Min. :-983.0
## 1st Qu.: 29.75 1st Qu.:1.470 1st Qu.:-567.8
## Median : 34.00 Median :3.030 Median :-335.0
## Mean : 39.06 Mean :2.920 Mean :-397.8
## 3rd Qu.: 42.00 3rd Qu.:4.277 3rd Qu.:-224.5
## Max. :143.00 Max. :6.490 Max. : -92.0
## time frog pulls on target (ms) adhesive force / body weight
## Min. : 189.0 Min. :0.220
## 1st Qu.: 682.2 1st Qu.:0.990
## Median : 927.0 Median :1.320
## Mean :1132.5 Mean :1.445
## 3rd Qu.:1381.2 3rd Qu.:1.772
## Max. :4251.0 Max. :3.400
## adhesive impulse (N-s) total contact area (mm2)
## Min. :-0.76800 Min. : 19.0
## 1st Qu.:-0.27725 1st Qu.:104.8
## Median :-0.16500 Median :134.5
## Mean :-0.18746 Mean :166.5
## 3rd Qu.:-0.08125 3rd Qu.:238.2
## Max. :-0.00100 Max. :455.0
## contact area without mucus (mm2)
## Min. : 0.00
## 1st Qu.: 16.75
## Median : 43.00
## Mean : 61.40
## 3rd Qu.: 92.50
## Max. :260.00
## contact area with mucus / contact area without mucus
## Min. :0.010
## 1st Qu.:0.280
## Median :0.665
## Mean :0.569
## 3rd Qu.:0.885
## Max. :1.000
## contact pressure (Pa) adhesive strength (Pa)
## Min. : 397 Min. :-17652
## 1st Qu.: 2579 1st Qu.: -3443
## Median : 4678 Median : -2186
## Mean : 6073 Mean : -3006
## 3rd Qu.: 7250 3rd Qu.: -1736
## Max. :28641 Max. : -678
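That a dataframe really is a list of equal-length vectors can be checked on a small example. The dataframe toy and its values are made up for this sketch and are not taken from the frog data:

```r
# a toy dataframe with made-up values
toy <- data.frame(id = c("I", "II", "III"),
                  force = c(1.5, 2.0, 0.7))

# internally a dataframe is stored as a list
typeof(toy)
is.list(toy)

# length() therefore counts the columns, as it would count list elements
length(toy)

# every column has the same length, which is the number of rows
sapply(toy, length)

# columns can be extracted like list elements
toy[["force"]]
toy$force
```

This is also why length(frog) and ncol(frog) return the same number above.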
Much of the data at our disposal needs to be prepared before it is useful. The packages tidyr, dplyr, and magrittr help us to transform data into a usable shape. Performance-critical parts of dplyr are implemented in C++, so many operations are faster than their base R equivalents. Thanks to the pipe operator (%>%, see below for examples), which was popularized by magrittr and is re-exported by dplyr, the resulting code is surprisingly easy to read.
Let’s have another look at the frog data. The first column holds the date in the format year-month-day. Suppose we want to compare the results from different months: it would be convenient to have three separate columns (year, month, and day) instead.
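Before applying the pipe to the frog data, here is a minimal sketch of what it does, assuming only that magrittr is installed. The vector x and the names nested and piped are made up for the illustration:

```r
library(magrittr)

x <- c(1, 4, 9, 16)

# nested call: has to be read from the inside out
nested <- round(mean(sqrt(x)), 1)

# piped call: the left-hand side becomes the first argument of the
# function on the right, so the steps read from top to bottom
piped <- x %>%
  sqrt() %>%
  mean() %>%
  round(1)

# both spellings give the same result
identical(nested, piped)
```

The piped version is longer but keeps each step on its own line, which pays off once a chain contains several data manipulation steps.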
frog %>% separate(date, c("Year", "Month", "Day"), sep = "-", remove = TRUE)
## Source: local data frame [80 x 17]
##
## Year Month Day ID trial number impact force (mN) impact time (ms)
## (chr) (chr) (chr) (chr) (int) (int) (int)
## 1 2013 02 26 I 3 1205 46
## 2 2013 02 26 I 4 2527 44
## 3 2013 03 01 I 1 1745 34
## 4 2013 03 01 I 2 1556 41
## 5 2013 03 01 I 3 493 36
## 6 2013 03 01 I 4 2276 31
## 7 2013 03 05 I 1 556 43
## 8 2013 03 05 I 2 1928 46
## 9 2013 03 05 I 3 2641 50
## 10 2013 03 05 I 4 1897 41
## .. ... ... ... ... ... ... ...
## Variables not shown: impact force / body weight (dbl), adhesive force (mN)
## (int), time frog pulls on target (ms) (int), adhesive force / body
## weight (dbl), adhesive impulse (N-s) (dbl), total contact area (mm2)
## (int), contact area without mucus (mm2) (int), contact area with mucus /
## contact area without mucus (dbl), contact pressure (Pa) (int), adhesive
## strength (Pa) (int)
frog
## Source: local data frame [80 x 15]
##
## date ID trial number impact force (mN) impact time (ms)
## (date) (chr) (int) (int) (int)
## 1 2013-02-26 I 3 1205 46
## 2 2013-02-26 I 4 2527 44
## 3 2013-03-01 I 1 1745 34
## 4 2013-03-01 I 2 1556 41
## 5 2013-03-01 I 3 493 36
## 6 2013-03-01 I 4 2276 31
## 7 2013-03-05 I 1 556 43
## 8 2013-03-05 I 2 1928 46
## 9 2013-03-05 I 3 2641 50
## 10 2013-03-05 I 4 1897 41
## .. ... ... ... ... ...
## Variables not shown: impact force / body weight (dbl), adhesive force (mN)
## (int), time frog pulls on target (ms) (int), adhesive force / body
## weight (dbl), adhesive impulse (N-s) (dbl), total contact area (mm2)
## (int), contact area without mucus (mm2) (int), contact area with mucus /
## contact area without mucus (dbl), contact pressure (Pa) (int), adhesive
## strength (Pa) (int)
magrittr also provides the %<>% operator, which pipes the input through the expression and then overwrites the input with the result:
frog %<>% separate(date, c("Year", "Month", "Day"), sep = "-", remove = TRUE)
frog
## Source: local data frame [80 x 17]
##
## Year Month Day ID trial number impact force (mN) impact time (ms)
## (chr) (chr) (chr) (chr) (int) (int) (int)
## 1 2013 02 26 I 3 1205 46
## 2 2013 02 26 I 4 2527 44
## 3 2013 03 01 I 1 1745 34
## 4 2013 03 01 I 2 1556 41
## 5 2013 03 01 I 3 493 36
## 6 2013 03 01 I 4 2276 31
## 7 2013 03 05 I 1 556 43
## 8 2013 03 05 I 2 1928 46
## 9 2013 03 05 I 3 2641 50
## 10 2013 03 05 I 4 1897 41
## .. ... ... ... ... ... ... ...
## Variables not shown: impact force / body weight (dbl), adhesive force (mN)
## (int), time frog pulls on target (ms) (int), adhesive force / body
## weight (dbl), adhesive impulse (N-s) (dbl), total contact area (mm2)
## (int), contact area without mucus (mm2) (int), contact area with mucus /
## contact area without mucus (dbl), contact pressure (Pa) (int), adhesive
## strength (Pa) (int)
tidyr also allows the user to fuse columns:
frog %>% unite(date, Year, Month, Day, sep = "_" )
## Source: local data frame [80 x 15]
##
## date ID trial number impact force (mN) impact time (ms)
## (chr) (chr) (int) (int) (int)
## 1 2013_02_26 I 3 1205 46
## 2 2013_02_26 I 4 2527 44
## 3 2013_03_01 I 1 1745 34
## 4 2013_03_01 I 2 1556 41
## 5 2013_03_01 I 3 493 36
## 6 2013_03_01 I 4 2276 31
## 7 2013_03_05 I 1 556 43
## 8 2013_03_05 I 2 1928 46
## 9 2013_03_05 I 3 2641 50
## 10 2013_03_05 I 4 1897 41
## .. ... ... ... ... ...
## Variables not shown: impact force / body weight (dbl), adhesive force (mN)
## (int), time frog pulls on target (ms) (int), adhesive force / body
## weight (dbl), adhesive impulse (N-s) (dbl), total contact area (mm2)
## (int), contact area without mucus (mm2) (int), contact area with mucus /
## contact area without mucus (dbl), contact pressure (Pa) (int), adhesive
## strength (Pa) (int)
The spaces, parentheses, and slashes in the column names are not very helpful. Let’s get rid of them before we move on:
# regular expressions exist in R too!
temp_names <- gsub(" |/|\\(", "_", colnames(frog))
temp_names <- gsub("_+", "_", temp_names)
temp_names <- gsub("\\)", "", temp_names)
colnames(frog) <- temp_names
frog %>% head
## Source: local data frame [6 x 17]
##
## Year Month Day ID trial_number impact_force_mN impact_time_ms
## (chr) (chr) (chr) (chr) (int) (int) (int)
## 1 2013 02 26 I 3 1205 46
## 2 2013 02 26 I 4 2527 44
## 3 2013 03 01 I 1 1745 34
## 4 2013 03 01 I 2 1556 41
## 5 2013 03 01 I 3 493 36
## 6 2013 03 01 I 4 2276 31
## Variables not shown: impact_force_body_weight (dbl), adhesive_force_mN
## (int), time_frog_pulls_on_target_ms (int), adhesive_force_body_weight
## (dbl), adhesive_impulse_N-s (dbl), total_contact_area_mm2 (int),
## contact_area_without_mucus_mm2 (int),
## contact_area_with_mucus_contact_area_without_mucus (dbl),
## contact_pressure_Pa (int), adhesive_strength_Pa (int)
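The three substitutions can also be wrapped in a small helper and tried on a few names in isolation. The function clean_names is a hypothetical name introduced for this sketch only:

```r
# hypothetical helper that applies the same three substitutions as above
clean_names <- function(x) {
  x <- gsub(" |/|\\(", "_", x)  # spaces, slashes and "(" become underscores
  x <- gsub("_+", "_", x)       # collapse runs of underscores into one
  gsub("\\)", "", x)            # drop closing parentheses
}

cleaned <- clean_names(c("impact force (mN)",
                         "impact force / body weight",
                         "adhesive impulse (N-s)"))
cleaned
```

Note that the hyphen in "N-s" is not touched by these patterns, so that column still has to be quoted with backticks when it is referred to by name.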
You are already familiar with the concept of tidy data. Let’s apply it to the modified frog dataframe:
tidy_frog <-
frog %>% gather(experiment, result, impact_force_mN:adhesive_strength_Pa)
tidy_frog
## Source: local data frame [960 x 7]
##
## Year Month Day ID trial_number experiment result
## (chr) (chr) (chr) (chr) (int) (fctr) (dbl)
## 1 2013 02 26 I 3 impact_force_mN 1205
## 2 2013 02 26 I 4 impact_force_mN 2527
## 3 2013 03 01 I 1 impact_force_mN 1745
## 4 2013 03 01 I 2 impact_force_mN 1556
## 5 2013 03 01 I 3 impact_force_mN 493
## 6 2013 03 01 I 4 impact_force_mN 2276
## 7 2013 03 05 I 1 impact_force_mN 556
## 8 2013 03 05 I 2 impact_force_mN 1928
## 9 2013 03 05 I 3 impact_force_mN 2641
## 10 2013 03 05 I 4 impact_force_mN 1897
## .. ... ... ... ... ... ... ...
dplyr is built around seven key verbs for manipulating data:
Remember, you can find out more about these using ?command. Let’s arbitrarily look at a few things to illustrate these capabilities. For this we use the wide (not-tidied) dataframe:
dry_contact_area <-
frog %>%
# Select a subset of the data
select(Year:trial_number,
total_contact_area_mm2,
contact_area_without_mucus_mm2) %>%
# calculate area with mucus and add an additional column
mutate(dry_contact_area_mm2 =
total_contact_area_mm2 - contact_area_without_mucus_mm2) %>%
group_by(ID) %>%
summarize(mean_dry_contact = mean(dry_contact_area_mm2),
median_dry_contact = median(dry_contact_area_mm2),
min_dry_contact = min(dry_contact_area_mm2),
max_dry_contact = max(dry_contact_area_mm2),
sd_dry_contact = sd(dry_contact_area_mm2),
counts = n())
dry_contact_area
## Source: local data frame [4 x 7]
##
## ID mean_dry_contact median_dry_contact min_dry_contact
## (chr) (dbl) (dbl) (int)
## 1 I 128.55 90.5 4
## 2 II 163.50 146.5 1
## 3 III 56.40 61.0 -95
## 4 IV 71.85 63.0 -17
## Variables not shown: max_dry_contact (int), sd_dry_contact (dbl), counts
## (int)
There is some potentially useful information in the comments. Let’s create a new dataframe with this data:
# column_names avoids masking the base function names()
column_names <- c("ID", "age_group", "svl", "weight")
ID <- c("I", "II", "III", "IV")
age_group <- c("adult", "adult", "juvenile", "juvenile")
svl <- c(63, 70, 28, 31)
weight <- c(63.1, 72.7, 12.7, 12.7)
frog_characteristics <- data.frame(ID, age_group, svl, weight)
colnames(frog_characteristics) <- column_names
frog_characteristics
## ID age_group svl weight
## 1 I adult 63 63.1
## 2 II adult 70 72.7
## 3 III juvenile 28 12.7
## 4 IV juvenile 31 12.7
The inner_join function allows us to combine this data with our dry_contact_area dataframe:
dry_contact_area_characteristics <-
dry_contact_area %>% inner_join(frog_characteristics)
## Joining by: "ID"
## Warning in inner_join_impl(x, y, by$x, by$y): joining character vector and
## factor, coercing into character vector
dry_contact_area_characteristics
## Source: local data frame [4 x 10]
##
## ID mean_dry_contact median_dry_contact min_dry_contact
## (chr) (dbl) (dbl) (int)
## 1 I 128.55 90.5 4
## 2 II 163.50 146.5 1
## 3 III 56.40 61.0 -95
## 4 IV 71.85 63.0 -17
## Variables not shown: max_dry_contact (int), sd_dry_contact (dbl), counts
## (int), age_group (fctr), svl (dbl), weight (dbl)
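The coercion warning in the join output appears because data.frame() turned the ID strings into a factor, while ID is a character column in the frog data. One way to avoid this, sketched here under the assumption that the factor is not needed, is to keep the strings as characters when building the dataframe. The name frog_characteristics_chr is made up for this example:

```r
# same values as frog_characteristics above, but strings stay character
frog_characteristics_chr <- data.frame(
  ID        = c("I", "II", "III", "IV"),
  age_group = c("adult", "adult", "juvenile", "juvenile"),
  svl       = c(63, 70, 28, 31),
  weight    = c(63.1, 72.7, 12.7, 12.7),
  stringsAsFactors = FALSE
)

# ID and age_group are now of type chr instead of Factor
str(frog_characteristics_chr)
```

Passing by = "ID" to inner_join() additionally makes the join key explicit instead of leaving it to be guessed from the shared column names.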
Let’s also add it to the frog dataframe:
new_frog <- frog %>%
mutate(dry_contact_area_mm2 =
total_contact_area_mm2 - contact_area_without_mucus_mm2) %>%
inner_join(frog_characteristics) %>%
# rearrange the columns
select(Year:trial_number, age_group:weight,
impact_force_mN:dry_contact_area_mm2) %>%
gather(experiment, result, impact_force_mN:dry_contact_area_mm2)
## Joining by: "ID"
## Warning in inner_join_impl(x, y, by$x, by$y): joining character vector and
## factor, coercing into character vector
new_frog
## Source: local data frame [1,040 x 10]
##
## Year Month Day ID trial_number age_group svl weight
## (chr) (chr) (chr) (chr) (int) (fctr) (dbl) (dbl)
## 1 2013 02 26 I 3 adult 63 63.1
## 2 2013 02 26 I 4 adult 63 63.1
## 3 2013 03 01 I 1 adult 63 63.1
## 4 2013 03 01 I 2 adult 63 63.1
## 5 2013 03 01 I 3 adult 63 63.1
## 6 2013 03 01 I 4 adult 63 63.1
## 7 2013 03 05 I 1 adult 63 63.1
## 8 2013 03 05 I 2 adult 63 63.1
## 9 2013 03 05 I 3 adult 63 63.1
## 10 2013 03 05 I 4 adult 63 63.1
## .. ... ... ... ... ... ... ... ...
## Variables not shown: experiment (fctr), result (dbl)
ggplot2 is built on the concept of a “grammar of graphics”. Let’s illustrate this by plotting a few things.
Examples for one variable:
a <- ggplot(frog, aes(contact_pressure_Pa))
a + geom_area(stat = "bin")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
a + geom_histogram()
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
a + geom_dotplot()
## stat_bindot: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
# the histogram will be saved as:
plot1 <- a + geom_histogram() + xlab("") + ylab("")
Examples for two continuous variables:
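# new_frog is in long format, so we join the characteristics onto the wide frame;
# column names are passed to aes() unquoted (x is assumed to be impact_force_mN)
wide_frog <- frog %>% inner_join(frog_characteristics, by = "ID")
b <- ggplot(wide_frog, aes(x = impact_force_mN, y = contact_pressure_Pa))
b + geom_jitter()
# add color by age group
b + geom_jitter(aes(color = factor(age_group)))
# this plot will be saved as:
plot2 <- b + geom_jitter(aes(color = factor(age_group))) + xlab("") + ylab("")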
Examples for discrete X and continuous Y:
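# the definition of c is not shown in the source; as an assumption we use
# the frog ID as discrete x and the impact force as continuous y
c <- ggplot(frog, aes(x = ID, y = impact_force_mN))
c + geom_boxplot()
c + geom_tufteboxplot() + theme_tufte()
c + geom_violin() + geom_tufteboxplot() + theme_tufte()
plot3 <- c + geom_violin() + geom_tufteboxplot() + theme_tufte() + xlab("") + ylab("")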
cowplot makes it possible to combine the saved plots into a single figure:
plot_grid(plot1, plot2, plot3, labels = LETTERS[1:3], align = "h", nrow = 3)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.