We want to count the number of species (sp) in each site. Here we’ll use a dataset that is so small that we can count by eye. But in real life you’ll need to approach this problem programmatically. Here is how.
- Setup
library(tidyverse)
## -- Attaching packages -------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.0 v purrr 0.2.5
## v tibble 2.0.0 v dplyr 0.7.8
## v tidyr 0.8.2 v stringr 1.3.1
## v readr 1.3.1 v forcats 0.3.0
## -- Conflicts ----------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
- Data: Site A has only one species; site B has three species.
dataset <- tibble::tribble(
  ~site, ~sp,   ~other_vars,
  "A",   "sp1", 1,
  "A",   "sp1", 1,
  "A",   "sp1", 2,
  "B",   "sp1", 3,
  "B",   "sp2", 4,
  "B",   "sp3", 5
)
dataset
## # A tibble: 6 x 3
## site sp other_vars
## <chr> <chr> <dbl>
## 1 A sp1 1
## 2 A sp1 1
## 3 A sp1 2
## 4 B sp1 3
## 5 B sp2 4
## 6 B sp3 5
Count
- Option 1: Expressive way. Group the data by site and count the unique occurrences of sp.
n_sp <- dataset %>%
  group_by(site) %>%
  summarise(n = n_distinct(sp))
n_sp
## # A tibble: 2 x 2
## site n
## <chr> <int>
## 1 A 1
## 2 B 3
- Option 2: A bit more cryptic way. Select the relevant columns; get the unique combinations of values; count the number of rows by site.
dataset %>%
  select(site, sp) %>%
  unique() %>%
  count(site)
## # A tibble: 2 x 2
## site n
## <chr> <int>
## 1 A 1
## 2 B 3
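As a variation on option 2, dplyr's distinct() can stand in for the select() plus unique() pair. The sketch below rebuilds the example dataset so that it runs on its own; the name n_sp_distinct is only illustrative and does not appear elsewhere in this post.

```r
library(dplyr)

# Same example data as above, repeated so this block is self-contained.
dataset <- tibble::tribble(
  ~site, ~sp,   ~other_vars,
  "A",   "sp1", 1,
  "A",   "sp1", 1,
  "A",   "sp1", 2,
  "B",   "sp1", 3,
  "B",   "sp2", 4,
  "B",   "sp3", 5
)

# Keep one row per unique site-species combination (other columns are dropped),
# then count the remaining rows per site.
n_sp_distinct <- dataset %>%
  distinct(site, sp) %>%
  count(site)

n_sp_distinct
```

This should give the same per-site counts as the two options above: 1 for site A and 3 for site B.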
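Warning: This is wrong. Counting by sp returns the number of sites in which each species occurs, not the number of species in each site. If that distinction is unclear, use the more expressive approach (option 1 above).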
dataset %>%
  select(site, sp) %>%
  unique() %>%
  # This is wrong! You should count by site -- not species.
  count(sp)
## # A tibble: 3 x 2
## sp n
## <chr> <int>
## 1 sp1 2
## 2 sp2 1
## 3 sp3 1
Add count
How can you add the count to the unique site-species combinations of the dataset?
site_species <- dataset %>%
  select(site, sp) %>%
  unique() %>%
  add_count(site)
site_species
## # A tibble: 4 x 3
## site sp n
## <chr> <chr> <int>
## 1 A sp1 1
## 2 B sp1 3
## 3 B sp2 3
## 4 B sp3 3
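Note that site_species has one row per unique site-species combination (4 rows), not one row per observation. If you would rather keep all six rows of the original dataset, including other_vars, one possible approach is a grouped mutate(). This is a minimal sketch; the name dataset_n is only illustrative.

```r
library(dplyr)

# Same example data as above, repeated so this block is self-contained.
dataset <- tibble::tribble(
  ~site, ~sp,   ~other_vars,
  "A",   "sp1", 1,
  "A",   "sp1", 1,
  "A",   "sp1", 2,
  "B",   "sp1", 3,
  "B",   "sp2", 4,
  "B",   "sp3", 5
)

# For every row, attach the number of distinct species found in that row's site.
dataset_n <- dataset %>%
  group_by(site) %>%
  mutate(n = n_distinct(sp)) %>%
  ungroup()

dataset_n
```

Every row of site A should carry n = 1 and every row of site B n = 3, while the original columns stay untouched.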
This is particularly useful when you want to create plots.
library(ggplot2)
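to_plot <- site_species %>%
  mutate(site_sp = interaction(site, sp))
# fill (rather than color) shades the whole bar by species;
# color would only change the bar outline.
ggplot(to_plot, aes(site_sp, n, fill = sp)) +
  geom_col()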