We want to count the number of species (sp) in each site. Here we’ll use a dataset so small that we can count by eye, but in real life you’ll need to approach this problem programmatically. Here is how.

  • Setup
library(tidyverse)
## -- Attaching packages -------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.0     v purrr   0.2.5
## v tibble  2.0.0     v dplyr   0.7.8
## v tidyr   0.8.2     v stringr 1.3.1
## v readr   1.3.1     v forcats 0.3.0
## -- Conflicts ----------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
  • Data: Site A has only one species; site B has three species.
dataset <- tibble::tribble(
  ~site,  ~sp, ~other_vars,
    "A", "sp1",          1,
    "A", "sp1",          1,
    "A", "sp1",          2,
    "B", "sp1",          3,
    "B", "sp2",          4,
    "B", "sp3",          5
)
dataset
## # A tibble: 6 x 3
##   site  sp    other_vars
##   <chr> <chr>      <dbl>
## 1 A     sp1            1
## 2 A     sp1            1
## 3 A     sp1            2
## 4 B     sp1            3
## 5 B     sp2            4
## 6 B     sp3            5

Count

  • Option 1: The expressive way. Group the data by site and count the distinct values of sp.
n_sp <- dataset %>% 
  group_by(site) %>% 
  summarise(n = n_distinct(sp))
n_sp
## # A tibble: 2 x 2
##   site      n
##   <chr> <int>
## 1 A         1
## 2 B         3
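In real data, some records may lack a species identification. By default, n_distinct() counts NA as one more distinct value, which would inflate the species count. The sketch below uses a small hypothetical dataset, dataset_na (not part of the original example), to show the effect of the na.rm argument. Whether unidentified records should be dropped is a decision that depends on your data.

```r
library(dplyr)

# Hypothetical data: site B has one record with no species identification (NA)
dataset_na <- tibble::tribble(
  ~site,          ~sp,
    "A",        "sp1",
    "B",        "sp1",
    "B",        "sp2",
    "B", NA_character_
)

n_sp_na <- dataset_na %>%
  group_by(site) %>%
  summarise(
    n_with_na    = n_distinct(sp),               # NA counts as a distinct value
    n_without_na = n_distinct(sp, na.rm = TRUE)  # NA is ignored
  )
n_sp_na
```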
  • Option 2: A more cryptic way. Select the relevant columns, keep the unique combinations of values, and count the rows by site.
dataset %>% 
  select(site, sp) %>% 
  unique() %>% 
  count(site)
## # A tibble: 2 x 2
##   site      n
##   <chr> <int>
## 1 A         1
## 2 B         3
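As a sketch of the same idea, dplyr’s distinct() does the select-then-unique step in a single call, because by default it keeps only the columns you name. Comparing the result with option 1 is a cheap sanity check that both approaches agree. The object names n_sp_distinct and n_sp_grouped are only illustrative.

```r
library(dplyr)

dataset <- tibble::tribble(
  ~site,  ~sp, ~other_vars,
    "A", "sp1",          1,
    "A", "sp1",          1,
    "A", "sp1",          2,
    "B", "sp1",          3,
    "B", "sp2",          4,
    "B", "sp3",          5
)

# distinct(site, sp) keeps only site and sp and drops duplicate rows,
# like select(site, sp) %>% unique()
n_sp_distinct <- dataset %>%
  distinct(site, sp) %>%
  count(site)

# Option 1 again, for comparison
n_sp_grouped <- dataset %>%
  group_by(site) %>%
  summarise(n = n_distinct(sp))

# Both approaches should give the same sites and the same counts
identical(n_sp_distinct$site, n_sp_grouped$site)
all.equal(n_sp_distinct$n, n_sp_grouped$n)
```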

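Warning: The code below is wrong. It counts the number of sites in which each species occurs, not the number of species in each site. If you don’t understand why, use the more expressive approach (option 1 above).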

dataset %>% 
  select(site, sp) %>% 
  unique() %>% 
  # This is wrong! You should count by site -- not species.
  count(sp)
## # A tibble: 3 x 2
##   sp        n
##   <chr> <int>
## 1 sp1       2
## 2 sp2       1
## 3 sp3       1
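One way to see what went wrong: the incorrect pipeline computes the same thing as counting the distinct sites per species, which the sketch below writes in the expressive style of option 1 (the name sites_per_sp is just for illustration). Species sp1 occurs in both sites, so it gets 2; sp2 and sp3 occur only in site B, so they get 1.

```r
library(dplyr)

dataset <- tibble::tribble(
  ~site,  ~sp, ~other_vars,
    "A", "sp1",          1,
    "A", "sp1",          1,
    "A", "sp1",          2,
    "B", "sp1",          3,
    "B", "sp2",          4,
    "B", "sp3",          5
)

# Grouping by species (instead of site) answers a different question:
# in how many sites does each species occur?
sites_per_sp <- dataset %>%
  group_by(sp) %>%
  summarise(n = n_distinct(site))
sites_per_sp
```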

Add count

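How can you add the count as a new column, rather than collapsing the data to one row per site? Use add_count(): it counts like count() but keeps every row and stores the result in a new column, n.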

site_species <- dataset %>% 
  select(site, sp) %>% 
  unique() %>% 
  add_count(site)
site_species
## # A tibble: 4 x 3
##   site  sp        n
##   <chr> <chr> <int>
## 1 A     sp1       1
## 2 B     sp1       3
## 3 B     sp2       3
## 4 B     sp3       3

This format is particularly useful for plotting, where you need each species and its site’s species count on the same row.

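library(ggplot2)

# One bar per site-species combination (e.g. "A.sp1"), with height equal to
# the number of species in that site
to_plot <- site_species %>% mutate(site_sp = interaction(site, sp))

# `fill` colours the inside of the bars; `color` would only colour their outlines
ggplot(to_plot, aes(site_sp, n, fill = sp)) +
  geom_col()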
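If you need the count on every row of the full original dataset (keeping other_vars and the duplicated rows), a minimal sketch is to use mutate() instead of summarise() in the grouped pipeline of option 1. The name dataset_with_n is illustrative.

```r
library(dplyr)

dataset <- tibble::tribble(
  ~site,  ~sp, ~other_vars,
    "A", "sp1",          1,
    "A", "sp1",          1,
    "A", "sp1",          2,
    "B", "sp1",          3,
    "B", "sp2",          4,
    "B", "sp3",          5
)

# Inside group_by(), mutate() computes one value per site and repeats it
# on every row of that site, so no rows are dropped
dataset_with_n <- dataset %>%
  group_by(site) %>%
  mutate(n = n_distinct(sp)) %>%
  ungroup()
dataset_with_n
```

Equivalently, you could compute n_sp as in option 1 and join it back with left_join(dataset, n_sp, by = "site").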