git clone https://github.com/mattocci27/phy-fun-div.git
Diversity calculation
Course materials for 2024-11-10 AFEC at XTBG.
1 Objectives
- Prepare species \(\times\) site matrices and trait data from csv files.
- Calculate diversity indices.
2 Prerequisites
Clone this repo.
The following command (probably) installs all the R packages required for this course.
::restore() renv
Otherwise, please install those packages manually without using renv
(I might miss some packages).
install.packages("picante")
install.packages("FD")
install.packages("tidyverse")
install.packages("rmarkdown")
install.packages("DT")
install.packages("quarto")
Load some packages.
library(tidyverse)
library(picante)
library(FD)
3 Data
3.1 Community
First, we import the community data.
<- read_csv("data/samp.csv") d
- I prefer
read_csv
butread.csv
is fine too. - Our working directory is
phy-fun-div
, and the relative path tosamp.csv
isdata/samp.csv
.
We can also use here
package to specify the absolute path to the file.
::here("data/samp.csv") here
[1] "/Users/mattocci/Dropbox/5-Tools/ghq/github.com/mattocci27/phy-fun-div/data/samp.csv"
<- read_csv(here::here("data/samp.csv")) d
but don’t do this:
<- read_csv("/Users/mattocci/Dropbox/5-Tools/ghq/github.com/mattocci27/phy-fun-div/data/samp.csv") d
because absolute paths will be different on different computers (lack of reproducibility). It is also uncomfortable to make file system structure public (security risk).
Now let’s look at the community data.
d
# A tibble: 40 × 3
Site Species abund
<chr> <chr> <dbl>
1 Site1 Illicium_macranthum 1
2 Site1 Manglietia_insignis 0
3 Site1 Michelia_floribunda 0
4 Site1 Beilschmiedia_robusta 0
5 Site1 Neolitsea_chuii 0
6 Site1 Lindera_thomsonii 0
7 Site1 Actinodaphne_forrestii 0
8 Site1 Machilus_yunnanensis 0
9 Site2 Illicium_macranthum 1
10 Site2 Manglietia_insignis 2
# ℹ 30 more rows
::datatable(d) DT
Then, we want to make a species \(\times\) site matrix. tapply
is a useful function here.
tapply(d$abund, d$Species, sum)
Actinodaphne_forrestii Beilschmiedia_robusta Illicium_macranthum
4 2 5
Lindera_thomsonii Machilus_yunnanensis Manglietia_insignis
5 2 3
Michelia_floribunda Neolitsea_chuii
2 1
<- tapply(d$abund, list(d$Site, d$Species), sum)
samp samp
Actinodaphne_forrestii Beilschmiedia_robusta Illicium_macranthum
Site1 0 0 1
Site2 0 2 1
Site3 2 0 1
Site4 2 0 1
Site5 0 0 1
Lindera_thomsonii Machilus_yunnanensis Manglietia_insignis
Site1 0 0 0
Site2 0 0 2
Site3 2 2 0
Site4 2 0 1
Site5 1 0 0
Michelia_floribunda Neolitsea_chuii
Site1 0 0
Site2 2 0
Site3 0 0
Site4 0 0
Site5 0 1
class(samp)
[1] "matrix" "array"
3.2 Phylogeny
<- read.tree("data/dummy_tree.newick")
phylo plot(phylo)
3.3 Traits
Abbreviation | Trait | Unit |
---|---|---|
LMA | Leaf mass per area | g m-2 |
LL | Leaf lifespans (longevity) | months |
Amass | Maximum photosynthetic rates per unit mass | nnoml g-1 s-1 |
Rmass | Dark respiration rates per unit mass | nnoml g-1 s-1 |
Nmass | Leaf nitrogen per unit mass | % |
Pmass | Leaf phosphorus per unit mass | % |
WD | Wood density | g cm-3 |
SM | Seed dry mass | mg |
<- read_csv("data/dummy_trait.csv")
trait
::datatable(trait) DT
3.4 Check how traits look like first
<- trait |>
trait_long pivot_longer(lma:sm, names_to = "trait")
trait_long
# A tibble: 616 × 3
sp trait value
<chr> <chr> <dbl>
1 Acer_campbellii lma 145.
2 Acer_campbellii ll 2.17
3 Acer_campbellii a_mass 70.3
4 Acer_campbellii r_mass 5.35
5 Acer_campbellii n_mass 1.48
6 Acer_campbellii p_mass 0.09
7 Acer_campbellii wd 0.68
8 Acer_campbellii sm 0.87
9 Actinodaphne_forrestii lma 49.0
10 Actinodaphne_forrestii ll 2.59
# ℹ 606 more rows
ggplot(trait_long, aes(x = value)) +
geom_histogram(position = "identity") +
facet_wrap(~ trait, scale = "free") +
theme_bw()
Probably we can do log-transformation for all the traits except for WD.
<- trait |>
trait2 mutate(
across(c(lma, ll, a_mass, r_mass, n_mass, p_mass, sm),
~ log(.),
.names = "log_{.col}")) |>
::select(sp, log_lma, log_ll, log_a_mass, log_r_mass, log_n_mass, log_p_mass, wd, log_sm)
dplyr
|>
trait2 mutate(across(where(is.numeric), ~ round(., 2))) |>
::datatable() DT
|>
trait2 pivot_longer(log_lma:log_sm, names_to = "trait") |>
ggplot(aes(x = value)) +
geom_histogram(position = "identity") +
facet_wrap(~ trait, scale = "free") +
theme_bw()
4 First-order metrics (without phylogeny or traits)
4.1 Species richness
> 0 samp
Actinodaphne_forrestii Beilschmiedia_robusta Illicium_macranthum
Site1 FALSE FALSE TRUE
Site2 FALSE TRUE TRUE
Site3 TRUE FALSE TRUE
Site4 TRUE FALSE TRUE
Site5 FALSE FALSE TRUE
Lindera_thomsonii Machilus_yunnanensis Manglietia_insignis
Site1 FALSE FALSE FALSE
Site2 FALSE FALSE TRUE
Site3 TRUE TRUE FALSE
Site4 TRUE FALSE TRUE
Site5 TRUE FALSE FALSE
Michelia_floribunda Neolitsea_chuii
Site1 FALSE FALSE
Site2 TRUE FALSE
Site3 FALSE FALSE
Site4 FALSE FALSE
Site5 FALSE TRUE
apply(samp > 0, 1, sum)
Site1 Site2 Site3 Site4 Site5
1 4 4 4 3
4.2 Shannon
\(H' = - \sum_i^n p_i\mathrm{log}p_i\), where \(p_i\) is the relative abundance for species i.
<- function(abund) {
shannon <- abund / sum(abund)
p0 <- p0[p0 > 0]
p -sum(p * log(p))
}
apply(samp, 1, shannon)
Site1 Site2 Site3 Site4 Site5
0.000000 1.351784 1.351784 1.329661 1.098612
You don’t have to reinvent the wheel.
::diversity(samp, index = "shannon") vegan
Site1 Site2 Site3 Site4 Site5
0.000000 1.351784 1.351784 1.329661 1.098612
4.3 Nonmetric Multidimensional Scaling (NMDS)
<- metaMDS(samp) res_mds
Run 0 stress 0
Run 1 stress 0
... Procrustes: rmse 0.06405868 max resid 0.09198237
Run 2 stress 0
... Procrustes: rmse 0.07474537 max resid 0.114016
Run 3 stress 9.92479e-05
... Procrustes: rmse 0.1288527 max resid 0.1986298
Run 4 stress 0.1302441
Run 5 stress 0.259715
Run 6 stress 0.09680971
Run 7 stress 0.09681003
Run 8 stress 0.1302441
Run 9 stress 0.1302441
Run 10 stress 0.09680977
Run 11 stress 0
... Procrustes: rmse 0.01646437 max resid 0.02492915
Run 12 stress 0.1302441
Run 13 stress 0
... Procrustes: rmse 0.08455785 max resid 0.1026221
Run 14 stress 5.686353e-05
... Procrustes: rmse 0.1288297 max resid 0.1985397
Run 15 stress 0.1302441
Run 16 stress 0
... Procrustes: rmse 0.1199962 max resid 0.1742123
Run 17 stress 0
... Procrustes: rmse 0.1006567 max resid 0.1626854
Run 18 stress 0.1302441
Run 19 stress 9.263808e-05
... Procrustes: rmse 0.128874 max resid 0.1986345
Run 20 stress 0.2297529
*** Best solution was not repeated -- monoMDS stopping criteria:
9: stress < smin
4: stress ratio > sratmax
7: scale factor of the gradient < sfgrmin
plot(res_mds)
We can use the function ordiplot
and orditorp
to add text to the plot in place of points to make some more sense.
ordiplot(res_mds, type = "n")
orditorp(res_mds, display = "species", col = "red", air = 0.01)
orditorp(res_mds, display = "sites", cex = 1.25, air = 0.01)
5 Phylogenetic metrics
5.1 Branch length based metric
5.1.1 PD
<- pd(samp, phylo)
res_pd res_pd
PD SR
Site1 1.000000 1
Site2 3.022727 4
Site3 2.909091 4
Site4 3.136364 4
Site5 2.454545 3
You can always see the help.
?pd
5.2 Distance based metric
cophenetic()
creates distance matrices based on phylogenetic trees. Let’s see the first 5 species.
cophenetic(phylo)[1:5, 1:5]
Acer_campbellii Melia_toosendan Skimmia_arborescens
Acer_campbellii 0.0000000 0.18181818 0.18181818
Melia_toosendan 0.1818182 0.00000000 0.09090909
Skimmia_arborescens 0.1818182 0.09090909 0.00000000
Rhus_sylvestris 0.3636364 0.36363636 0.36363636
Sterculia_nobilis 0.5454545 0.54545455 0.54545455
Rhus_sylvestris Sterculia_nobilis
Acer_campbellii 0.3636364 0.5454545
Melia_toosendan 0.3636364 0.5454545
Skimmia_arborescens 0.3636364 0.5454545
Rhus_sylvestris 0.0000000 0.5454545
Sterculia_nobilis 0.5454545 0.0000000
5.2.1 MPD
\(MPD = \frac{1}{n} \Sigma^n_i \Sigma^n_j \delta_{i,j} \; i \neq j\), where \(\delta_{i, j}\) is the pairwise distance between species i and j
<- mpd(samp, cophenetic(phylo))
res_mpd res_mpd
[1] NA 1.568182 1.454545 1.606061 1.636364
The above vector shows MPD for each site.
5.2.2 MNTD
\(MNTD = \frac{1}{n} \Sigma^n_i min \delta_{i,j} \; i \neq j\), where \(min \delta_{i, j}\) is the minimum distance between species i and all other species in the community.
<- mntd(samp, cophenetic(phylo))
res_mntd res_mntd
[1] NA 1.181818 1.181818 1.295455 1.272727
6 Functional metrics
6.1 Community weighted means (CWM)
\[ \mathrm{CWM}_i = \frac{\sum_{j=1}^n a_{ij} \times t_{j}}{\sum_{j=1}^n a_{ij}} \]
or
\[ \mathrm{CWM}_i = \frac{\vec{a_i} \cdot \vec{t}}{\sum_{j=1}^n a_{ij}}, \]
where \(a_{ij}\) is the abundance of species j in community i, and \(t_{j}\) is a trait value of species j.
<- trait2 |>
tmp filter(sp %in% colnames(samp))
tmp
# A tibble: 8 × 9
sp log_lma log_ll log_a_mass log_r_mass log_n_mass log_p_mass wd
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Actinodaphn… 3.89 0.952 5.70 4.26 1.32 -1.83 0.69
2 Beilschmied… 3.06 1.02 5.44 3.57 2.05 -0.478 0.46
3 Illicium_ma… 4.79 -0.329 4.50 2.01 -0.0943 -3.22 0.78
4 Lindera_tho… 3.87 -0.693 6.16 3.78 1.57 -1.43 0.59
5 Machilus_yu… 3.85 1.30 5.86 2.92 0.747 -0.416 0.48
6 Manglietia_… 4.90 1.39 4.61 3.22 -0.416 -2.81 0.83
7 Michelia_fl… 4.47 1.13 4.68 2.64 0.944 -2.04 0.43
8 Neolitsea_c… 3.80 -0.0202 4.75 3.53 0.888 -0.844 0.56
# ℹ 1 more variable: log_sm <dbl>
<- apply(samp, 1, sum)) (ab
Site1 Site2 Site3 Site4 Site5
1 7 7 6 3
# %*% denotes inner product
<- samp %*% as.matrix(tmp[, -1])) (cws
log_lma log_ll log_a_mass log_r_mass log_n_mass log_p_mass wd
Site1 4.789407 -0.3285041 4.501475 2.006871 -0.09431068 -3.218876 0.78
Site2 29.657500 6.7453287 33.955456 20.844958 5.06301056 -13.882211 4.22
Site3 27.999275 2.7834436 39.948369 23.936371 7.18513497 -10.569302 4.30
Site4 25.213325 1.5748117 32.833779 21.304242 5.27624363 -12.551682 4.17
Site5 12.456989 -1.0418540 15.414409 9.324552 2.36219650 -5.489962 1.93
log_sm
Site1 0.7080358
Site2 -2.3807632
Site3 4.5217803
Site4 3.8234293
Site5 2.5542834
<- cws / ab) (cwm
log_lma log_ll log_a_mass log_r_mass log_n_mass log_p_mass
Site1 4.789407 -0.3285041 4.501475 2.006871 -0.09431068 -3.218876
Site2 4.236786 0.9636184 4.850779 2.977851 0.72328722 -1.983173
Site3 3.999896 0.3976348 5.706910 3.419482 1.02644785 -1.509900
Site4 4.202221 0.2624686 5.472297 3.550707 0.87937394 -2.091947
Site5 4.152330 -0.3472847 5.138136 3.108184 0.78739883 -1.829987
wd log_sm
Site1 0.7800000 0.7080358
Site2 0.6028571 -0.3401090
Site3 0.6142857 0.6459686
Site4 0.6950000 0.6372382
Site5 0.6433333 0.8514278
The species \(\times\) site matrix and the species \(\times\) trait matrix became the trait \(\times\) site matrix.
6.2 Distance based metrics
6.2.1 Prepare a trait distance matrix
We have a data.fame
or tibble
object of traits. First, we need to prepare a trait matrix, then a distance matrix based on trait values.
<- as.matrix(trait2[, -1])
trait_mat0 rownames(trait_mat0) <- trait2$sp
Let’s see a subset of the trait matrix
1:5, 1:5] trait_mat0[
log_lma log_ll log_a_mass log_r_mass log_n_mass
Acer_campbellii 4.973556 0.7747272 4.252630 1.677097 0.3920421
Actinodaphne_forrestii 3.891412 0.9516579 5.701881 4.256180 1.3244190
Alnus_nepalensis 5.088707 3.1780538 3.018960 2.175887 0.2776317
Anneslea_fragrans 4.144721 0.9042182 5.183972 3.556205 1.2059708
Beilschmiedia_robusta 3.062456 1.0224509 5.442418 3.566147 2.0502702
Then, we will make a trait distance matrix based on the Euclidean distance. There are other distance measures, for example Gower’s Distance, but we focus on the Euclidean distance today.
Before calculating distance, we need to make sure the unit change in distances is the same for different traits. We will scale trait values so that they have mean = 0 and SD = 1. (e.g., \((X_i - \mu) / \sigma\))
# (trait_mat0 - mean(trait_mat0)) / sd(trait_mat0)
<- scale(trait_mat0)
trait_mat
par(mfrow = c(2, 2))
hist(trait_mat0[, "log_lma"])
hist(trait_mat[, "log_lma"])
hist(trait_mat0[, "wd"])
hist(trait_mat[, "wd"])
par(mfrow = c(1, 1))
Now we can make a trait distance matrix.
<- as.matrix(dist(trait_mat)) trait_dm
Let’s see the first 5 species.
1:5, 1:5] trait_dm[
Acer_campbellii Actinodaphne_forrestii Alnus_nepalensis
Acer_campbellii 0.000000 4.043670 3.191609
Actinodaphne_forrestii 4.043670 0.000000 5.272279
Alnus_nepalensis 3.191609 5.272279 0.000000
Anneslea_fragrans 3.430633 2.431268 3.992029
Beilschmiedia_robusta 5.258920 3.064649 6.412475
Anneslea_fragrans Beilschmiedia_robusta
Acer_campbellii 3.430633 5.258920
Actinodaphne_forrestii 2.431268 3.064649
Alnus_nepalensis 3.992029 6.412475
Anneslea_fragrans 0.000000 4.029059
Beilschmiedia_robusta 4.029059 0.000000
6.2.2 MPD (Mean Pairwise Distance)
mpd(samp, trait_dm)
[1] NA 4.269375 3.797902 3.766172 3.876694
ses.mpd(samp, trait_dm)
ntaxa mpd.obs mpd.rand.mean mpd.rand.sd mpd.obs.rank mpd.obs.z
Site1 1 NA NaN NA NA NA
Site2 4 4.269375 3.794488 0.7392780 773 0.64236659
Site3 4 3.797902 3.788524 0.7297932 548 0.01285074
Site4 4 3.766172 3.794574 0.7478919 534 -0.03797657
Site5 3 3.876694 3.740586 0.8541511 600 0.15934881
mpd.obs.p runs
Site1 NA 999
Site2 0.773 999
Site3 0.548 999
Site4 0.534 999
Site5 0.600 999
Effect size (ES)
\[ ES = \frac{\bar{x_1} - \bar{x_2}}{\sigma} \sim Normal(\bar{x_1} - \bar{x_2}, 1) \]
where \(\bar{x_1}\) is the mean value of \(x_1\), \(\bar{x_2}\) is the mean value of \(x_2\), and \(\sigma\) is the standard deviation of the pooled data.
In null model analyses, we can translate this into a standardized effect size (SES):
\[ SES = \frac{obs - \bar{rand}}{\sigma_{rand}} \]
where obs is the observed metric, \(\bar{rand}\) is the mean value of the metric in null communities, and \(\sigma_{rand}\) is the standard deviation of the metric in the null communities.
6.2.3 MNTD (Mean Nearest Taxon Distance)
mntd(samp, trait_dm)
[1] NA 2.832669 3.265097 2.583548 3.255192
ses.mntd(samp, trait_dm)
ntaxa mntd.obs mntd.rand.mean mntd.rand.sd mntd.obs.rank mntd.obs.z
Site1 1 NA NaN NA NA NA
Site2 4 2.832669 2.846540 0.5349929 500 -0.02592702
Site3 4 3.265097 2.843900 0.5498952 784 0.76595887
Site4 4 2.583548 2.875859 0.5559363 312 -0.52579887
Site5 3 3.255192 3.167127 0.7405363 577 0.11892086
mntd.obs.p runs
Site1 NA 999
Site2 0.500 999
Site3 0.784 999
Site4 0.312 999
Site5 0.577 999
6.3 Branch length based metric
6.3.1 FD
We will make a functional dendrogram using clustering methods. We use UPGMA in this example.
<- hclust(dist(trait_mat), method = "average")
t_clust
plot(t_clust)
6.3.2 More functional diversity metrics
<- dbFD(trait_mat[colnames(samp), ], samp) res_fd
FEVe: Could not be calculated for communities with <3 functionally singular species.
FDis: Equals 0 in communities with only one functionally singular species.
FRic: To respect s > t, FRic could not be calculated for communities with <3 functionally singular species.
FRic: Dimensionality reduction was required. The last 5 PCoA axes (out of 7 in total) were removed.
FRic: Quality of the reduced-space representation = 0.7629008
FDiv: Could not be calculated for communities with <3 functionally singular species.
res_fd
$nbsp
Site1 Site2 Site3 Site4 Site5
1 4 4 4 3
$sing.sp
Site1 Site2 Site3 Site4 Site5
1 4 4 4 3
$FRic
Site1 Site2 Site3 Site4 Site5
NA 5.785538 7.361330 9.369238 5.730526
$qual.FRic
[1] 0.7629008
$FEve
Site1 Site2 Site3 Site4 Site5
NA 0.9620749 0.7373963 0.8608372 0.9003555
$FDiv
Site1 Site2 Site3 Site4 Site5
NA 0.6775453 0.8282914 0.8719263 0.8388685
$FDis
Site1 Site2 Site3 Site4 Site5
0.000000 2.567056 2.322816 2.569872 2.423423
$RaoQ
Site1 Site2 Site3 Site4 Site5
0.000000 7.238587 5.863652 6.911377 6.047385
$CWM
log_lma log_ll log_a_mass log_r_mass log_n_mass log_p_mass
Site1 0.2791931 -1.6111593 -0.07500213 -0.6575852 -0.8276000 -1.13886265
Site2 -0.5637180 -0.5590237 0.29925577 0.5039931 0.1533945 0.20947284
Site3 -0.9250445 -1.0198867 1.21654596 1.0323130 0.5171416 0.72588358
Site4 -0.6164397 -1.1299484 0.96517252 1.1892973 0.3406750 0.09078414
Site5 -0.6925386 -1.6264517 0.60714064 0.6599096 0.2303187 0.37662100
wd log_sm
Site1 1.39116817 0.2736764
Site2 -0.08552707 -0.7968442
Site3 0.00974359 0.2102843
Site4 0.68259263 0.2013675
Site5 0.25188985 0.4201296