Bayesian estimation for ecology
🧑🏻‍💻 Masatoshi Katabuchi @ XTBG, CAS
November 10, 2024 XTBG AFEC
We Learn
Why Bayesian estimation is useful
Why multilevel models are important
We Do Not Learn
Assuming everyone knows the concept of likelihood
\(f(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \mathrm{exp}\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]\)
A likelihood function is a probability function that shows how likely it is to observe the data for a specific set of parameters.
e.g., when your model is \(x \sim \mathcal{N}(\mu = 0, \sigma^2 = 1)\) (normal distribution with mean 0 and variance 1), what is the probability density at x = 1.96?
When your data is x = {-1.5, 0, 1.5} and your model is \(x \sim \mathcal{N}(0, 1)\), what is the probability of observing x?
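Both questions can be answered directly from the density formula above; here is a minimal sketch using only the Python standard library:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / math.sqrt(2 * math.pi * sigma ** 2)

# density at x = 1.96 under N(0, 1)
print(round(normal_pdf(1.96), 4))  # 0.0584

# likelihood of x = {-1.5, 0, 1.5} under N(0, 1):
# the product of the three densities
data = [-1.5, 0.0, 1.5]
like = math.prod(normal_pdf(x) for x in data)
print(round(like, 5))  # 0.00669
```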
Because we usually don't know the true parameters (\(\mu\) and \(\sigma\)), we need to find the parameters that maximize the likelihood function (Maximum Likelihood Estimation).
\(p\): survival rate, \(1-p\): mortality rate
\(L = {}_5C_2 p^2(1-p)^3\) (Binomial distribution)
\(\mathrm{ln}\;L = \mathrm{ln}\;{}_5C_2 + 2\mathrm{ln}\;p + 3\mathrm{ln}(1-p)\)
\(\frac{d\,\mathrm{ln}\;L}{dp} = \frac{2}{p} - \frac{3}{1-p} = 0\)
\(p = \frac{2}{5}\)
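The analytical answer can be checked numerically by maximizing the log-likelihood over a grid of \(p\) (a quick sketch; the data are the 2-survivors-out-of-5 example above):

```python
import math

# log-likelihood of p for 2 survivors out of 5 trials (binomial)
def log_lik(p):
    return math.log(math.comb(5, 2)) + 2 * math.log(p) + 3 * math.log(1 - p)

# grid search over (0, 1)
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=log_lik)
print(p_hat)  # 0.4, matching the analytical solution p = 2/5
```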
Multilevel Model
Conditional Probability and Bayes's Theorem
Prior
Multilevel Model Revisited
“…rare species suffered more from the presence of conspecific neighbors than common species did, suggesting that conspecific density dependence shapes species abundances in diverse communities.”
- Comita et al. 2010 -
[1] Comita, L. S., Muller-Landau, H. C., Aguilar, S. & Hubbell, S. P. Asymmetric density dependence shapes species abundances in a tropical tree community. Science 329, 330–332 (2010).
Inspired by [1] Comita, L. S., Muller-Landau, H. C., Aguilar, S. & Hubbell, S. P. Asymmetric density dependence shapes species abundances in a tropical tree community. Science 329, 330–332 (2010).
There is negative density dependence (NDD) of seedling survival rate, and the strength of NDD varies among species. The strength of NDD depends on species abundance.
Inspired by [1] Martínez Cano et al. Tropical Tree Height and Crown Allometries for the Barro Colorado Nature Monument, Panama: A Comparison of Alternative Hierarchical Models Incorporating Interspecific Variation in Relation to Life History Traits. Biogeosciences 16, 847–862 (2019).
There is a power-law relationship (\(y = ax^b\)) between tree diameter (DBH) and crown area, and the power-law exponent varies among species. Those relationships depend on wood density.
Model crown area as a function of DBH (individual-level).
Model the coefficient b as a function of wood density (group-level).
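On a log scale the power law becomes linear, \(\mathrm{log}\,y = \mathrm{log}\,a + b\,\mathrm{log}\,x\), so the individual-level step can be sketched as an ordinary regression of log crown area on log DBH. A stdlib-only sketch on simulated data (the values of `a_true` and `b_true` are arbitrary assumptions, not from the paper):

```python
import math
import random

random.seed(1)
a_true, b_true = 0.5, 1.3  # assumed allometric parameters
dbh = [random.uniform(1, 100) for _ in range(200)]
# crown area follows y = a * x^b with lognormal noise
area = [a_true * d ** b_true * math.exp(random.gauss(0, 0.1)) for d in dbh]

# least-squares slope and intercept on the log-log scale
lx = [math.log(d) for d in dbh]
ly = [math.log(y) for y in area]
n = len(lx)
mx, my = sum(lx) / n, sum(ly) / n
b_hat = sum((x - mx) * (y - my) for x, y in zip(lx, ly)) / sum((x - mx) ** 2 for x in lx)
a_hat = math.exp(my - b_hat * mx)
print(round(b_hat, 2), round(a_hat, 2))  # close to b_true and a_true
```

The group-level step would then regress each species' `b_hat` on wood density.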
We need to extract the slope coefficients for each species from fit_ind and make new_data.
Inspired by [1] Comita, L. S., Muller-Landau, H. C., Aguilar, S. & Hubbell, S. P. Asymmetric density dependence shapes species abundances in a tropical tree community. Science 329, 330–332 (2010).
data{
int<lower=1> N; // number of samples
int<lower=1> J; // number of sp
int<lower=1> K; // number of tree-level predictors (e.g., CONS, HETS, ...)
int<lower=1> L; // number of sp-level predictors (e.g., intercept, SLA, ...)
matrix[N, K] x; // tree-level predictor
matrix[L, J] u; // sp-level predictor
array[N] int<lower=0, upper=1> suv; // 1 or 0
array[N] int<lower=1, upper=J> sp; // integer
}
parameters{
matrix[K, L] gamma;
matrix[K, J] z;
cholesky_factor_corr[K] L_Omega;
vector<lower=0, upper=pi() / 2>[K] tau_unif;
}
transformed parameters{
matrix[K, J] beta;
vector<lower=0>[K] tau;
for (k in 1:K) tau[k] = 2.5 * tan(tau_unif[k]);
beta = gamma * u + diag_pre_multiply(tau, L_Omega) * z;
}
model {
vector[N] mu;
to_vector(z) ~ std_normal();
L_Omega ~ lkj_corr_cholesky(2);
to_vector(gamma) ~ normal(0, 2.5);
for (n in 1:N) {
mu[n] = x[n, ] * beta[, sp[n]];
}
suv ~ bernoulli_logit(mu);
}
generated quantities {
vector[N] log_lik;
corr_matrix[K] Omega;
Omega = multiply_lower_tri_self_transpose(L_Omega);
for (n in 1:N) {
log_lik[n] = bernoulli_logit_lpmf(suv[n] | x[n, ] * beta[, sp[n]]);
}
}
\(s_{i,j} \sim \mathcal{B}(p_{i, j})\): likelihood
\(\mathrm{logit}(p_{i,j}) = \boldsymbol{x_{i}} \cdot \boldsymbol{\beta_{j}}\): individual-level model
\(\boldsymbol{\beta} = \boldsymbol{\gamma} \cdot \boldsymbol{u} + \mathrm{diag}(\boldsymbol{\tau})\cdot \boldsymbol{L} \cdot \boldsymbol{z}\): species-level model
\(L \sim \mathrm{LkjCholesky}(\eta)\): prior (?)
\(z_j \sim \mathcal{N}(0, 1)\): prior (?)
\(\tau \sim \mathrm{Cauchy}(0, 2.5)\): prior (?)
\(\gamma_k \sim \mathcal{N}(0, 2.5)\): prior (?)
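The species-level line is plain matrix algebra: \(\boldsymbol{\beta}\) (K×J, one coefficient vector per species) is a regression part \(\boldsymbol{\gamma u}\) plus correlated species-level noise. A numpy sketch with arbitrary dimensions; the correlation `rho` is an assumption, and `tau` is fixed rather than drawn from its half-Cauchy prior, for simplicity:

```python
import numpy as np

rng = np.random.default_rng(0)
K, Lp, J = 2, 3, 5  # tree-level predictors, sp-level predictors, species

gamma = rng.normal(0, 1, (K, Lp))  # species-level coefficients
u = rng.normal(0, 1, (Lp, J))      # species-level predictor matrix
z = rng.normal(0, 1, (K, J))       # standard-normal residuals
tau = np.array([0.8, 1.2])         # scales of among-species variation
rho = 0.3                          # assumed correlation between the K slopes
L_Omega = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))

# beta = gamma * u + diag(tau) * L * z  (non-centered parameterization)
beta = gamma @ u + np.diag(tau) @ L_Omega @ z
print(beta.shape)  # (2, 5): one coefficient vector per species
```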
\[ P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B)} \]
Conditional probability
Bayes's Theorem
Forward / inverse problems
Bayes revisit
Probability of A:
\[ P(A) = \frac{|A|}{|U|} \]
e.g., the probability of rolling a die and getting an odd number is 3/6 = 1/2
Probability of A occurring given B has already occurred:
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} \]
e.g.,
\[ P(B \mid A) = \frac{P(A \cap B)}{P(A)} \]
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} \]
\[ P(A \mid B) \times P(B) = P(B \mid A) \times P(A) \]
\[ P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B)} \]
X: 3 red balls, 5 white balls
Y: 1 red ball, 3 white balls
Randomly choose a bag X or Y
\(P(A)\): Choosing X
\(P(B)\): Drawing a red ball
\(P(B \mid A)\): Getting a red ball from X
\(P(A \cap B)\): Choosing X and drawing a red ball
\(P(A \mid B)\): The probability that you picked X, given that you drew a red ball (?)
\(P(A) = 1/2\)
\(P(B \mid A) = 3/8\)
\(P(B)\) = 1/2 \(\times\) 3/8 + 1/2 \(\times\) 1/4 = 5/16
\(P(A \cap B)\) = 1/2 \(\times\) 3/8 = 3/16
\(P(A \mid B)\) = \(P(A \cap B) / P(B)\) = (3/16) / (5/16) = 3/5
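The arithmetic above can be replayed exactly with fractions (X holds 3 red of 8 balls, Y holds 1 red of 4, each bag chosen with probability 1/2):

```python
from fractions import Fraction

p_A = Fraction(1, 2)             # P(A): choose bag X
p_B_given_A = Fraction(3, 8)     # red ball from X (3 of 8)
p_B_given_notA = Fraction(1, 4)  # red ball from Y (1 of 4)

# total probability of drawing a red ball
p_B = p_A * p_B_given_A + (1 - p_A) * p_B_given_notA
# Bayes's theorem: P(A | B)
p_A_given_B = p_B_given_A * p_A / p_B
print(p_B, p_A_given_B)  # 5/16 3/5
```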
\[ P(\mathrm{Parameter} \mid \mathrm{Data}) = \frac{P(\mathrm{Data} \mid \mathrm{Parameter}) \times P(\mathrm{Parameter})}{P(\mathrm{Data})} \]
\(P(\mathrm{Parameter} \mid \mathrm{Data})\)
\(P(\mathrm{Data} \mid \mathrm{Parameter})\)
\(P(\mathrm{Parameter})\)
\(P(\mathrm{Data})\)
\(L_A = {}_3C_2 p^2 (1-p)^1\)
\(L_B = {}_{100}C_{60} p^{60} (1-p)^{40}\)
\(\mathrm{Prior} \propto p^{50} (1-p)^{50}\)
\(Post_A \propto p^2 (1-p)^1 \times p^{50} (1-p)^{50}\)
\(Post_A \propto p^{52} (1-p)^{51}\)
\(Post_A' = 52p^{51}(1-p)^{51} - 51p^{52}(1-p)^{50}\)
p = 52/103 ≈ 0.5049
\(Post_B \propto p^{60} (1-p)^{40} \times p^{50} (1-p)^{50}\)
\(Post_B \propto p^{110} (1-p)^{90}\)
p = 110/200 = 0.55
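Both posterior modes follow from the general rule that a density proportional to \(p^a(1-p)^b\) peaks at \(a/(a+b)\); a quick numeric check (`beta_mode` is a helper written for this slide):

```python
# mode of a density proportional to p^a * (1-p)^b is a / (a + b)
def beta_mode(a, b):
    return a / (a + b)

# Post_A ∝ p^52 (1-p)^51, Post_B ∝ p^110 (1-p)^90
print(round(beta_mode(52, 51), 4))   # 0.5049
print(round(beta_mode(110, 90), 4))  # 0.55
```

With only 3 at-bats, player A's posterior barely moves from the prior; with 100 at-bats, player B's data dominate.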
https://www.mlb.com/stats/batting-average
[1] Rubin, D. B. Estimation in parallel randomized experiments. Journal of Educational Statistics 6, 377â401 (1981).
[2] Gelman, A. et al. Bayesian Data Analysis, Third Edition. (Chapman & Hall/CRC, 2013).
Newly developed example
What is the survival rate?
sp | n | suv | p_like |
---|---|---|---|
A | 41 | 9 | 0.22 |
B | 45 | 18 | 0.40 |
C | 32 | 6 | 0.19 |
D | 18 | 5 | 0.28 |
E | 33 | 8 | 0.24 |
F | 26 | 8 | 0.31 |
G | 46 | 11 | 0.24 |
H | 16 | 8 | 0.50 |
What is the survival rate?
sp | n | suv | p_like | p_true |
---|---|---|---|---|
A | 41 | 9 | 0.22 | 0.34 |
B | 45 | 18 | 0.40 | 0.30 |
C | 32 | 6 | 0.19 | 0.34 |
D | 18 | 5 | 0.28 | 0.36 |
E | 33 | 8 | 0.24 | 0.25 |
F | 26 | 8 | 0.31 | 0.23 |
G | 46 | 11 | 0.24 | 0.25 |
H | 16 | 8 | 0.50 | 0.34 |
p_true ranges [0.23, 0.36]
p_like ranges [0.19, 0.5]
The estimates show larger variation than the true rates
because of the small sample sizes (common in ecological studies)
We know p_true because we generated this data using p_true as a known parameter. In practical scenarios, we don't know p_true.
\(S_i \sim \mathcal{B}(N_i, p_i)\)
\(z_i \sim \mathcal{N}(\mu, \sigma)\) where \(z_i = \mathrm{logit}(p_i) = \mathrm{log}\frac{p_i}{1 - p_i}\)
The overall survival rate is 0.5 in this example. We have some sense of a scale for \(\sigma\).
data {
int<lower=1> I; // number of species
array[I] int<lower=0> suv; // number of survivors
array[I] int<lower=0> N; // number of individuals
array[I] int<lower=1, upper=I> ii; // integer
}
parameters {
real mu;
real<lower=0> sigma;
vector[I] z_tilde;
}
transformed parameters {
vector[I] z;
z = mu + sigma * z_tilde;
}
model {
mu ~ normal(0, 5);
z_tilde ~ std_normal();
sigma ~ std_normal();
for (i in 1:I)
suv[i] ~ binomial_logit(N[i], z[ii[i]]);
}
generated quantities {
vector[I] log_lik;
vector[I] p = inv_logit(z);
for (i in 1:I)
log_lik[i] = binomial_logit_lpmf(suv[i] | N[i], z[ii[i]]);
}
\(S_i \sim \mathcal{B}_{logit}(N_i, z_i)\): likelihood
\(z_i \sim \mathcal{N}(\mu, \sigma)\) : prior
\(\sigma \sim \mathcal{N}(0, 1)\): prior
\(\mu \sim \mathcal{N}(0, 5)\): prior
\(z_i = \mu + \sigma \cdot \tilde{z_i}\)
\(\tilde{z_i} \sim \mathcal{N}(0, 1)\): prior
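The non-centered trick above (\(z = \mu + \sigma \tilde{z}\) with \(\tilde{z} \sim \mathcal{N}(0,1)\)) produces draws with exactly the target mean and sd; a quick simulation check (the values of `mu` and `sigma` are arbitrary):

```python
import random
import statistics

random.seed(0)
mu, sigma = -1.0, 0.5  # assumed hyperparameters
# draw z_tilde ~ N(0, 1), then shift and scale: z = mu + sigma * z_tilde
z_tilde = [random.gauss(0, 1) for _ in range(100_000)]
z = [mu + sigma * zt for zt in z_tilde]
print(round(statistics.mean(z), 2), round(statistics.stdev(z), 2))  # ~ -1.0 0.5
```

Stan samples \(\tilde{z}\) instead of \(z\) because the standard-normal geometry is much easier for the sampler when data per species are sparse.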
sp | n | suv | p_like | p_true | p_bayes |
---|---|---|---|---|---|
A | 41 | 9 | 0.22 | 0.34 | 0.26 |
B | 45 | 18 | 0.40 | 0.30 | 0.33 |
C | 32 | 6 | 0.19 | 0.34 | 0.26 |
D | 18 | 5 | 0.28 | 0.36 | 0.28 |
E | 33 | 8 | 0.24 | 0.25 | 0.27 |
F | 26 | 8 | 0.31 | 0.23 | 0.29 |
G | 46 | 11 | 0.24 | 0.25 | 0.27 |
H | 16 | 8 | 0.50 | 0.34 | 0.32 |
Closed symbols (p_bayes) align more closely with the 1:1 line, indicating more accurate estimates.
This model compensates for limited data by using the prior knowledge that species responses are broadly similar.
Bayesian estimation (stan)
data {
int<lower=1> I; // number of species
array[I] int<lower=0> suv; // number of survivors
array[I] int<lower=0> N; // number of individuals
array[I] int<lower=1, upper=I> ii; // integer
}
parameters {
real mu;
real<lower=0> sigma;
vector[I] z_tilde;
}
transformed parameters {
vector[I] z;
z = mu + sigma * z_tilde;
}
model {
mu ~ normal(0, 5);
z_tilde ~ std_normal();
sigma ~ std_normal();
for (i in 1:I)
suv[i] ~ binomial_logit(N[i], z[ii[i]]);
}
generated quantities {
vector[I] log_lik;
vector[I] p = inv_logit(z);
for (i in 1:I)
log_lik[i] = binomial_logit_lpmf(suv[i] | N[i], z[ii[i]]);
}
Bayesian estimation (brms)
R packages, such as brms and rstanarm, offer Bayesian estimation using syntax similar to lme4.
Maximum likelihood (lme4)
\(L(\mu, \sigma) = \prod_i \int_{-\infty}^{\infty} \mathcal{B}(S_i \mid N_i, p_i) \times \mathcal{N}(\mathrm{logit}(p_i) \mid \mu, \sigma)\, dp_i\)
Analytically find \(\mu\) and \(\sigma\) to maximize \(L\)
An analytical solution is often not available (this example is easy though)
\(P(\mu, \sigma \mid S_i, N_i) \propto \prod_i \mathcal{B}(S_i \mid N_i, p_i) \times \prod_i \mathcal{N}(\mathrm{logit}(p_i) \mid \mu, \sigma) \times \mathcal{N}(\mu \mid 0, 5) \times \mathcal{N}(\sigma \mid 0, 2.5)\)
Numerically sample from \(P\) to estimate \(\mu\) and \(\sigma\) (via MCMC)
MCMC works even if an analytical solution is not available
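A minimal random-walk Metropolis sampler illustrates why: the acceptance ratio needs only the unnormalized posterior, so the intractable denominator \(P(\mathrm{Data})\) never has to be computed. The sketch below targets the single-species seedling posterior \(p^2(1-p)^3\) with a flat prior (a toy setup for this slide, not the multilevel model):

```python
import math
import random

random.seed(42)

def log_post(p):
    """Unnormalized log posterior: 2 survivors out of 5, flat prior on p."""
    if not 0 < p < 1:
        return -math.inf
    return 2 * math.log(p) + 3 * math.log(1 - p)

# random-walk Metropolis: propose, then accept with prob min(1, ratio)
p, samples = 0.5, []
for _ in range(20_000):
    prop = p + random.gauss(0, 0.1)
    if random.random() < math.exp(min(0.0, log_post(prop) - log_post(p))):
        p = prop
    samples.append(p)

post_mean = sum(samples[2_000:]) / len(samples[2_000:])
print(round(post_mean, 2))  # close to the Beta(3, 4) mean 3/7 ~ 0.43
```

The first 2,000 draws are discarded as warm-up; Stan's NUTS sampler is a far more efficient version of this same idea.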
Bayes's theorem supports the use of MCMC
We can use a priori information about parameters in our model
Models are flexible
MCMC works even if models are complicated
Multilevel models have a good balance between pooled estimates and separate estimates, which is useful for practical sample sizes
Multilevel models handle nested or hierarchical data structures, a common and important scenario in ecological research (e.g., trees within species, communities within sites, etc.).
Gelman, A. et al. Bayesian Data Analysis, Third Edition. (Chapman & Hall/CRC, 2013)
AIcia Solid Project (in Japanese, with math)