Ensemble Quadratic MCMC algorithm detailed example for fitting a Weibull distribution

Load the ‘svMisc’ package to use the progress bar in the Ensemble MCMC algorithm and load the ‘coda’ package to return a ‘mcmc.list’ object from the Ensemble MCMC algorithm that can be used with various MCMC diagnostic functions in the ‘coda’ package.

library(svMisc)
library(coda)

Assume the true parameters in the Weibull distribution are shape \(a = 20\) and scale \(\sigma = 900\), \(\textrm{WEI}(a = 20, \sigma = 900)\). Then we take 50 random samples from this Weibull distribution, \(Y \sim \textrm{WEI}(a = 20, \sigma = 900)\). These random samples, \(y_i\) for \(i = 1, ..., 50\), are the data to be fitted.


a_shape = 20
sigma_scale = 900

num_ran_samples = 50

data_weibull = matrix(NA, nrow = 1, ncol = num_ran_samples)

data_weibull = rweibull(num_ran_samples, shape = a_shape, scale = sigma_scale)

plot(c(1:num_ran_samples), data_weibull, xlab="random samples", ylab="y")

Now, we want to fit a Weibull distribution with shape parameter \(a\) and scale parameter \(\sigma\) to this \(y_i\) data using the Ensemble Quadratic MCMC algorithm and the prior knowledge that the shape parameter \(a \sim \textrm{U}(0.01, 100)\) and scale parameter \(\sigma \sim \textrm{U}(1, 10000)\).

The log prior function and the log likelihood function for \(a\) and \(\sigma\) are coded below.


logp <- function(param)
{
  a_shape_use = param[1]
  sigma_scale_use = param[2]

  logp_val = dunif(a_shape_use, min = 1e-2, max = 1e2, log = TRUE) + dunif(sigma_scale_use, min = 1, max = 1e4, log = TRUE)

  return(logp_val)
}

logl <- function(param)
{
  a_shape_use = param[1]
  sigma_scale_use = param[2]

  logl_val = sum(dweibull(data_weibull, shape = a_shape_use, scale = sigma_scale_use, log = TRUE))

  return(logl_val)
}

logfuns = list(logp = logp, logl = logl)

It is recommended to use at least twice as many chains as the number of parameters to be estimated. Since there are two parameters to be estimated in this example, four MCMC chains will be used in the Ensemble Quadratic MCMC algorithm.

Next, initial guesses are proposed for each of the four MCMC chains and the following while loops ensure that the initial guesses for each of the four MCMC chains produces a log unnormalized posterior density value that is not NA or infinite.

num_par = 2

num_chains = 2*num_par

theta0 = matrix(0, nrow = num_par, ncol = num_chains)

temp_val = 0
j = 0

while(j < num_chains)
{
  initial = c(runif(1, 1e-2, 1e2), runif(1, 1, 1e4))
  temp_val = logl(initial) + logp(initial)

  while(is.na(temp_val) || is.infinite(temp_val))
  {
    initial = c(runif(1, 1e-2, 1e2), runif(1, 1, 1e4))
    temp_val = logl(initial) + logp(initial)
  }
  
  j = j + 1

  message(paste('j:', j))

  theta0[1,j] = initial[1]
  theta0[2,j] = initial[2]

}
#> j: 1
#> j: 2
#> j: 3
#> j: 4

The Ensemble Quadratic MCMC algorithm will be used with 1e5 iterations for each chain in the Ensemble MCMC algorithm and each 10th iteration is returned.

The total number of samples returned by the algorithm is given by floor(num_chain_iterations/thin_val_par)*num_chains

num_chain_iterations = 1e5
thin_val_par = 10

floor(num_chain_iterations/thin_val_par)*num_chains
#> [1] 40000


Weibull_Quad_result = ensemblealg(theta0, logfuns, T_iter = num_chain_iterations, Thin_val = thin_val_par, ShowProgress = TRUE, ReturnCODA = TRUE)
#>           0-----------------------------------------------------------------1e+05
#> Progress: |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

After the Ensemble Quadratic MCMC algorithm is run, the matrix of parameter samples, matrix of log likelihood samples, matrix of log prior samples, and ‘coda’ package ‘mcmc.list’ object are returned from the algorithm.

my_samples = Weibull_Quad_result$theta_sample
my_log_prior = Weibull_Quad_result$log_prior_sample
my_log_like = Weibull_Quad_result$log_like_sample
my_mcmc_list_coda = Weibull_Quad_result$mcmc_list_coda

Use the ‘plot’ function on the ‘mcmc.list’ object to view the convergence of the chains over the iterations.

varnames(my_mcmc_list_coda) = c("a_shape", "sigma_scale")

plot(my_mcmc_list_coda)

Now, we remove (also called burn-in) 25% of each MCMC chain to ensure that we use the samples that converged to the estimated unnormalized posterior distribution.

my_samples_burn_in = my_samples[,,-c(1:floor((num_chain_iterations/thin_val_par)*0.25))]

my_log_prior_burn_in = my_log_prior[,-c(1:floor((num_chain_iterations/thin_val_par)*0.25))]

my_log_like_burn_in = my_log_like[,-c(1:floor((num_chain_iterations/thin_val_par)*0.25))]

Then we calculate the potential scale reduction factors, which provide a formal test of the convergence of the MCMC sampling to the estimated posterior distribution. The potential scale reduction factor are close to 1 for all the estimated parameters, this indicates that the MCMC sampling converged to the estimated posterior distribution for each parameter.

diagnostic_result = psrfdiagnostic(my_samples_burn_in, 0.05)

diagnostic_result$p_s_r_f_vec
#>          [,1]      [,2]
#> [1,] 1.000554 0.9996759

We collapse the sample matrices into sample vectors to pool all of the MCMC chain samples together.

log_un_post_vec = as.vector(my_log_prior_burn_in + my_log_like_burn_in)

k1 = as.vector(my_samples_burn_in[1,,])

k2 = as.vector(my_samples_burn_in[2,,])

Next, we calculate the posterior median, 95% credible intervals, and maximum posterior for the parameters.

#a_shape posterior median and 95% credible interval
median(k1)
#> [1] 23.89217
hpdparameter(k1, 0.05)
#> [1] 18.95000 28.98099
#(The 95% CI is narrower than the prior distribution and it contains the true 
#value of a_shape, 20)

#sigma_scale posterior median and 95% credible interval
median(k2)
#> [1] 906.0427
hpdparameter(k2, 0.05)
#> [1] 894.5921 917.4541
#(The 95% CI is narrower than the prior distribution and it contains the true 
#value of sigma_scale, 900)

#a_shape and sigma_scale maximum posterior
k1[which.max(log_un_post_vec)]
#> [1] 24.06038

k2[which.max(log_un_post_vec)]
#> [1] 905.7494

The following plots display the silhouette of the unnormalized posterior surface from the chosen parameter’s perspective.

plot(k1, exp(log_un_post_vec), xlab="a_shape", ylab="unnormalized posterior density")

plot(k2, exp(log_un_post_vec), xlab="sigma_scale", ylab="unnormalized posterior density")

Now, ten thousand MCMC samples are used to generate the posterior predictive distribution. (Also, the Bayesian p-value is calculated in this for loop, which assesses the goodness of fit of the Weibull distribution to data).

num_samples = 10000

w_predict = matrix(NA, nrow = num_samples, ncol = length(data_weibull))

discrepancy_w = matrix(NA, nrow = num_samples, ncol = 1)
discrepancy_w_pred = matrix(NA, nrow = num_samples, ncol = 1)
ind_pred_exceeds_w = matrix(NA, nrow = num_samples, ncol = 1)

for (i in (length(log_un_post_vec) - num_samples + 1):length(log_un_post_vec))
{
  l = i - (length(log_un_post_vec) - num_samples)

    w_predict[l,] = rweibull(length(data_weibull), shape = k1[i], scale = k2[i])

    my_w_mean = k2[i]*gamma(1 + (1/k1[i]))

    discrepancy_w[l,1] = sum((data_weibull - rep(my_w_mean, length(data_weibull)))^2)

    discrepancy_w_pred[l,1] = sum((w_predict[l,] - rep(my_w_mean, length(data_weibull)))^2)

    if(discrepancy_w_pred[l,1] > discrepancy_w[l,1])
    {
      ind_pred_exceeds_w[l,1] = 1
    }
    else
    {
      ind_pred_exceeds_w[l,1] = 0
    }

  svMisc::progress(l,num_samples,progress.bar = TRUE,init = (l == 1))
}
#>           0-----------------------------------------------------------------10000
#> Progress: |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

The Bayesian p-value uses the sum of squares discrepancy function and the Bayesian p-value is 0.6613, which indicates that there is no evidence against the null hypothesis that the model predictions fit the data.

This Bayesian p-value measures how extreme is the realized value of the sum of squares discrepancy among all possible values that could have been realized under this model with the same parameters that generates the current data.

Note: If this Bayesian p-value was close to 0 or 1, then there would be evidence against the null hypothesis that the model predictions fit the data. Extreme tail-area probabilities, less than 0.01 or more than 0.99, indicate a major failure of the model predictions to fit the data.

Bayes_p_value = sum(ind_pred_exceeds_w)/length(ind_pred_exceeds_w)

Bayes_p_value
#> [1] 0.6613

w_predict_lower = matrix(NA, nrow = 1, ncol = length(data_weibull))
w_predict_upper = matrix(NA, nrow = 1, ncol = length(data_weibull))

for(j in 1:length(data_weibull))
{
  w_predict_lower[j] = quantile(w_predict[,j], probs = 0.025)
  w_predict_upper[j] = quantile(w_predict[,j], probs = 0.975)
}

Lastly, the data, posterior predictive mean, and 95% prediction interval is plotted., and this plot illustrates the good fit of the Weibull distribution to the generated data.

plot(c(1:num_ran_samples), data_weibull, main="Weibull dist. (a = 20, sigma = 900) Example",
     xlab="random samples", ylab="y", type = "p", lty = 0, ylim = c(600, 1000))
lines(c(1:num_ran_samples), colMeans(w_predict), type = "l", lty = 1, col = "blue")
lines(c(1:num_ran_samples), w_predict_lower, type = "l", lty = 2, col = "blue")
lines(c(1:num_ran_samples), w_predict_upper, type = "l", lty = 2, col = "blue")

legend("bottomleft", legend = c("Data", "Posterior predictive mean", "95% prediction interval"), lty=c(0,1,2), col = c("black", "blue", "blue"), pch=c(1,NA,NA))