--- title: "Ensemble Quadratic MCMC algorithm detailed example for fitting a Disease Markov Model" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Ensemble Quadratic MCMC algorithm detailed example for fitting a Disease Markov Model} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = (4/6)*7 ) ``` ```{r setup} library(QAEnsemble) ``` Load the 'svMisc' package to use the progress bar in the Ensemble MCMC algorithm, load the 'coda' package to return a 'mcmc.list' object from the Ensemble MCMC algorithm that can be used with various MCMC diagnostic functions in the 'coda' package, load the 'expm' package to use the matrix exponential function 'expm' that is used to convert a transition-rate matrix into a transition-probability matrix for Markov modeling, and load the 'diagram' package to visualize the disease Markov model. ```{r eval=TRUE} library(svMisc) library(coda) library(expm) library(diagram) ``` Set the seed for this example. ```{r eval=TRUE} set.seed(10) ``` The disease Markov Model structure in this example is motivated by the Markov model presented in the following health economics book: Edlin R, McCabe C, Hulme C, Hall P, Wright J (2015) Cost effectiveness modelling for health technology assessment. Switzerland: Springer International Publishing. The disease Markov Model being used in this example is a Markov cohort state-transition model. Markov cohort state-transition models are generally used as deterministic models in the health economics literature, where the Markov cohort state-transition model is a recursive matrix formula that calculates the trajectory of population counts in each state using a transition probability matrix (which can be obtained by a transition rate matrix) and an initial distribution of the population counts across the states. In this example the disease Markov cohort state-transition model is used in the common way as a deterministic system, and then negative binomial error distributions are placed on the new daily sickness and new daily death counts from the disease Markov cohort state-transition model solutions in order to simulate data to be fit. Using error distributions placed on the new daily sickness and new daily death counts from the deterministic Markov cohort state-transition model solutions allows for a great deal of flexibility for describing many different types of datasets. For more information about Markov cohort state-transition models, please see the following journal article: Iskandar R (2018) A theoretical foundation for state-transition cohort models in health decision analysis. PLoS One. 13(12):e0205543. https://doi.org/10.1371/journal.pone.0205543 In this example, we assume there are five people infected with a virus in an isolated town of 4000 people and that the true deterministic Markov cohort state-transition model, $\boldsymbol{N}(t) = \begin{bmatrix} W(t) & S(t) & D(t) \end{bmatrix}$, is a model with three states $W(t)$: Number of people who are in the Well state at time t\ $S(t)$: Number of people who are in the Sick state at time t\ $D(t)$: Number of people who are in the Dead state at time t and with the transition-rate matrix, $Q$, $\begin{bmatrix} q_{WW} & q_{WS} & q_{WD}\\ q_{SW} & q_{SS} & q_{SD}\\ 0 & 0 & q_{DD} \end{bmatrix}$. The Markov cohort state-transition model is visualized in the plot below. ```{r eval=TRUE} names_diagram = c("W", "S", "D") #Note: byrow = true is not used in the matrix functions in order to have the #direction of the arrows correctin the diagram diagram_matrix = matrix(c("q_WW", "q_WS", "q_WD", "q_SW", "q_SS", "q_SD", 0, 0, "q_DD"), nrow = 3, ncol = 3) curve_matrix = matrix(c(0.1, 0.1, 0, 0.1, 0.1, 0, 0, 0, 0.1), nrow = 3, ncol = 3) diagram::plotmat(diagram_matrix, pos = c(2, 1), curve = curve_matrix, name = names_diagram, lwd = 2, box.lwd = 2, cex.txt = 0.8, self.lwd = 2, self.cex = 0.5, self.shiftx = c(-0.1, 0.1, -0.1), box.type = "circle", box.prop = 0.5, arr.type = "triangle") ``` The time step, $t$, used in this model is one day and the number of day cycles ran in the model is twenty. ```{r eval=TRUE} num_cycles = 20 ``` It is assumed that the true parameters in the transition-rate matrix, $Q$, for the model, $\boldsymbol{N}(t)$, are the following: ```{r eval=TRUE} #well people that become sick per day q_WS = 5e-2 #sick people that become well per day q_SW = 1e-3 #well people that die per day q_WD = 1e-7 #sick people that die per day q_SD = 5e-4 q_WW = -1*(q_WS + q_WD) q_SS = -1*(q_SW + q_SD) q_DD = 0 ``` The formula used to find the transition-probability matrix at $s$ time steps forward, $P(s)$, using the transition-rate matrix, $Q$, is the following: $P(s) = EXP(Q*s)$, where $EXP(Q*s)$ is the matrix exponential of the matrix $Q*s$. Since the model, $\boldsymbol{N}(t)$, is being moved forward one day at a time in this example, we use $s = 1$ in the previous formula and the following formula to calculate the Markov cohort state-transition model at each one day time step: $\boldsymbol{N}(t + 1) = \boldsymbol{N}(t)*P(1) = \boldsymbol{N}(t)*EXP(Q*1) = \boldsymbol{N}(t)*EXP(Q)$. This method for converting a transition-rate matrix into a transition-probability matrix is explained further in the following journal article: Jones E, Epstein D, and García-Mochón L (2017) A procedure for deriving formulas to convert transition rates to probabilities for multistate Markov models. MDM 37(7):779-789. https://doi.org/10.1177/0272989X17696997 Given the four inputs $q_{WS}$, $q_{SW}$, $q_{WD}$, and $q_{SD}$, the function 'MarkovModel_sol' returns the Markov model solutions of $W(t)$, $S(t)$, and $D(t)$. ```{r eval=TRUE} MarkovModel_sol <- function(param) { q_WS_use = param[1] q_SW_use = param[2] q_WD_use = param[3] q_SD_use = param[4] q_WW_use = -1*(q_WS_use + q_WD_use) q_SS_use = -1*(q_SW_use + q_SD_use) q_DD_use = 0 Q_transition_matrix_use = matrix(c(q_WW_use, q_WS_use, q_WD_use, q_SW_use, q_SS_use, q_SD_use, 0, 0, q_DD_use), nrow = 3, ncol = 3, byrow = TRUE) num_cycles = 20 num_states = 3 name_states = c("W", "S", "D") #People over time begin day 0 cohort_over_time = matrix(NA, nrow = (num_cycles+1), ncol = num_states, dimnames = list(0:num_cycles, name_states)) cohort_0day = c(W = 3995, S = 5, D = 0) cohort_over_time[1,] = cohort_0day rownames(cohort_over_time) = paste("Day", c(0:(num_cycles)), sep = "_") for(t in 1:num_cycles) { cohort_over_time[t+1, ] = cohort_over_time[t, ] %*% expm::expm(Q_transition_matrix_use*(1)) } return(cohort_over_time) } ``` Given the assumed disease model structure and parameter values, the true disease spread in the isolated town per day is plotted below. ```{r eval=TRUE} time_use = c(1:num_cycles) cohort_over_time = MarkovModel_sol(c(q_WS, q_SW, q_WD, q_SD)) plot(time_use, diff(cohort_over_time[,2]), xlab="Days (t)", ylab="New daily sick people", type = "l", col = "red") plot(time_use, diff(cohort_over_time[,3]), xlab="Days (t)", ylab="New daily deaths", type = "l", col = "red") ``` Now, we random sample from the Negative Binomial distribution to generate data for the new daily sick people with mean given by the difference of the Markov model function solution cohort_over_time[,2] and variance given by diff(cohort_over_time[,2])/p1, where p1 = 0.1. Also, we random sample from the Negative Binomial distribution to generate data for the new daily deaths with mean given by the difference of the Markov model function solution cohort_over_time[,3] and variance given by diff(cohort_over_time[,3])/p2, where p2 = 0.3. ```{r eval=TRUE} p1 = 0.1 p2 = 0.3 data_nbin_new_S = matrix(NA, nrow = 1, ncol = num_cycles) data_nbin_new_D = matrix(NA, nrow = 1, ncol = num_cycles) new_S_true = diff(cohort_over_time[,2]) new_D_true = diff(cohort_over_time[,3]) for(t in 1:num_cycles) { data_nbin_new_S[t] = rnbinom(1, size = (new_S_true[t]*p1)/(1 - p1), prob = p1) if(new_D_true[t] == 0) { data_nbin_new_D[t] = 0 } else { data_nbin_new_D[t] = rnbinom(1, size = (new_D_true[t]*p2)/(1 - p2), prob = p2) } } plot(time_use, data_nbin_new_S, xlab="Days (t)", ylab="New daily sick people") plot(time_use, data_nbin_new_D, xlab="Days (t)", ylab="New daily deaths") ``` Assume it is known in the town that the death rate of well people in the isolated town is $q_{WD}$ = 1e-7 per day. Now, we want to fit the disease Markov model and estimate $q_{WS}$, $q_{SW}$, $q_{SD}$, $p_1$ parameter from the negative binomial distribution for new daily sick people, and $p_2$ parameter from the negative binomial distribution for new daily deaths to the new daily data using the Ensemble Quadratic MCMC algorithm and the prior knowledge that $q_{WS} \sim \textrm{U}(10^{-4}, 10)$, $q_{SW} \sim \textrm{U}(10^{-7}, 10^{-1})$, $q_{SD} \sim \textrm{U}(10^{-6}, 10^{-2})$, $p_1 \sim \textrm{U}(10^{-4}, 1)$, and $p_2 \sim \textrm{U}(10^{-4}, 1)$. The log prior function and the log likelihood function for $q_{WS}$, $q_{SW}$, $q_{SD}$, $p_1$, and $p_2$ are coded below. ```{r eval=TRUE} logp <- function(param) { q_WS_use = param[1] q_SW_use = param[2] q_SD_use = param[3] p1_use = param[4] p2_use = param[5] logp_val = dunif(q_WS_use, min = 1e-4, max = 10, log = TRUE) + dunif(q_SW_use, min = 1e-7, max = 1e-1, log = TRUE) + dunif(q_SD_use, min = 1e-6, max = 1e-2, log = TRUE) + dunif(p1_use, min = 1e-4, max = 1, log = TRUE) + dunif(p2_use, min = 1e-4, max = 1, log = TRUE) return(logp_val) } logl <- function(param) { cohort_over_time_use = MarkovModel_sol(c(param[1], param[2], q_WD, param[3])) new_S_sol = diff(cohort_over_time_use[,2]) new_D_sol = diff(cohort_over_time_use[,3]) issue_flag = 0 for(t in 1:num_cycles) { if(new_S_sol[t] < 0 || new_D_sol[t] < 0) { issue_flag = 1 } if(issue_flag == 1) { break } if(new_S_sol[t] == 0) { new_S_sol[t] = 1e-4; } if(new_D_sol[t] == 0) { new_D_sol[t] = 1e-4; } } if(issue_flag == 0) { logl_val = sum(dnbinom(data_nbin_new_S, size = ((new_S_sol)*param[4])/(1 - param[4]), prob = param[4], log = TRUE)) + sum(dnbinom(data_nbin_new_D, size = ((new_D_sol)*param[5])/(1 - param[5]), prob = param[5], log = TRUE)) } else { logl_val = -Inf } return(logl_val) } logfuns = list(logp = logp, logl = logl) ``` It is recommended to use at least twice as many chains as the number of parameters to be estimated. Since there are five parameters to be estimated in this example, ten MCMC chains will be used in the Ensemble Quadratic MCMC algorithm. Next, initial guesses are proposed for each of the ten MCMC chains and the following while loops ensure that the initial guesses for each of the ten MCMC chains produces a log unnormalized posterior density value that is not NA or infinite. ```{r eval=TRUE} num_par = 5 num_chains = 2*num_par theta0 = matrix(0, nrow = num_par, ncol = num_chains) temp_val = 0 j = 0 while(j < num_chains) { initial = c(runif(1, 1e-4, 10), runif(1, 1e-7, 1e-1), runif(1, 1e-6, 1e-2), runif(1, 1e-4, 1), runif(1, 1e-4, 1)) temp_val = logl(initial) + logp(initial) while(is.na(temp_val) || is.infinite(temp_val)) { initial = c(runif(1, 1e-4, 10), runif(1, 1e-7, 1e-1), runif(1, 1e-6, 1e-2), runif(1, 1e-4, 1), runif(1, 1e-4, 1)) temp_val = logl(initial) + logp(initial) } j = j + 1 message(paste('j:', j)) theta0[1,j] = initial[1] theta0[2,j] = initial[2] theta0[3,j] = initial[3] theta0[4,j] = initial[4] theta0[5,j] = initial[5] } ``` The Ensemble Quadratic MCMC algorithm will be used with 1.5e4 iterations for each chain in the Ensemble MCMC algorithm and each 10th iteration is returned. The total number of samples returned by the algorithm is given by floor(num_chain_iterations/thin_val_par)*num_chains ```{r eval=TRUE} num_chain_iterations = 1.5e4 thin_val_par = 10 floor(num_chain_iterations/thin_val_par)*num_chains ``` Next, the Ensemble Quadratic MCMC algorithm is run. ```{r eval=TRUE} Dis_Model_Quad_result = ensemblealg(theta0, logfuns, T_iter = num_chain_iterations, Thin_val = thin_val_par, ShowProgress = TRUE, ReturnCODA = TRUE) ``` After the Ensemble Quadratic MCMC algorithm is run, the matrix of parameter samples, matrix of log likelihood samples, matrix of log prior samples, and 'coda' package 'mcmc.list' object are returned from the algorithm. ```{r eval=TRUE} my_samples = Dis_Model_Quad_result$theta_sample my_log_prior = Dis_Model_Quad_result$log_prior_sample my_log_like = Dis_Model_Quad_result$log_like_sample my_mcmc_list_coda = Dis_Model_Quad_result$mcmc_list_coda ``` Use the 'plot' function on the 'mcmc.list' object to view the convergence of the chains over the iterations. ```{r eval=TRUE} varnames(my_mcmc_list_coda) = c("q_WS", "q_SW", "q_SD", "p1", "p2") plot(my_mcmc_list_coda) ``` Now, we remove (also called burn-in) 25% of each MCMC chain to ensure that we use the samples that converged to the estimated unnormalized posterior distribution. ```{r eval=TRUE} my_samples_burn_in = my_samples[,,-c(1:floor((num_chain_iterations/thin_val_par)*0.25))] my_log_prior_burn_in = my_log_prior[,-c(1:floor((num_chain_iterations/thin_val_par)*0.25))] my_log_like_burn_in = my_log_like[,-c(1:floor((num_chain_iterations/thin_val_par)*0.25))] ``` Then we calculate the potential scale reduction factors, which provide a formal test of the convergence of the MCMC sampling to the estimated posterior distribution. The potential scale reduction factor are close to 1 for all the estimated parameters, this indicates that the MCMC sampling converged to the estimated posterior distribution for each parameter. ```{r eval=TRUE} diagnostic_result = psrfdiagnostic(my_samples_burn_in, 0.05) diagnostic_result$p_s_r_f_vec ``` We collapse the sample matrices into sample vectors to pool all of the MCMC chain samples together. ```{r eval=TRUE} log_un_post_vec = as.vector(my_log_prior_burn_in + my_log_like_burn_in) k1 = as.vector(my_samples_burn_in[1,,]) k2 = as.vector(my_samples_burn_in[2,,]) k3 = as.vector(my_samples_burn_in[3,,]) k4 = as.vector(my_samples_burn_in[4,,]) k5 = as.vector(my_samples_burn_in[5,,]) ``` Next, we calculate the posterior median, 95% credible intervals, and maximum posterior for the parameters. ```{r eval=TRUE} #q_WS posterior median and 95% credible interval median(k1) hpdparameter(k1, 0.05) #(The 95% CI is narrower than the prior distribution and it contains the true #value of q_WS, 5e-2) #q_SW posterior median and 95% credible interval median(k2) hpdparameter(k2, 0.05) #(The 95% CI is narrower than the prior distribution and it contains the true #value of q_SW, 1e-3) #q_SD posterior median and 95% credible interval median(k3) hpdparameter(k3, 0.05) #(The 95% CI is narrower than the prior distribution and it contains the true #value of q_SD, 5e-4) #p1 posterior median and 95% credible interval median(k4) hpdparameter(k4, 0.05) #(The 95% CI is narrower than the prior distribution and it contains the true #value of p1, 0.1) #p2 posterior median and 95% credible interval median(k5) hpdparameter(k5, 0.05) #(The 95% CI is narrower than the prior distribution and it contains the true #value of p2, 0.3) #q_WS, q_SW, q_SD, p1, and p2 maximum posterior k1[which.max(log_un_post_vec)] k2[which.max(log_un_post_vec)] k3[which.max(log_un_post_vec)] k4[which.max(log_un_post_vec)] k5[which.max(log_un_post_vec)] ``` The following plots display the silhouette of the unnormalized posterior surface from the chosen parameter's perspective. ```{r eval=TRUE} plot(k1, exp(log_un_post_vec), xlab="q_WS", ylab="unnormalized posterior density") ``` ```{r eval=TRUE} plot(k2, exp(log_un_post_vec), xlab="q_SW", ylab="unnormalized posterior density") ``` ```{r eval=TRUE} plot(k3, exp(log_un_post_vec), xlab="q_SD", ylab="unnormalized posterior density") ``` ```{r eval=TRUE} plot(k4, exp(log_un_post_vec), xlab="p1", ylab="unnormalized posterior density") ``` ```{r eval=TRUE} plot(k5, exp(log_un_post_vec), xlab="p2", ylab="unnormalized posterior density") ``` Now, ten thousand MCMC samples are used to generate the posterior predictive distribution. (Also, the Bayesian p-value is calculated in this for loop, which assesses the goodness of fit of the disease Markov model with negative binomial error to the new daily sickness and new daily death data). ```{r eval=TRUE} num_samples = 10000 new_S_predict = matrix(NA, nrow = num_samples, ncol = length(time_use)) new_D_predict = matrix(NA, nrow = num_samples, ncol = length(time_use)) discrepancy_new_S = matrix(NA, nrow = num_samples, ncol = 1) discrepancy_new_S_pred = matrix(NA, nrow = num_samples, ncol = 1) ind_pred_exceeds_new_S = matrix(NA, nrow = num_samples, ncol = 1) discrepancy_new_D = matrix(NA, nrow = num_samples, ncol = 1) discrepancy_new_D_pred = matrix(NA, nrow = num_samples, ncol = 1) ind_pred_exceeds_new_D = matrix(NA, nrow = num_samples, ncol = 1) for (i in (length(log_un_post_vec) - num_samples + 1):length(log_un_post_vec)) { l = i - (length(log_un_post_vec) - num_samples) my_sol = MarkovModel_sol(c(k1[i], k2[i], 1e-7, k3[i])) my_new_S_mean = diff(my_sol[,2]) my_new_D_mean = diff(my_sol[,3]) for(j in 1:length(time_use)) { new_S_predict[l,j] = rnbinom(1, size = ((my_new_S_mean[j])*k4[i])/(1 - k4[i]), prob = k4[i]) new_D_predict[l,j] = rnbinom(1, size = ((my_new_D_mean[j])*k5[i])/(1 - k5[i]), prob = k5[i]) } discrepancy_new_S[l,1] = sum((data_nbin_new_S - my_new_S_mean)^2) discrepancy_new_S_pred[l,1] = sum((new_S_predict[l,] - my_new_S_mean)^2) discrepancy_new_D[l,1] = sum((data_nbin_new_D - my_new_D_mean)^2) discrepancy_new_D_pred[l,1] = sum((new_D_predict[l,] - my_new_D_mean)^2) if(discrepancy_new_S_pred[l,1] > discrepancy_new_S[l,1]) { ind_pred_exceeds_new_S[l,1] = 1 } else { ind_pred_exceeds_new_S[l,1] = 0 } if(discrepancy_new_D_pred[l,1] > discrepancy_new_D[l,1]) { ind_pred_exceeds_new_D[l,1] = 1 } else { ind_pred_exceeds_new_D[l,1] = 0 } svMisc::progress(l, num_samples, progress.bar = TRUE, init = (l == 1)) } ``` The Bayesian p-value uses the sum of squares discrepancy function and the Bayesian p-value is 0.452, which indicates that there is no evidence against the null hypothesis that the model predictions fit the data. This Bayesian p-value measures how extreme is the realized value of the sum of squares discrepancy among all possible values that could have been realized under this model with the same parameters that generates the current data. Note: If this Bayesian p-value was close to 0 or 1, then there would be evidence against the null hypothesis that the model predictions fit the data. Extreme tail-area probabilities, less than 0.01 or more than 0.99, indicate a major failure of the model predictions to fit the data. ```{r eval=TRUE} Bayes_p_value = sum(c(ind_pred_exceeds_new_S, ind_pred_exceeds_new_D))/length(c(ind_pred_exceeds_new_S, ind_pred_exceeds_new_D)) Bayes_p_value ``` Next, the lower and upper 95% prediction interval is calculated for $S(t)$ and $D(t)$. ```{r eval=TRUE} new_S_predict_lower = matrix(NA, nrow = 1, ncol = length(time_use)) new_S_predict_upper = matrix(NA, nrow = 1, ncol = length(time_use)) new_D_predict_lower = matrix(NA, nrow = 1, ncol = length(time_use)) new_D_predict_upper = matrix(NA, nrow = 1, ncol = length(time_use)) for(j in 1:length(time_use)) { new_S_predict_lower[j] = quantile(new_S_predict[,j], probs = 0.025) new_S_predict_upper[j] = quantile(new_S_predict[,j], probs = 0.975) new_D_predict_lower[j] = quantile(new_D_predict[,j], probs = 0.025) new_D_predict_upper[j] = quantile(new_D_predict[,j], probs = 0.975) } ``` Lastly, the data, posterior predictive mean, and 95% prediction interval is plotted, and this plot illustrates the good fit of the disease Markov model with negative binomial error to the generated data. ```{r eval=TRUE} plot(time_use, data_nbin_new_S, main="New daily sicknesses (Disease Markov Model Example)", cex.main = 1, xlab="Days (t)", ylab="New daily sick people", type = "p", lty = 0, ylim = c(0, 500)) lines(time_use, diff(cohort_over_time[,2]), type = "l", lty = 1, col = "red") lines(time_use, colMeans(new_S_predict), type = "l", lty = 1, col = "blue") lines(time_use, new_S_predict_lower, type = "l", lty = 2, col = "blue") lines(time_use, new_S_predict_upper, type = "l", lty = 2, col = "blue") legend("topleft", legend = c("Data", "True model", "Posterior predictive mean", "95% prediction interval"), lty=c(0,1,1,2), col = c("black", "red", "blue", "blue"), pch=c(1,NA,NA,NA)) ``` ```{r eval=TRUE} plot(time_use, data_nbin_new_D, main="New daily deaths (Disease Markov Model Example)", cex.main = 1, xlab="Days (t)", ylab="New daily deaths", type = "p", lty = 0, ylim = c(0, 20)) lines(time_use, diff(cohort_over_time[,3]), type = "l", lty = 1, col = "red") lines(time_use, colMeans(new_D_predict), type = "l", lty = 1, col = "blue") lines(time_use, new_D_predict_lower, type = "l", lty = 2, col = "blue") lines(time_use, new_D_predict_upper, type = "l", lty = 2, col = "blue") legend("topleft", legend = c("Data", "True model", "Posterior predictive mean", "95% prediction interval"), lty=c(0,1,1,2), col = c("black", "red", "blue", "blue"), pch=c(1,NA,NA,NA)) ```