---
title: "badp: Bayesian model averaging for dynamic panels with weakly exogenous regressors"
author: |
  Krzysztof Beck^1,2^, Piotr Cukier^2^, Marcin Dubel^2^, Mariusz Szczepańczyk^2^, Mateusz Wyszyński^3^
output:
  html_vignette:
    toc: true
bibliography: references.bib
vignette: >
  %\VignetteIndexEntry{badp: Bayesian dynamic systems modelling}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  fig.width = 5,
  fig.height = 3,
  dpi = 72
)
```

^1^This article was prepared within the research project "Bayesian simultaneous equations model averaging - theoretical development and R package," funded by the Polish National Science Centre, on the basis of the decision No. DEC-2021/43/B/HS4/01745, ^2^Lazarski University, ^3^University of Warsaw

**Abstract:** This manuscript introduces the `badp` package, which enables Bayesian model averaging for dynamic panels with weakly exogenous regressors — a methodology developed by @Moral+2016. The package allows researchers to simultaneously address model uncertainty and reverse causality. The manuscript includes a hands-on tutorial accessible to users unfamiliar with this approach. In addition to calculating the model space and providing key BMA statistics, the package offers flexible options for specifying model priors, including a dilution prior that accounts for multicollinearity. It also provides graphical tools for visualizing prior and posterior model probabilities, as well as functions for plotting histograms and kernel densities of the estimated coefficients. Furthermore, the package enables researchers to compute jointness measures and perform Bayesian model selection to examine the most probable models based on posterior model probabilities.

## Introduction

Since the seminal works of @Leamer+1978, @Leamer+1981, @Leamer+1983, and @Leamer+1985, there has been an increased focus on reporting the fragility of regression estimates.
@Leamer+1983 proposed Extreme Bounds Analysis (EBA) as a remedy for the sensitivity of empirical research findings [see @Hlavac+2016 for an R package for EBA]. In economics, growth regressions [@Barro+1991] became a central focus of research on economic growth during the 1990s. However, the credibility of these results was challenged when @Levine+1992 applied EBA to cross-country economic growth data. The authors found that investment as a share of GDP was the only variable robust to changes in model specification. In response, EBA was criticized for being too stringent [see @Granger+1990 for a less restrictive variant of EBA], leading to the proposal of alternative approaches [@Sala+1997]. Bayesian model averaging (BMA) emerged as a preferred method during a period when studies of economic growth advanced alongside methodological innovations [@Fernandez+2001; @Fernandez+2001b; @Sala+2004; @Eicher+2007; @Ley+2012; @Moser+2014; @Arin+2019]. As a result, BMA became a widely used technique for assessing the robustness of regressors in economics [for a detailed review of BMA applications in economics, see @Moral+2015; @Steel+2020] (e.g., @Liu+2009; @Ductor+2016; @Figini+2017; @Beck+2022; @DAndrea+2022; @Horvath+2024), as well as in other fields (e.g., @Sloughter+2013; @Baran+2015; @Aller+2021; @Guliyev+2024; @Payne+2024; @Beck+2025). Moreover, the growing interest in BMA was fueled by the availability of R packages such as `BMA` [@Raftery+2005], `BAS` [@clyde2011bayesian], and `BMS` [@Feldkircher+2015], along with the `gretl` BMA package developed by @Blazejowski+2015.

The primary issue with Bayesian model averaging in the aforementioned studies was its reliance on the assumption of exogenous regressors. In many contexts, particularly in economics, this premise is unsuitable. Instead, the assumption of endogenous variables within a simultaneous equations framework is more fitting.
Consequently, a new line of research relaxed the assumption of exogenous regressors [@Lenkoski+2014; @Leon+2015; @Mirestean+2016; @Moral+2016; @Chen+2018]. However, these methods have not found their way into mainstream research. The code to implement them is only available upon request from the authors and is provided exclusively for `MATLAB` and `GAUSS`. The `badp` package was developed to address this gap. It offers tools for performing Bayesian model averaging on dynamic panels with weakly exogenous regressors. As a result, it enables researchers to address both model uncertainty and reverse causality. The core of the code is based on the methodological approach developed by @Moral+2012, @Moral+2013, @Moral+2016. While the main aspects of the method are described in the manuscript, interested readers should refer to the original articles for further details. In addition to the key features developed by @Moral+2016, the `badp` package offers a wide range of additional functionalities. The package enables users to employ flexible model prior options, along with a dilution prior, which helps account for multicollinearity. The `badp` package provides users with graphical options for plotting prior and posterior model probabilities across model sizes and the model space. Additionally, users can utilize Bayesian model selection to thoroughly examine the best models based on posterior model probability. The package calculates jointness measures developed by @Doppelhofer+2009, @Ley+2007, @Hofmarcher+2018. Finally, it offers users the option to plot histograms or kernel densities of the estimated coefficients for the examined regressors. The remainder of the manuscript is structured as follows. Section [Model setup and Bayesian model averaging](#model-setup-and-bayesian-model-averaging) describes the dynamic panel setup considered by @Moral+2013 and outlines the Bayesian model averaging approach used in the package. 
Data preparation is detailed in Section [Data preparation](#data-preparation), while Section [Estimation of the model space](#estimation-of-the-model-space) addresses the estimation of the model space. Section [Performing Bayesian model averaging](#performing-bayesian-model-averaging) provides an overview of the `badp` functions related to performing Bayesian model averaging, calculating jointness measures, and presenting the estimation results. The details of the model prior choices are described in Section [Changes in model priors](#changes-in-model-priors). Finally, Section [Concluding remarks](#concluding-remarks) offers some concluding remarks.

## Model setup and Bayesian model averaging

This section outlines the model setup, describes the approach to Bayesian model averaging implemented in the package, summarizes the main BMA statistics, and discusses model priors and jointness measures.

### Model setup

@Moral+2016 considers the following model specification:

$$y_{it}=\alpha y_{it-1}+\beta x_{it}+\eta_{i}+\zeta_{t}+v_{it} \tag{1}$$

where $y_{it}$ is the dependent variable, $i$ $(=1,...,N)$ indexes the entity (e.g., country), $t$ $(=1,...,T)$ indexes time, $x_{it}$ is a matrix of growth determinants, $\beta$ is a parameter vector, $\eta_{i}$ is an entity-specific fixed effect, $\zeta_{t}$ is a period-specific shock, and $v_{it}$ is a shock to the dependent variable. To address the issue of reverse causality, the model is built on the assumption of weak exogeneity, which can be formalized as

$$\mathbb{E}(v_{it}|y^{t-1}_{i},x^{t}_{i},\eta_{i})=0 \tag{2}$$

where $y^{t-1}_{i}=(y_{i,0},...,y_{i,t-1})'$ and $x^t_{i}=(x_{i,0},...,x_{i,t})'$. Accordingly, weak exogeneity implies that the current values of the regressors, the lagged dependent variable, and the fixed effects are uncorrelated with the current shocks, while they are all allowed to be correlated with each other at the same time.
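To build intuition, the weak-exogeneity assumption can be illustrated with a small simulated panel (a toy sketch, not part of `badp`; all coefficients and dimensions below are made up): the current shock $v_{it}$ is drawn independently of the current regressor, yet it feeds back into the regressor's future values.

```{r}
# Toy panel illustrating weak exogeneity: v_it is uncorrelated with x_it,
# but feedback makes it correlated with the future values x_it+1, x_it+2, ...
set.seed(123)
N <- 500   # entities
TT <- 10   # periods
x <- matrix(rnorm(N * TT), N, TT)
y <- matrix(0, N, TT)
v <- matrix(0, N, TT)
for (t in 2:TT) {
  v[, t] <- rnorm(N)                                  # current shock to y
  y[, t] <- 0.5 * y[, t - 1] + 0.3 * x[, t] + v[, t]  # equation (1), simplified
  if (t < TT) {
    x[, t + 1] <- 0.4 * x[, t] + 0.6 * v[, t]         # feedback into future x
  }
}
# Contemporaneous correlation is close to zero, feedback correlation is not:
round(c(same_period = cor(v[, 5], x[, 5]), next_period = cor(v[, 5], x[, 6])), 2)
```

The same-period correlation vanishes (up to sampling noise), while the next-period correlation is clearly positive, which is exactly the pattern that weak exogeneity permits and strict exogeneity rules out.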
Under the assumption of weakly exogenous regressors, @Moral+2013 augmented equation (1) with additional reduced-form equations capturing the unrestricted feedback process:

$$x_{it}=\gamma_{t0}y_{i0}+...+\gamma_{tt-1}y_{it-1}+\Lambda_{t1}x_{i1}+...+\Lambda_{tt-1}x_{it-1}+c_{t}\eta_{i}+\vartheta_{it} \tag{3}$$

where $t=2,\dots,T$ and $c_{t}$ is a $k\times 1$ vector of parameters.

### Model priors and jointness

The jointness measure proposed by @Doppelhofer+2009 is defined as:

$$J_{DW}=\ln \left[\frac{\mathbb{P}(a\cap b)\cdot \mathbb{P}(\overline{a}\cap \overline{b})}{\mathbb{P}(\overline{a}\cap b)\cdot \mathbb{P}(a\cap \overline{b})}\right] \tag{42}$$

where $\mathbb{P}(a\cap b)$ denotes the sum of the posterior probabilities of the models that include both regressors $a$ and $b$, and $\overline{a}$ ($\overline{b}$) denotes the exclusion of regressor $a$ ($b$). For $J_{DW}<-2$, $-2\leq J_{DW}<-1$, $-1\leq J_{DW}\leq 1$, $1<J_{DW}\leq 2$, and $J_{DW}>2$, the authors classify the regressors as strong substitutes, significant substitutes, not significantly related, significant complements, and strong complements, respectively.

The jointness measure proposed by @Ley+2007 is given by:

$$J_{LS}=\frac{\mathbb{P}(a\cap b)}{\mathbb{P}(\overline{a}\cap b)+\mathbb{P}(a\cap \overline{b})}. \tag{43}$$

The measure takes values in the range $[0, \infty)$, with higher values indicating a stronger complementary relationship. Finally, the @Hofmarcher+2018 measure of jointness is:

$$J_{HCGHM}=\frac{(\mathbb{P}(a\cap b)+\rho)(\mathbb{P}(\overline{a}\cap \overline{b})+\rho)-(\mathbb{P}(\overline{a}\cap b)+\rho)(\mathbb{P}(a\cap \overline{b})+\rho)}{(\mathbb{P}(a\cap b)+\rho)(\mathbb{P}(\overline{a}\cap \overline{b})+\rho)+(\mathbb{P}(\overline{a}\cap b)+\rho)(\mathbb{P}(a\cap \overline{b})+\rho)+\rho}. \tag{44}$$

@Hofmarcher+2018 advocate the use of the @Jeffreys+1946 prior, which results in $\rho=\frac{1}{2}$. The measure takes values from -1 to 1, where values close to -1 indicate substitutes, and those close to 1 indicate complements.

## Data preparation

This section demonstrates how to prepare the data for estimation. The first step involves installing the package and subsequently loading it into the R session.

```{r eval=FALSE}
install.packages("badp")
```

```{r}
library(badp)
```

Throughout the manuscript, we use the data from @Moral+2016 on the determinants of economic growth. The package includes the data along with a detailed description of all variables.
```{r}
economic_growth[1:12, 1:10]
```

It is common for researchers to store their data in an alternative format:

```{r}
original_economic_growth[1:12, 1:10]
```

where a column with the lagged dependent variable already exists. For such cases, we provide the `join_lagged_col` function to transform the data into the desired format. The user needs to specify the dependent variable column (`col`), the lagged dependent variable column (`col_lagged`), the column identifying the cross-sections (`entity_col`), the column with the time index (`timestamp_col`), and the change in the number of time units from period to period (`timestep`).

```{r}
economic_growth <- join_lagged_col(
  df = original_economic_growth,
  col = gdp,
  col_lagged = lag_gdp,
  timestamp_col = year,
  entity_col = country,
  timestep = 10
)
```

Once the data is in the correct format, the user can perform further data preparation using the `feature_standardization` function. It allows the user to perform demeaning (entity/time effects) or scaling (standardization) as needed. Often there are columns to which the transformation should not be applied; these can be specified with the `excluded_cols` parameter. It is also possible to group elements of the data frame with respect to a given column with the `group_by_col` parameter. Finally, with the `scale` parameter we can decide whether to apply both demeaning and scaling or just demeaning. For example, we can first standardize all features:

```{r}
data_standardized_features <- feature_standardization(
  df = economic_growth,
  excluded_cols = c(country, year, gdp)
)
```

and then apply cross-sectional demeaning (fixed time effects):

```{r}
data_prepared <- feature_standardization(
  df = data_standardized_features,
  group_by_col = year,
  excluded_cols = country,
  scale = FALSE
)
```

Note that the example above is the data preparation scheme used in @Moral+2016. There is no need to apply panel demeaning (entity fixed effects) in this framework, as can be seen in equation (13).
(In theory, the results should be the same with entity fixed effects; however, because we use numerical methods, some discrepancies might occur.)

## Estimation of the model space

To perform a BMA analysis, we need the values of the parameters, as well as various statistics, for each considered model, as explained in Section [Model setup and Bayesian model averaging](#model-setup-and-bayesian-model-averaging). We refer to the object that consolidates both the parameters and the statistics as the **model space**. The core function of the package, `optim_model_space`, is used to estimate the model space using numerical optimization:

```{r eval=FALSE}
full_model_space <- optim_model_space(
  df = data_prepared,
  dep_var_col = gdp,
  timestamp_col = year,
  entity_col = country,
  init_value = 0.5
)
```

The function returns a list with five named elements, which are explained in the subsections below. A progress bar is displayed so the user can easily track the ongoing computation. Since the MLEs for the parameters are found through numerical optimization, more advanced users can use the `control` parameter to adjust the way the optimization is performed. We refer to the function manual and the documentation of the `stats` package for more details.

As an alternative to the approach developed by @Moral+2016, the user may estimate the model space using the non-nested approach described at the end of the [Model setup](#model-setup) subsection. To do so, the user needs to set the parameter `nested` to `FALSE`.

```{r eval=FALSE}
model_space_nonnested <- optim_model_space(
  df = data_prepared,
  dep_var_col = gdp,
  timestamp_col = year,
  entity_col = country,
  init_value = 0.5,
  nested = FALSE
)
```

In the remainder of the manuscript, we use the results obtained under the nested approach. However, all functionalities in the package operate in the same way under the non-nested approach.

### Model space parameters

The first element of the list contains the estimated MLEs of the parameters for each considered model.
Each column represents a single model, and rows correspond to the parameters. For example, to display the first 10 parameters for 5 models, we can call:

```{r}
full_model_space$params[1:10, 1:5]
```

An `NA` value means that the corresponding parameter is not present in the given model.

### Model space statistics

The second element of the list provides:

- the values of the likelihood function at the estimated MLE (first row),
- the Bayesian information criterion (second row),
- and the standard errors for the parameters describing linear relations, i.e., the beta parameters from equation (9), for each model (remaining rows).

```{r}
full_model_space$stats[, 1:5]
```

Again, each column represents a single considered model. Two types of standard errors are provided, both derived from the Hessian of the maximized log-likelihood function. The first type consists of the regular standard errors, calculated using the inverse of the observed information matrix:

$$I(\hat{\theta}) = -\frac{\partial^2 l(\hat{\theta})}{\partial \theta \partial \theta'} \tag{45}$$

where $\hat{\theta}$ are the estimated MLE parameters, $I(\hat{\theta})$ is the information matrix, and $l(\hat{\theta}) = \log L(\hat{\theta})$ is the natural logarithm of the likelihood function. The variance-covariance matrix is given by:

$$Var(\hat{\theta}) = I(\hat{\theta})^{-1}, \tag{46}$$

and the standard errors by

$$\text{SE}(\hat{\theta}) = \sqrt{\text{diag}(Var(\hat{\theta}))}, \tag{47}$$

where the square root is applied separately to each coordinate of the vector of diagonal values.

The second type are the robust, or heteroscedasticity-consistent, standard errors. To understand how they work, we first have to rewrite equation (18) in a form that displays the contribution of each entity to the likelihood value.
First note that:

$$l(\theta) \propto - \frac{1}{2} tr \{ \Sigma_{11}^{-1} U_1' U_1 \} - \frac{N}{2} \log \det \Sigma_{11} - \frac{N}{2} \log \det \left(\frac{H}{N}\right) \tag{48}$$

Now, because of the cyclic property of the trace, we can rewrite the first term as:

$$- \frac{1}{2} tr \{ \Sigma_{11}^{-1} U_1' U_1 \} = - \frac{1}{2} tr \{ U_1 \Sigma_{11}^{-1} U_1' \} = - \frac{1}{2} \sum_{i=1}^N u_i \Sigma_{11}^{-1} u_i' \tag{49}$$

where $u_i$ is a row vector corresponding to the data relating to the single entity $i$. Hence, the entire log-likelihood function can be rewritten as a sum of contributions from each entity:

$$l(\theta) \propto \sum_{i=1}^{N} -\frac{1}{2} \left(\log \det \Sigma_{11} + \log \det \left(\frac{H}{N}\right) + u_i \Sigma_{11}^{-1} u_i'\right) \tag{50}$$

From there we can see that the contribution of a single entity $i$ is:

$$l_i(\theta) \propto -\frac{1}{2} \left(\log \det \Sigma_{11} + \log \det \left(\frac{H}{N}\right) + u_i \Sigma_{11}^{-1} u_i'\right). \tag{51}$$

Now, if we consider a multivariate function $\mathbf{l}(\theta)$ of all such contributions (with single contributions as its coordinates), we can find its Jacobian at the MLE: $G(\hat{\theta}) = \frac{\partial \mathbf{l}(\hat{\theta})}{\partial \theta}$. Then the robust variance is:

$$Var_{R}(\hat{\theta}) = I(\hat{\theta})^{-1} \cdot G'(\hat{\theta}) G(\hat{\theta}) \cdot I(\hat{\theta})^{-1} \tag{52}$$

and the robust standard errors are given by:

$$\text{SE}_{R}(\hat{\theta}) = \sqrt{\text{diag}(Var_{R}(\hat{\theta}))}, \tag{53}$$

where the square root is again applied to each coordinate separately.

### Parallelization

The `optim_model_space` function is the most computationally intensive part of the package. Therefore, the function provides an option for parallel computing.
If the user's data contain only a few regressors, the following call is sufficient:

```{r eval=FALSE}
model_space <- optim_model_space(
  df = data_prepared,
  dep_var_col = gdp,
  timestamp_col = year,
  entity_col = country,
  init_value = 0.5
)
```

However, for larger datasets, it is better to take advantage of parallel computing. The numerical optimization used to find the MLEs can then be carried out on a separate thread for each model. To do this, first load the `parallel` package and set up a cluster.

```{r eval=FALSE}
library(parallel)

# Here we try to use all available cores on the system.
# You might want to lower the number of cores depending on your needs.
cores <- detectCores()
cl <- makeCluster(cores)
setDefaultCluster(cl)
```

Then the user just needs to provide this cluster to the function:

```{r eval=FALSE}
model_space <- optim_model_space(
  df = data_prepared,
  dep_var_col = gdp,
  timestamp_col = year,
  entity_col = country,
  init_value = 0.5,
  cl = cl
)
```

### Precomputed model spaces

Even with parallelization, the `optim_model_space` call may be time-consuming. Hence, for users who want to quickly start practicing with the `badp` package, we provide precomputed model space objects included with the package:

- `full_model_space` is the model space built with the entire data used by @Moral+2016,
- `small_model_space` is a smaller model space built with only a subset of regressors.

## Performing Bayesian model averaging

### Bayesian model averaging: The `bma` function

The `bma` function enables users to perform Bayesian model averaging using the object obtained with the `optim_model_space` function. The `round` parameter specifies the decimal place to which the BMA statistics should be rounded in the results.

```{r}
bma_results <- bma(full_model_space, round = 3)
```

The `bma` function returns a list containing 16 elements. However, most of these elements are only required by other functions. The main objects of interest are the two tables with the BMA statistics.
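Before examining the tables, it may help to see how such statistics are assembled. The snippet below reproduces the standard BMA definitions by hand for a hypothetical three-model space with a single regressor (all numbers are invented for illustration; the exact computations inside `bma` may differ in detail, e.g., for the robust variants):

```{r}
# Hand computation of PIP, PM, PSD, and PMcon for one regressor across a
# hypothetical three-model space (NA marks models excluding the regressor).
pmp  <- c(0.5, 0.3, 0.2)   # posterior model probabilities
beta <- c(0.8, 0.5, NA)    # coefficient estimate in each model
se   <- c(0.2, 0.3, NA)    # standard error in each model

inc <- !is.na(beta)
pip <- sum(pmp[inc])                             # posterior inclusion probability
pm  <- sum(pmp[inc] * beta[inc])                 # posterior mean (0 when excluded)
psd <- sqrt(sum(pmp[inc] * (se[inc]^2 + beta[inc]^2)) - pm^2)  # posterior SD
pm_con <- pm / pip                               # mean conditional on inclusion
round(c(PIP = pip, PM = pm, PSD = psd, PMcon = pm_con), 3)
```

The unconditional posterior mean treats the coefficient as zero in models that exclude the regressor, which is why the conditional mean (PMcon) equals the unconditional one divided by the PIP.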
The results obtained with the binomial model prior are first on the list.

```{r}
bma_results[[1]]
```

PIP denotes the posterior inclusion probability, PM the posterior mean, PSD the posterior standard deviation, and PSDR the posterior standard deviation calculated using robust standard errors. These are the four main results of BMA with respect to the assessment of individual regressors. PMcon, PSDcon, and PSDRcon denote the posterior mean, posterior standard deviation, and posterior standard deviation based on robust standard errors, respectively, conditional on the inclusion of the variable. Users should base their interpretation of the results on conditional BMA statistics only when they believe that certain regressors must be included. Finally, for a given parameter, we can consider all models that include this parameter and check whether it has a positive or negative value. $\%(+)$ denotes the percentage of models with a positive value for a given parameter across all models that include that parameter. A value of $\%(+)$ equal to $0\%$ or $100\%$ indicates coefficient sign stability.

The PIP for all the regressors shows that none of them can be considered very strong according to the classification by @Raftery+1995. This also applies to the population variable (`pop`), which has a PIP of 0.990 due solely to numerical approximation. These results are corroborated by the ratios of PM to PSD and PSDR. In particular, for the absolute value of the PM to PSDR ratio, only the population variable exceeds 1.3, while investment (`ish`) and the democracy index (`polity`) are above 1. This finding led @Moral+2016 to emphasize the fragility of economic growth determinants. The only variable that can be considered robust across all metrics is the lagged GDP (`gdp_lag`). However, the results change when using the binomial-beta model prior, which is included as the second object in the `bma` list.
```{r}
bma_results[[2]]
```

In the case of the binomial-beta model prior, the PIPs for all the regressors increase. Population is classified as very strong, while all other regressors are classified as strong or positive according to posterior inclusion probabilities. There are also considerable changes in the PM to PSD and PSDR ratios. The absolute value of the PM to PSD ratio exceeds 2 for investment and the democracy index, and is above 1.3 for population, investment price (`ipr`), trade openness (`open`), and life expectancy (`lnlex`). However, these results are less pronounced when using robust standard errors, with only population, trade openness, and the democracy index remaining above 1.3. Consequently, the results are not robust with respect to the choice of prior model specification. The reasons behind these differences will become clear once other functionalities of the package are explored.

The last object in the list is a table containing the prior and posterior expected model sizes for the binomial and binomial-beta model priors. Importantly, these numbers reflect only the number of regressors in a model and do not include the lagged dependent variable, which is present in every model by construction.

```{r}
bma_results[[16]]
```

The results show that, after observing the data, the researcher should expect around seven regressors in the model under the binomial model prior and around eight and a half under the binomial-beta model prior. These numbers may seem high; however, they are driven by relatively substantial PIPs. This illustrates the importance of focusing on both posterior inclusion probabilities and the ratios of posterior mean to posterior standard deviation when assessing the robustness of the regressors.

### Prior and posterior model probabilities

The `model_pmp` function allows the user to compare prior and posterior model probabilities over the entire model space in the form of a graph.
The models are ranked from the one with the highest to the one with the lowest posterior model probability. The function returns a list with three objects:

1. a graph for the binomial model prior;
2. a graph for the binomial-beta model prior;
3. a combined graph for both the binomial and binomial-beta model priors.

The user can retrieve each graph separately from the list; however, the function automatically displays a combined graph.

```{r fig=TRUE}
for_models <- model_pmp(bma_results)
```

The graphs demonstrate that most of the posterior probability mass is concentrated within just a couple of models. To view the results for only the best models, the user can use the `top` parameter.

```{r fig=TRUE}
for_models <- model_pmp(bma_results, top = 10)
```

The last graph for the binomial-beta prior is particularly illuminating in terms of explaining the very high values of posterior inclusion probabilities. Almost 70% of the posterior probability mass is concentrated in just one model; therefore, variables included in this model will have very high PIP values. The model in question will be identified after applying `model_sizes` (and `best_models`, which is covered in Section [Selecting the best models](#selecting-the-best-models)). Nevertheless, the graph already suggests that the best model is the one that includes either all the regressors or none of them: its prior probability on the plot is around $\frac{1}{9}$, and under the binomial-beta prior, which spreads prior mass equally across model sizes, only the empty and the full model receive the entire prior mass assigned to their size.

The `model_sizes` function displays prior and posterior model probabilities on a graph for models of different sizes. The graphs exclude the lagged dependent variable; therefore, the model with zero regressors still includes the lagged dependent variable. Similarly to the `model_pmp` function, it returns a list with three objects:

1. a graph for the binomial model prior;
2. a graph for the binomial-beta model prior;
3. a combined graph for both the binomial and binomial-beta model priors.
Again, the user can retrieve each graph separately from the list; however, the function automatically displays a combined graph. ```{r fig=TRUE} size_graphs <- model_sizes(bma_results) ``` The graph in panel b) again explains why PIPs are so high in the case of the binomial-beta model prior. The model with all the regressors accounts for almost 70% of the total posterior probability mass, while the remaining portion is concentrated on models with a high number of regressors. In contrast, the posterior probability mass for the binomial model prior is centered around models with seven regressors. This graph clearly illustrates the impact of changes in the model prior on posterior probabilities. ### Selecting the best models The `best_models` function allows the user to view a chosen number of the best models in terms of posterior model probability. The function returns a list containing nine objects: 1. An inclusion table stored as an array object; 2. A table with estimation results using regular standard errors, stored as an array object; 3. A table with estimation results using robust standard errors, stored as an array object; 4. An inclusion table stored as a knitr object; 5. A table with estimation results using regular standard errors, stored as a knitr object; 6. A table with estimation results using robust standard errors, stored as a knitr object; 7. An inclusion table stored as a gTree object; 8. A table with estimation results using regular standard errors, stored as a gTree object; 9. A table with estimation results using robust standard errors, stored as a gTree object; The parameters `estimate` and `robust` pertain only to the results that will be automatically displayed after running the function. The parameter `criterion` determines whether the models should be ranked according to posterior model probabilities calculated using the binomial (1) or binomial-beta (2) model prior. 
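The ranking itself requires posterior model probabilities. Under the BIC-based approximation commonly used in this literature (a sketch with invented numbers; the internal computation in `badp` may differ), each model's posterior probability is proportional to $\exp(-BIC/2)$ times its model prior:

```{r}
# Sketch: converting per-model BIC values and model priors into posterior
# model probabilities, then ranking the models (hypothetical numbers).
bic   <- c(m1 = 102.4, m2 = 101.1, m3 = 105.0)
prior <- c(m1 = 1, m2 = 1, m3 = 1) / 3       # uniform model prior

w   <- exp(-0.5 * (bic - min(bic))) * prior  # subtract min(bic) for stability
pmp <- w / sum(w)
sort(pmp, decreasing = TRUE)                 # models ranked by PMP
```

Subtracting the minimum BIC before exponentiating leaves the normalized probabilities unchanged while avoiding numerical underflow for large BIC values.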
To obtain the inclusion array for the 8 best models ranked with the binomial model prior, the user needs to run:

```{r}
best_8_models <- best_models(bma_results, criterion = 1, best = 8)
best_8_models[[1]]
```

A value of 1 indicates the presence of a given regressor in a model, while the last row displays the posterior model probability of that model. To obtain a knitr table with the estimation output with regular standard errors for the best 3 models ranked with the binomial-beta model prior, the user needs to run:

```{r}
best_3_models <- best_models(bma_results, criterion = 2, best = 3)
best_3_models[[5]]
```

The comparison of the last two tables further highlights the importance of the model prior. The best model under the binomial model prior accounts for around 9% of the posterior probability mass, while the best model under the binomial-beta model prior accounts for over 69%. Finally, to obtain a gTree table with the estimation output using robust standard errors for the top 3 models ranked by the binomial-beta model prior, the user needs to run:

```{r fig=TRUE}
best_3_models <- best_models(bma_results, criterion = 2, best = 3)
best_3_models[[9]]
```

The comparison of the estimation outputs with regular and robust standard errors demonstrates how the results change when switching between these two variance estimators.

### Calculating jointness measures

Within the BMA framework, it is possible to establish the nature of the relationship between pairs of examined regressors using jointness measures. This can be accomplished using the `jointness` function. The latest jointness measure, introduced by @Hofmarcher+2018, has been shown to outperform the older alternatives developed by @Ley+2007 and @Doppelhofer+2009 [see Section [Model priors and jointness](#model-priors-and-jointness) for the interpretations of the jointness measures]. Therefore, the @Hofmarcher+2018 measure is the default option in the `jointness` function.
```{r} jointness(bma_results) ``` Above the main diagonal the user can find the results for the binomial model prior, and below the results for the binomial-beta model prior. All the values in the table are positive, indicating complementary relationships between the regressors. Notably, the values for the binomial-beta prior are substantially higher than those for the binomial prior. This result is not surprising, as the model with all the regressors accounts for almost 70% of the total posterior probability mass. To obtain the results for the @Ley+2007 measure, the user should run: ```{r} jointness(bma_results, measure = "LS") ``` The values corroborate the results obtained using the @Hofmarcher+2018 measure. All the regressors exhibit complementary relationships, which are visibly stronger under the binomial-beta model prior. However, the @Doppelhofer+2009 measure yields a slightly different outcome: ```{r} jointness(bma_results, measure = "DW") ``` In this case, some pairs of regressors have negative values of the jointness measure under the binomial model prior; however, these values are very close to zero, indicating unrelated variables. Once again, the values for the binomial-beta model prior are higher, demonstrating how the results are influenced by the choice of model prior. ### Visualizing model coefficients and posterior distributions The `coef_hist` function allows the user to plot the distribution of estimated coefficients. It returns a list containing a number of objects equal to the number of regressors plus one. The first object in the list is a graph of the coefficients for the lagged dependent variable, while the remaining objects are graphs of the coefficients for the other regressors. The graph for the lagged dependent variable collects coefficients from the entire model space, whereas the graphs for the other regressors only collect coefficients from the models that include the given regressor (half of the model space). 
There are two main options for visualizing the coefficient distributions. The first option uses a histogram. The `coef_hist` function provides the user with options for controlling the bin widths of the histogram (`BW`, `binW`, `BN`, and `num`). The default is `BW = FD`, which selects bin widths using the Freedman-Diaconis method.

```{r fig=TRUE}
coef_plots <- coef_hist(bma_results)
coef_plots[[1]]
```

The second option allows the user to plot kernel densities.

```{r fig=TRUE}
coef_plots2 <- coef_hist(bma_results, kernel = 1)
coef_plots2[[1]]
```

The choice of appropriate plotting options is left to the user's preferences regarding the style of presentation and the size of the model space.

```{r fig=TRUE}
library(gridExtra)
grid.arrange(coef_plots[[1]], coef_plots[[2]],
             coef_plots2[[1]], coef_plots2[[2]],
             nrow = 2, ncol = 2)
```

One additional option available to the user with the `coef_hist` function is weighting coefficients by posterior model probabilities. This can be accomplished using the `weight` parameter, whose default value is set to `NULL`. Setting `weight` to `"binomial"` or `"beta"` results in plotting histograms based on the binomial or binomial-beta model prior, respectively. For example, in the case of the binomial-beta model prior, it can be observed that the value of the posterior mean is mainly driven by the model characterized by the highest posterior model probability.

```{r fig=TRUE}
coef_plots3 <- coef_hist(bma_results, weight = "beta")
coef_plots3[[1]]
```

The `posterior_dens` function allows the user to plot posterior distributions of coefficients. Similarly to the `coef_hist` function, `posterior_dens` returns a list containing a number of objects equal to the number of regressors plus one. The first object in the list is a graph of the posterior distribution for the lagged dependent variable, while the remaining objects are graphs of the posterior distributions for the other regressors.
The user needs to specify whether to plot posterior distributions based on the binomial (`prior = "binomial"`) or binomial-beta (`prior = "beta"`) model prior, as well as whether regular (`SE = "standard"`) or robust (`SE = "robust"`) standard errors should be used in the calculation of the posterior objects.

```{r fig=TRUE}
distPlots <- posterior_dens(bma_results, prior = "binomial", SE = "standard")
grid.arrange(distPlots[[2]], distPlots[[3]], nrow = 2, ncol = 1)
```

## Changes in model priors

This section provides a more detailed description of the available model prior options. The first subsection discusses the consequences of changing the expected model size, while the second describes the dilution prior.

### Changing expected model size

The `bma` function calculates BMA statistics using both the binomial and binomial-beta model priors. By default, the `bma` function sets the expected model size (EMS) to $K/2$, where $K$ denotes the total number of regressors. The binomial model prior with $EMS = K/2$ leads to a uniform model prior, assigning equal probabilities to all models. In contrast, the binomial-beta model prior with $EMS = K/2$ assumes equal probabilities across all model sizes. However, the user can modify the prior model specification by changing the `EMS` parameter. First, consider the consequences of concentrating prior probability mass on small models by setting $EMS = 2$.

```{r}
bma_results2 <- bma(full_model_space, round = 3, EMS = 2)
```

Before turning to the main BMA results, let us focus on the changes in the posterior probability mass with respect to model sizes.

```{r}
bma_results2[[16]]
```

The results show that decreasing the prior expected model size led to a considerable decline in the posterior expected model size. The consequences of this change are best illustrated using the prior and posterior probability mass over model sizes.
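The shape of the prior side of these distributions is easy to reproduce in base R. Under a binomial model prior, each regressor enters independently with prior inclusion probability $\theta = EMS/K$, so the prior mass on model size $k$ is $\binom{K}{k}\theta^k(1-\theta)^{K-k}$, while the binomial-beta prior with $EMS = K/2$ is flat over model sizes. A sketch, assuming $K = 9$ so that the default EMS is $4.5$:

```r
# Prior probability mass over model sizes k = 0, ..., K under a
# binomial model prior with expected model size EMS: theta = EMS / K.
K <- 9
prior_size_mass <- function(EMS, K) {
  theta <- EMS / K
  dbinom(0:K, size = K, prob = theta)
}

round(prior_size_mass(EMS = K / 2, K = K), 3)  # uniform over models, peaked over sizes
round(prior_size_mass(EMS = 2, K = K), 3)      # mass concentrated on small models

# Binomial-beta prior with EMS = K/2: equal mass 1/(K + 1) on every size.
rep(1 / (K + 1), K + 1)
```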
```{r fig=TRUE}
size_graphs2 <- model_sizes(bma_results2)
```

For both the binomial and binomial-beta model priors, the prior probability mass is now more concentrated on small model sizes. However, for the binomial model prior, the center of the posterior probability mass shifted to medium-sized models, while it remained on large models for the binomial-beta model prior. Nevertheless, the posterior model probability of the model with all the regressors decreased from nearly 0.7 for $EMS = 4.5$ to less than 0.3. There are also substantial changes in the distribution of the posterior probability mass over the model space.

```{r fig=TRUE}
model_graphs2 <- model_pmp(bma_results2)
```

Both panels of the graph show that the prior and posterior model probabilities have substantially decoupled from each other. This strongly indicates that the prior and the data suggest vastly different model choices. The tall blue spike represents the model with no regressors. The main BMA posterior statistics for the binomial model prior also changed significantly.

```{r}
bma_results2[[1]]
```

Posterior inclusion probabilities drop considerably for all the regressors except population, which remains almost unchanged. Likewise, the ratios decline for all variables other than population, whose ratio remains above two for regular standard errors and above 1.7 for robust standard errors. This outcome indicates that population performs relatively better in smaller models. The results for the binomial-beta model prior are given below.

```{r}
bma_results2[[2]]
```

The change in PIPs is again significant, though not as pronounced as in the case of the binomial model prior. Changes in the ratios are relatively small and irregular for both regular and robust standard errors. The most pronounced change is the drop in the value of the ratios for the democracy index (`polity`), indicating that this regressor performs better in larger models.
It is also very instructive to examine the jointness measures calculated under the new prior specification.

```{r}
jointness(bma_results2, measure = "HCGHM", rho = 0.5, round = 3)
```

On the one hand, the results obtained with the binomial-beta model prior did not change in any significant manner. On the other hand, the results obtained with the binomial model prior changed substantially. The measure indicates that population is a substitute for both the investment price (`ipr`) and the democracy index, as well as, to a lesser extent, secondary education (`sed`) and population growth (`pgrw`). Next, to consider the consequences of concentrating prior probability mass on large models, `EMS` was set to eight.

```{r}
bma_results8 <- bma(full_model_space, round = 3, EMS = 8)
bma_results8[[16]]
```

The posterior expected model size increased for the binomial prior; however, it remained almost unchanged for the binomial-beta model prior. The most interesting aspect is the new graphs of prior and posterior probability mass over the model sizes.

```{r fig=TRUE}
size_graphs8 <- model_sizes(bma_results8)
```

In both cases, the posterior probability mass has concentrated near the models with all the regressors. However, in the case of the binomial-beta model prior, the model with all the regressors captures most of the posterior probability mass (almost 96%). This conclusion is further supported by the graphs of posterior model probability across the entire model space.

```{r fig=TRUE}
model_graphs8 <- model_pmp(bma_results8)
```

Panel (a) demonstrates that the change in the expected model size led to a substantial increase in the posterior model probability of the model with all regressors under the binomial model prior. It now accounts for over 70% of the total posterior probability mass. The increase in the expected model size also influenced the main BMA statistics.

```{r}
bma_results8[[1]]
```

The PIPs increased considerably.
Population is classified as very strong, while the other regressors are classified as strong or positive. Interestingly, all the ratios have improved as well, except for population. The change in the results for the binomial-beta model prior is less pronounced.

```{r}
bma_results8[[2]]
```

With the increase in the expected model size, population is classified as very strong, and all the other regressors are classified as strong in terms of the posterior inclusion probability criterion. As with the binomial prior, all the ratios increased except for population. Again, it is instructive to examine the jointness measures.

```{r}
jointness(bma_results8, measure = "HCGHM", rho = 0.5, round = 3)
```

The values of the measures show that all the regressors exhibit a very strong complementary relationship. This outcome, once again, underscores the importance of carefully considering the prior when interpreting jointness measures.

### Dilution prior

One of the main issues associated with identifying robust regressors is multicollinearity. Some regressors may approximate the same underlying factor influencing the dependent variable. Multicollinearity may result from the absence of observable variables associated with a specific theory or from a theory failing to provide a unique candidate for a regressor. Moreover, some regressors may share a common determinant. Although @Moral+2013 and @Moral+2016 addressed this issue to some extent, researchers have another option to mitigate multicollinearity: the dilution prior proposed by @George+2010, which is described in detail in Section [Model priors and jointness](#model-priors-and-jointness). To apply the dilution prior, the user must set `dilution = 1` in the `bma` function. The user can also manipulate the dilution parameter $\omega$. The default option is `dil.Par = 0.5`, as recommended by @George+2010.
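The intuition behind the penalty can be sketched in a few lines of base R. In the determinant-based formulation associated with @George+2010, the base model prior is multiplied by $|R_M|^{\omega}$, where $R_M$ is the correlation matrix of the regressors included in model $M$: the determinant is close to one for nearly orthogonal regressors and close to zero under strong multicollinearity. The data and the helper `dilution_factor` below are hypothetical, for illustration only:

```r
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)  # nearly collinear with x1
x3 <- rnorm(n)                 # roughly orthogonal to x1

# Dilution factor |R_M|^omega for a model M: the determinant of the
# correlation matrix of M's regressors, raised to the power omega.
dilution_factor <- function(X, omega = 0.5) det(cor(X))^omega

dilution_factor(cbind(x1, x3))  # close to 1: little penalty
dilution_factor(cbind(x1, x2))  # close to 0: heavy penalty for collinearity
```

Raising $\omega$ strengthens the penalty, which is what the `dil.Par` parameter controls below.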
```{r}
bma_results_dil <- bma(
  model_space = full_model_space,
  round = 3,
  dilution = 1
)
```

The effect of implementing the dilution prior is well depicted by the distribution of prior probability mass over the model sizes.

```{r fig=TRUE}
size_graphs_dil <- model_sizes(bma_results_dil)
```

The change in the prior distribution is more visible for the binomial-beta model prior. In panel (b), the prior probability mass has decreased for larger models and increased for smaller models. However, this change is not uniform, as the models characterized by the highest degree of multicollinearity are subject to the greatest penalty in terms of prior probability mass. Before moving to the BMA statistics, it is instructive to examine the effect of changing the `dil.Par` parameter.

```{r fig=TRUE}
bma_results_dil01 <- bma(
  model_space = full_model_space,
  round = 3,
  dilution = 1,
  dil.Par = 0.1
)
size_graphs_dil01 <- model_sizes(bma_results_dil01)
```

As we can see, decreasing the value of $\omega$ diminishes the impact of dilution on the model prior. Conversely, raising the `dil.Par` parameter increases the degree of dilution.

```{r fig=TRUE}
bma_results_dil2 <- bma(
  model_space = full_model_space,
  round = 3,
  dilution = 1,
  dil.Par = 2
)
size_graphs_dil2 <- model_sizes(bma_results_dil2)
```

An especially strong impact can be seen for the binomial-beta prior. However, even after giving such priority to the penalty for multicollinearity, the main BMA statistics remain stable.

```{r}
bma_results_dil2[[2]]
```

Hence, we see that the claim of @Moral+2016 about the fragility of growth regressors withstands the test of various manipulations of the model prior.

## Concluding remarks

This manuscript introduces the `badp` package, which enables Bayesian model averaging for dynamic panels with weakly exogenous regressors — a methodology developed by @Moral+2012, @Moral+2013, and @Moral+2016.
This package allows researchers to simultaneously address model uncertainty and reverse causality, and it is the only R package offering these capabilities. It provides flexible options for specifying model priors, including a dilution prior that accounts for multicollinearity. The package also includes graphical tools for visualizing prior and posterior model probabilities across the model space and model sizes, as well as functions for plotting histograms and kernel densities of the estimated coefficients. Additionally, it allows researchers to compute the jointness measures introduced by @Doppelhofer+2009, @Ley+2007, and @Hofmarcher+2018 to assess whether pairs of regressors act as substitutes or complements. Users can also perform Bayesian model selection to examine in detail the most probable models based on posterior model probabilities.

The manuscript outlines the methodological approach; the detailed exposition can be found in @Moral+2012, @Moral+2013, and @Moral+2016. Users unfamiliar with this approach can easily learn to apply it through the hands-on tutorial provided in the manuscript. The package's functionalities are illustrated using the original dataset from @Moral+2016 in the context of analyzing the determinants of economic growth. The results of the examination illustrate that the fragility of growth determinants is a persistent feature of the data, confirming the claims of @Moral+2016.

The various empirical exercises underscore two important aspects of any BMA analysis. First, the results should always be validated through extensive changes in prior specifications. Second, the robustness of the regressors must be evaluated using both posterior inclusion probabilities and the ratios of the posterior mean to the posterior standard deviation, as these measures can often lead to differing conclusions.

## References