---
title: "Introduction to the EAinference package"
author: "Seunghyun Min"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to EAinference}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

`EAinference` package is designed to facilitate simulation based inference of lasso type estimators. 
The package includes

- Fitting the lasso, group lasso, scaleld lasso and scaled group lasso models
- Gaussian and wild multiplier bootstrap for lasso, group lasso, scaled lasso, scaled group
lasso and their de-biased estimators
- Importance sampler for approximating p-values in these methods
- Markov chain Monte Carlo lasso sampler with applications in post-selection inference.

## Loss functions
Let the unknown true model be
$$ y = X\beta_0 + \epsilon,$$
where $\beta_0$ is a unknown true coefficient vector and $\epsilon_i \sim_{iid} N(0,\sigma^2)$. 

We use following loss functions. 
<!--$$ \ell_{lasso}(\beta ; \lambda) = \frac{1}{2}||y-X\beta||^2_2 + n\lambda\sum_i w_i|\beta_i|$$-->
$$ \ell_{grlasso}(\beta ; \lambda) = \frac{1}{2}||y-X\beta||^2_2 + n\lambda\sum_j w_j||\beta_{(j)}||_2,$$
<!--$$ \ell_{slasso}(\beta, \sigma ; \lambda) = \frac{1}{2\sigma}||y-X\beta||^2_2 + \frac{\sigma}{2} + n\lambda\sum_i w_i|\beta_i|$$-->
$$ \ell_{sgrlasso}(\beta, \sigma ; \lambda) = \frac{1}{2\sigma}||y-X\beta||^2_2 + \frac{\sigma}{2} + n\lambda\sum_j w_j||\beta_{(j)}||_2,$$
where $grlasso$ and $sgrlasso$ represent group lasso and scaled group lasso, respectively,
and $w_j$ is the weight factor for the $j$-th group. Loss functions for lasso and 
scaled lasso can be treated as special cases of group lasso and group scaled lasso
when every group is a singleton, respectively.

## Fit lasso, group lasso, scaled lasso and scaled group lasso models

`lassoFit` computes lasso, group lasso, scaled lasso and scaled group lasso solutions.
Users can either provide the value of $\lambda$ or choose to use cross-validation
by setting `lbd = "cv.min"` or `lbd = "cv.1se"`. By letting `lbd = "cv.min"`, users
can have $\lambda$ which gives minimum mean-squared error, while `lbd = "cv.1se"`
is the largest $\lambda$ within 1 standard deviaion error bound of `"cv.min"`. 

```{r}
library(EAinference)
set.seed(1234)
n <- 30; p <- 50
Niter <-  10
group <- rep(1:(p/5), each = 5)
weights <- rep(1, p/5)
beta0 <- c(rep(1,5), rep(0, p-5))
X <- matrix(rnorm(n*p), n)
Y <- X %*% beta0 + rnorm(n)
# set lambda = .5
lassoFit(X = X, Y = Y, lbd = .5, group = group, weights = weights, type="grlasso")
# use cross-validation
lassoFit(X = X, Y = Y, lbd = "cv.1se", group = group, weights = weights, type="grlasso")
```
## Parametric bootstrap sampler

`PBsampler` supports two bootstrap methods; Gaussian bootstrap and wild multiplier bootstrap. 
Due to the fact that the sampling distirbution of $(\hat{\beta}, S)$, coefficient estimator and its subgradient, is characterized by $(\beta_0, \sigma^2, \lambda)$, users are required to provide `PE`(either the estimate of $\beta_0$ or the estimate of $E(y) = X\beta_0$), `sig2`(or estimate of $\sigma^2$)
and `lbd`($\lambda$ from above loss functions).

By specifying two sets of arguments,
`(PE_1, sig2_1, lbd_1)` and `(PE_2, sig2_2, lbd_2)`, users can sample from the mixture distribution.
In this way, samples will be drawn from `(PE_1, sig2_1, lbd_1)` with 1/2 probability and from 
`(PE_2, sig2_2, lbd_2)` with another 1/2 probability. 

```{r}
set.seed(1234)
n <- 5; p <- 10
Niter <-  3
group <- rep(1:(p/5), each = 5)
weights <- rep(1, p/5)
beta0 <- c(rep(1,5), rep(0, p-5))
X <- matrix(rnorm(n*p), n)
Y <- X %*% beta0 + rnorm(n)
#
# Using non-mixture distribution
#
PBsampler(X = X, PE_1 = rep(0, p), sig2_1 = 1, lbd_1 = .5,
weights = weights, group = group, type = "grlasso", niter = Niter, parallel = FALSE)
#
# Using mixture distribution
#
PBsampler(X = X, PE_1 = rep(0, p), sig2_1 = 1, lbd_1 = .5,
 PE_2 = rep(1, p), sig2_2 = 2, lbd_2 = .3, weights = weights,
 group = group, type = "grlasso", niter = Niter, parallel = FALSE)
```
    
## Importance sampler for high dimensional data
Importance sampler enables users to access the inference of an extreme region.
This is done by using a proposal distiribution that is denser around the extreme region.

Say that we are interested in computing the expectation of a function of a random variable, $h(X)$.
Let $f(x)$ be the true or target distribution and $g(x)$ be the proposal distribution. 
We can approximate the expectation by a weighted average of samples drawn from 
the proposal distribution as follows,

$$
\begin{eqnarray}
E_f\Big[h(X)\Big] &=& E_g \Big[h(X)\frac{f(X)}{g(X)} \Big]\\ 
&\approx& \frac{1}{N}\sum_{i=1}^N h(x_i)\frac{f(x_i)}{g(x_i)}.
\end{eqnarray}
$$
By using `hdIS` method, one can easily compute the importance weight which is the ratio of 
target density over proposal density; $f(x_i)/g(x_i)$ from above equation.

Users can simply draw samples from the proposal distribution using `PBsampler` and 
plug in the result into `hdIS` with target distribution parameters in order to 
compute the importance weights. 

```{r}
# Target distribution parameter
PETarget <- rep(0, p)
sig2Target <- .5
lbdTarget <- .37
# Proposal distribution parameter
PEProp1 <- rep(1, p)
sig2Prop1 <- .5
lbdProp1 <- 1

# Draw samples from proposal distribution
PB <- PBsampler(X = X, PE_1 = PEProp1, sig2_1 = sig2Prop1,
 lbd_1 = lbdProp1, weights = weights, group = group, niter = Niter,
 type="grlasso", PEtype = "coeff")

# Compute importance weights
hdIS(PB, PETarget = PETarget, sig2Target = sig2Target, lbdTarget = lbdTarget,
log = TRUE)
```

## Metropolis-Hastings Lasso Sampler

In this section, we introduce `MHLS` method, a Markov Chain Monte Carlo(MCMC) sampler for lasso estimator. 
Although bootstrapping is one of the most convenient sampling methods, it has a clear
limitation which is that sampling from the conditional distribution is impossible. 
In contrast, MCMC sampler can easily draw samples from the 
conditional distribution. Here, we introduce `MHLS` function which draws lasso samples 
under the fixed active set, A. 

```{r}
weights <- rep(1,p)
lbd <- .37
LassoResult <- lassoFit(X = X,Y = Y,lbd = lbd, type = "lasso", weights = weights)
B0 <- LassoResult$B0
S0 <- LassoResult$S0
Result <- MHLS(X = X, PE = B0, sig2 = 1, lbd = 1,
    weights = weights, B0 = B0, S0 = S0, niter = 100, burnin = 0,
    PEtype = "coeff")
Result
```
We provide summary and plot functions for `MHLS` results. 
```{r}
summary(Result)
```
```{r, fig.width = 7, fig.height = 4.5}
plot(Result, index=c(1,4,9))
```

## Post-selection Inference

`postInference.MHLS` is a method for post-selection inference. 
The inference is provided by `MHLS` results from multiple chains. 
In order to achieve the robust result, different `PE` values are used
for each chain. After drawing samples of $(\hat{\beta}, S)$ with MH sampler, 
we then refit the estimator to remove the bias of the lasso estimator. The final
output is the $(1-a)$ quantile of each active coefficient. 

```{r}
postInference.MHLS(X = X, Y = Y, lbd = lbd, sig2.hat = 1, alpha = .05,
 method = "coeff", nChain = 5, niterPerChain = 20, 
 parallel = !(.Platform$OS.type == "windows"))
```