---
title: "MoEClust: Gaussian Parsimonious Clustering Models with Gating and Expert Network Covariates and a Noise Component"
author: "Keefe Murphy"
date: "`r Sys.Date()`"
urlcolor: blue
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{MoEClust}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
  %\VignetteDepends{grDevices}
  %\VignetteDepends{lattice}
  %\VignetteDepends{mclust}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(fig.width=7, fig.height=7, fig.align = 'center', 
                      fig.show='hold', warning=FALSE, 
                      message=FALSE, progress=FALSE, 
                      collapse=TRUE, comment="#>")

if(isTRUE(capabilities("cairo"))) {
  knitr::opts_chunk$set(dev.args=list(type="cairo"))
}
```

## Introduction
__MoEClust__ is an R package which fits finite Gaussian Mixtures of Experts models using a range of parsimonious covariance parameterisations via the EM/CEM algorithm, i.e. allows incorporation of covariates into the mixing proportions and/or Gaussian densities of finite Gaussian mixture models under the various parsimonious covariance parameterisations in the GPCM family (e.g. __mclust__). These models were introduced by [Murphy and Murphy (2020)](https://doi.org/10.1007/s11634-019-00373-8). The package also facilitates the inclusion of an additional noise component, and allows visualisation of Gaussian mixture of experts models with parsimonious covariance parameterisations using generalised pairs plots.

The most important function in the __MoEClust__ package is: `MoE_clust`, for fitting the model via the EM/CEM algorithm with gating and/or expert network covariates, supplied via formula interfaces.  `MoE_compare` is provided for conducting model selection between different results from `MoE_clust` using different covariate combinations &/or initialisation strategies, etc.

`MoE_stepwise` is provided for conducting a greedy forward stepwise search to identify the optimal model in terms of the number of components, GPCM covariance type, and the subsets of gating/expert network covariates.

`MoE_control` allows supplying additional arguments to `MoE_clust` and `MoE_stepwise` which govern, among other things, controls on the inclusion of an additional noise component and controls on the initialisation of the allocations for the EM/CEM algorithm.

A dedicated plotting function exists for visualising the results using generalised pairs plots, for examining the gating network, and/or log-likelihood, and/or clustering uncertainties, and/or similarity matrix, and/or graphing model selection criteria values. The generalised pairs plots (`MoE_gpairs`) visualise all pairwise relationships between clustered response variables and associated continuous, categorical, and/or ordinal covariates in the gating &/or expert networks, coloured according to the MAP classification, and also give the marginal distributions of each variable (incl. the covariates) along the diagonal.

An `as.Mclust` method is provided to coerce the output of class `"MoEClust"` from `MoE_clust` to the `"Mclust"` class, to facilitate use of plotting and other functions for the `"Mclust"` class within the __mclust__ package. As per __mclust__, __MoEClust__ also facilitates modelling with an additional noise component (with or without the mixing proportion for the noise component depending on covariates). 

Finally, a `predict` method is provided for predicting the fitted response and probability of cluster membership (and by extension the MAP classification) for new data, in the form of new covariates and new response variables, or new covariates only.

Other functions also exist, e.g. `MoE_crit`, `MoE_dens`, `MoE_estep`, and `aitken`, which are all used within `MoE_clust` but are nonetheless made available for standalone use. The package also contains two data sets: `CO2data` and `ais`. 

This vignette aims to demonstrate the __MoEClust__ models via application to these two well-known univariate and multivariate data sets, respectively.

### Installing MoEClust
__MoEClust__ will run in Windows, Mac OS X, or Linux. To install it you first need to install [R](https://cran.r-project.org/). Installing [RStudio](https://posit.co/) as a nice desktop environment for using R is also recommended.

Once in R you can type at the R command prompt:

```{r, eval=FALSE}
install.packages('devtools')
devtools::install_github('Keefe-Murphy/MoEClust')
```

to install the latest development version of the package from the __MoEClust__ [GitHub page](https://github.com/Keefe-Murphy/MoEClust). 

To instead install the latest stable official release of the package from CRAN go to R and type:

```{r, eval=FALSE}
install.packages('MoEClust')
```

```{r, echo=FALSE}
suppressMessages(library(mclust))
```

In either case, if you then type:

```{r}
library(MoEClust)
```

it will load in all the __MoEClust__ functions.

The GitHub version contains a few more features but some of these may not yet be fully tested, and occasionally this version might be liable to break when it is in the process of being updated. If you find bugs or want to suggest new features please visit the __MoEClust__ [GitHub issues page](https://github.com/Keefe-Murphy/MoEClust/issues). 

## CO2 Data
Load the CO2 data.

```{r}
data(CO2data)
CO2 <- CO2data$CO2
GNP <- CO2data$GNP
```

Fit various MoEClust mixture models to cluster the CO2 data, allowing the GNP variable to enter the gating &/or expert networks, or neither, via a formula interface. Also consider models with equal mixing proportions. Note that for models with covariates in the gating network, or models with equal mixing proportions, we don't need to fit single-component models (though it could be done!) as this would merely duplicate the single-component models within `m1` and `m3`, respectively. 

```{r, echo=FALSE}
m1 <- MoE_clust(CO2, G=1:3, verbose=FALSE)
m2 <- MoE_clust(CO2, G=2:3, gating= ~ GNP, verbose=FALSE)
m3 <- MoE_clust(CO2, G=1:3, expert= ~ GNP, verbose=FALSE)
m4 <- MoE_clust(CO2, G=2:3, gating= ~ GNP, expert= ~ GNP, verbose=FALSE)
m5 <- MoE_clust(CO2, G=2:3, equalPro=TRUE, verbose=FALSE)
m6 <- MoE_clust(CO2, G=2:3, expert= ~ GNP, equalPro=TRUE, verbose=FALSE)
```

```{r, eval=FALSE}
m1 <- MoE_clust(CO2, G=1:3)
m2 <- MoE_clust(CO2, G=2:3, gating= ~ GNP)
m3 <- MoE_clust(CO2, G=1:3, expert= ~ GNP)
m4 <- MoE_clust(CO2, G=2:3, gating= ~ GNP, expert= ~ GNP)
m5 <- MoE_clust(CO2, G=2:3, equalPro=TRUE)
m6 <- MoE_clust(CO2, G=2:3, expert= ~ GNP, equalPro=TRUE)
```

Choose the best model among these. Specify `optimal.only=TRUE` so that only the optimal model in each set of models is included in the comparison.

```{r}
comp <- MoE_compare(m1, m2, m3, m4, m5, m6, optimal.only=TRUE)
```

Now, see if a better model can be found using greedy forward stepwise selection, with the aid of the `MoE_stepwise` function, on the same data. By default, this starts from a single-component model and successively adds components and covariates to both networks, while evaluating each action over all possible model types, until a model which is optimal according to some criterion is found.

```{r}
(mod1 <- MoE_stepwise(CO2, GNP))
```

Next, conduct another stepwise search considering models with a noise component. This notably starts from a model with only a uniform noise component and then proceeds as above to add Gaussian components and covariates one at a time, accepting each action that increases a chosen criterion (`"bic"`, by default).

```{r} 
(mod2 <- MoE_stepwise(CO2, GNP, noise=TRUE))
```

Finally, compare all sets of results to choose the optimal model.

```{r}
(best <- MoE_compare(mod1, mod2, comp, pick=1)$optimal)
```

```{r}
(summ <- summary(best, classification=TRUE, parameters=FALSE, networks=FALSE))
```

Visualise the results for the optimal model using a generalised pairs plot.

```{r}
plot(best, what="gpairs", jitter=FALSE)
```

Visualise the density of the mixture distribution.

```{r, echo=FALSE}
res    <- best
G      <- res$G
expert <- res$expert
x.name <- names(res$net.covs)
y.name <- names(res$data)
x      <- as.matrix(res$net.covs)
y      <- as.matrix(res$data)
plot(x=x, y=y, main=substitute(atop(paste('CO'[2], " Data"), paste(Mname, " model, G=", rG, ", equal mixing proportions, with expert network covariate: GNP")), list(Mname=res$modelName, rG=G)), ylab=expression('CO'[2]), xlab="GNP", type="n", cex.main=0.9)
x.new  <- setNames(seq(par("usr")[1L], par("usr")[2L], length=500L), rep("GNP", 500L))
y.new  <- setNames(seq(par("usr")[3L], par("usr")[4L], length=500L), rep("CO2", 500L))
grid   <- expand.grid(x.new, y.new)
getden <- function(x, y, res) {
  sig  <- res$parameters$variance$modelName
  Epr  <- predict(expert, newdata=setNames(data.frame(x), names(x)[1]))
  den  <- do.call(cbind, lapply(seq_len(G), function(k) dnorm(y, Epr[[k]], sqrt(res$parameters$variance$sigmasq[ifelse(sig == "V", k, 1)]))))
  rowSums(sweep(den, 1, res$parameters$pro, "*", check.margin=FALSE))
}
mat    <- matrix(getden(grid[,1L], grid[,2L], res), length(x.new), length(y.new))
image(x.new, y.new, mat, col=c("white", grDevices::heat.colors(30L, rev=TRUE)), xlab="GNP", ylab=paste('CO'[2]), add=TRUE)
box(lwd=1)
contour(x.new, y.new, mat, add=TRUE, col="grey60", labcex=0.75)
points(GNP, CO2)
```

Convert from the `"MoEClust"` class to the `"Mclust"` class in order to further visualise the results. Examine the `"classification"` and `"uncertainty"` options.

```{r, fig.height=5.5, fig.width=5.5}
mod <- as.Mclust(comp$optimal)
plot(mod, what="classification")
```

```{r, fig.height=5.5, fig.width=5.5}
plot(mod, what="uncertainty")
```

Predictions can also be made from `MoEClust` models: the response, probability of cluster membership, and the MAP classification can be predicted for the fitted data or for new data (in the form of new covariates and new response variables, or new covariates only). Let's predict the response variable using the optimal model fit above to the CO2 data.

```{r, eval=FALSE}
as.vector(predict(comp$optimal)$y)
```

```{r, echo=FALSE}
as.vector(suppressWarnings(predict(comp$optimal)$y))
```

Now let's build a model on some of the CO2 data and retain the indices of the withheld observations:

```{r, echo=FALSE}
set.seed(4321)
```

```{r, results="hide"}
ind <- sample(1:nrow(CO2data), 2)
res <- MoE_clust(CO2data[-ind,]$CO2, G=3, expert= ~ GNP, 
                 equalPro=TRUE, network.data=CO2data[-ind,])
```

Now we can make predictions on the withheld data, either by using the withheld covariates only, or by also using the withheld response variables. Note that `newdata` can be either a list with component(s) `new.x` (and optionally `new.y`) or a single matrix/data.frame with the appropriate columns.

```{r, eval=FALSE}
# Using new covariates only
predict(res, newdata = CO2data[ind,], use.y = FALSE)[1:3]

# Using both new covariates & new response variables
predict(res, newdata = CO2data[ind,])[1:3]         
```

```{r, echo=FALSE}
# Using new covariates only
suppressWarnings(predict(res, newdata = CO2data[ind,], use.y = FALSE)[1:3])

# Using both new covariates & new response data
predict(res, newdata = CO2data[ind,])[1:3]         
```

## AIS Data
Load the Australian Institute of Sports data.

```{r}
data(ais)
hema <- ais[,3:7]
```

Examine the various additional options around initialisation of the algorithm:

```{r, eval=FALSE}
?MoE_control
```

Fit a parsimonious Gaussian mixture of experts MoEClust model to the hematological variables within the AIS data, supplying `sex` in the expert network and `BMI` in the gating network via formula interfaces. Include an additional noise component by specifying it's prior mixing proportion `tau0`. Toggle between allowing the mixing proportion for the noise component depend on the gating concomitant or not via the `noise.gate` argument. This time, allow the printing of messages to the screen.

```{r, eval=FALSE}
mod <- MoE_clust(hema, G=1:3, expert= ~ sex, gating= ~ BMI, 
                 network.data=ais, tau0=0.1, noise.gate=FALSE)
```

```{r, echo=FALSE}
mod <- suppressWarnings(MoE_clust(hema, G=1:3, expert= ~ sex, gating= ~ BMI, network.data=ais, tau0=0.1, noise.gate=FALSE, verbose=FALSE))
```

Visualise the results for the optimal model using a generalised pairs plot.

```{r}
plot(mod, what="gpairs")
```

Replace the scatterplots in response vs. response panels with bivariate density contours and add corresponding parametric marginal density estimates along the diagonal. Note that this is liable to be slow for models with expert network covariates.

```{r}
plot(mod, what="gpairs", response.type="density", show.dens=TRUE, buffer=0.1)
```

Visualise the clustering uncertainty for the optimal model using a generalised pairs plot.

```{r}
plot(mod, what="gpairs", response.type="uncertainty")
```

Instead visualise the clustering uncertainty in the form of an ordered profile plot (`type="barplot"` can also be specified here).

```{r}
plot(mod, what="uncertainty", type="profile")
```

Plot the BIC of the visited models.

```{r, echo=FALSE}
plot(mod, what="criterion", legendArgs=list(x="right"))
```

Plot the gating network of the optimal model against the gating concomitant `BMI`. Note the flat horizontal line of grey circles corresponding to the noise component due to the specification of `noise.gate=FALSE` in the original function call.

```{r}
plot(mod, what="gating", x.axis=ais$BMI, xlab="BMI")
```

For the optimal model, plot the log-likelihood vs. the number of EM iterations.

```{r, eval=FALSE}
plot(mod, what="loglik")
```

```{r, echo=FALSE}
plot(mod, what="loglik")
```

Produce further visualisations for the Gaussian components with the aid of the `lattice` library.

```{r, eval=FALSE}
require("lattice")
z <- factor(mod$classification[mod$classification > 0], 
            labels=paste0("Cluster", seq_len(mod$G)))
splom(~ hema | ais$sex, groups=z, xlab=NULL)
```

```{r, echo=FALSE, fig.height=4}
require("lattice", quietly=TRUE)
z <- factor(mod$classification[mod$classification > 0], labels=paste0("Cluster", seq_len(mod$G)))
splom(~ hema | ais$sex, groups=z, xlab=NULL)
```

```{r, eval=FALSE}
splom(~ hema | z, groups=ais$sex, xlab=NULL)
```

```{r, echo=FALSE, fig.height=4}
require("lattice", quietly=TRUE)
splom(~ hema | z, groups=ais$sex, xlab=NULL)
```

---------

## References
Murphy, K. and Murphy, T. B. (2020). Gaussian parsimonious clustering models with covariates and a noise component. _Advances in Data Analysis and Classification_, 14(2): 293--325. <[doi:10.1007/s11634-019-00373-8](https://doi.org/10.1007/s11634-019-00373-8)>.

Fraley, C. and Raftery, A. E. (2002). Model-based clustering, discriminant analysis, and density estimation. _Journal of the American Statistical Association_, 97(458): 611--631.

Scrucca L., Fop M., Murphy T. B. and Raftery A. E. (2016). mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. _The R Journal_, 8(1): 289--317.