---
title: "Periodic HMMs"
author: "Jan-Ole Koslik"
output: rmarkdown::html_vignette
# output: pdf_document
vignette: >
  %\VignetteIndexEntry{Periodic HMMs}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: refs.bib
link-citations: yes
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  # fig.path = "img/",
  fig.align = "center",
  fig.dim = c(8, 6),
  out.width = "85%"
)
```

> Before diving into this vignette, we recommend reading the vignettes **Introduction to LaMa** and **Inhomogeneous HMMs**.

This vignette shows how to fit HMMs where the state process is a periodically inhomogeneous Markov chain. Formally, this means that for all $t$

$$
\Gamma^{(t+L)} = \Gamma^{(t)},
$$
where $\Gamma^{(t)}$ is the transition probability matrix at time $t$ and $L$ is the cycle length.
Such a setting can conveniently modelled by letting the off-diagonal elements be trigonometric functions of a cyclic variable such as time of day. While this model is a special case of the general, inhomogeneous HMM, it is often more interpretable and very important in statistical ecology, hence we discuss it separately.

```{r, setup}
# loading the package
library(LaMa)
```

### Setting parameters for simulation

We simulate a 2-state HMM with Gaussian state-dependent distributions. For the periodic inhomogeneity, we choose a bimodal activity pattern. All $L$ transition probability matrices can conveniently be calculated using `tpm_p()`. Under the hood, this performs a basis expansion using `trigBasisExp()` into sine and cosine terms and uses linear predictos of the form
$$
\eta^{(t)}_{ij} = \beta_0^{(ij)} + \sum_{k=1}^K \bigl( \beta_{1k}^{(ij)} \sin(\frac{2 \pi k t}{L}) + \beta_{2k}^{(ij)} \cos(\frac{2 \pi k t}{L}) \bigr)
$$
for the off-diagonal entries of the transition probability matrix.
The special case of periodically inhomogeneous Markov chains also allows the derivation of a so-called **periodically stationary distribution** [@koslik2023inference] which we can compute this distribution using `stationary_p()`.

```{r parameters}
# parameters
mu = c(4, 14)   # state-dependent means
sigma = c(3, 5) # state-dependent standard deviations

L = 48 # half-hourly data: 48 observations per day
beta = matrix(c(-1, 1, -1, -1, 1,
                -2, -1, 2, 2, -2), nrow = 2, byrow = TRUE)
Gamma = tpm_p(seq(1, 48, by = 1), L, beta, degree = 2)
Delta = stationary_p(Gamma)

# having a look at the periodically stationary distribution
color = c("orange", "deepskyblue")
plot(Delta[,1], type = "b", lwd = 2, pch = 16, col = color[1], bty = "n", 
     xlab = "time of day", ylab = "Pr(state 1)")
# only plotting one state, as the other probability is just 1-delta
```

### Simulating data

```{r data}
# simulation
tod = rep(1:48, 50) # time of day variable, 50 days
n = length(tod)
set.seed(123)
s = rep(NA, n)
s[1] = sample(1:2, 1, prob = Delta[tod[1],]) # initial state from stationary dist
for(t in 2:n){
  # sampling next state conditional on previous one and the periodic t.p.m.
  s[t] = sample(1:2, 1, prob = Gamma[s[t-1],,tod[t]])
}
# sampling observations conditional on the states
x = rnorm(n, mu[s], sigma[s])

oldpar = par(mfrow = c(1,2))
plot(x[1:400], bty = "n", pch = 20, ylab = "x", 
     col = color[s[1:400]])
boxplot(x ~ tod, xlab = "time of day")
# we see a periodic pattern in the data
par(oldpar)
```

## Trigonometric modeling of the transition probalities

### Writing the negative log-likelihood function

We specify the likelihood function and pretend we know the degree of the trigonometric link which, in practice, is never the case. Again we use `tpm_p()` and we compute the periodically stationary start by using `stationary_p()` with the additional argument that specifies which time point to compute.

```{r mllk}
nll = function(par, x, tod){
  beta = matrix(par[1:10], nrow = 2) # matrix of coefficients
  Gamma = tpm_p(tod = 1:48, L = 48, beta = beta, degree = 2) # calculating all L tpms
  delta = stationary_p(Gamma, t = tod[1]) # periodically stationary start
  mu = par[11:12]
  sigma = exp(par[13:14])
  # calculate all state-dependent probabilities
  allprobs = matrix(1, length(x), 2)
  for(j in 1:2) allprobs[,j] = dnorm(x, mu[j], sigma[j])
  # return negative for minimization
  -forward_p(delta, Gamma, allprobs, tod)
}
```

### Fitting an HMM to the data

```{r model, warning=FALSE}
par = c(beta = c(-1,-2, rep(0, 8)), # starting values state process
        mu = c(4, 14), # initial state-dependent means
        logsigma = c(log(3),log(5))) # initial state-dependent sds
system.time(
  mod <- nlm(nll, par, x = x, tod = tod)
)
```

### Visualising results

Again, we use `tpm_p()` and `stationary_p()` to tranform the parameters.
```{r visualization}
# transform parameters to working
beta_hat = matrix(mod$estimate[1:10], nrow = 2)
Gamma_hat = tpm_p(tod = 1:48, L = 48, beta = beta_hat, degree = 2)
Delta_hat = stationary_p(Gamma_hat)
mu_hat = mod$estimate[11:12]
sigma_hat = exp(mod$estimate[13:14])

delta_hat = apply(Delta_hat, 2, mean)

oldpar = par(mfrow = c(1,2))
hist(x, prob = TRUE, bor = "white", breaks = 40, main = "")
curve(delta_hat[1]*dnorm(x, mu_hat[1], sigma_hat[1]), add = TRUE, lwd = 2, 
      col = color[1], n=500)
curve(delta_hat[2]*dnorm(x, mu_hat[2], sigma_hat[2]), add = TRUE, lwd = 2, 
      col = color[2], n=500)
curve(delta_hat[1]*dnorm(x, mu_hat[1], sigma_hat[1])+
        delta_hat[2]*dnorm(x, mu[2], sigma_hat[2]),
      add = TRUE, lwd = 2, lty = "dashed", n = 500)
legend("topright", col = c(color[1], color[2], "black"), lwd = 2, bty = "n",
       lty = c(1,1,2), legend = c("state 1", "state 2", "marginal"))

plot(Delta_hat[,1], type = "b", lwd = 2, pch = 16, col = color[1], bty = "n", 
     xlab = "time of day", ylab = "Pr(state 1)")
par(oldpar)
```

## Efficieny and convenience

While it is convenient to use `tpm_p()`, it performs the basis expansion into sine and cosine terms each time it is evaluated.
This is wasteful in model estimation as these terms stay fixed. A better alternative is to first build the corresponding design matrix. This can be done conveniently using the `cosinor()` function, either by itself or in a formula passed to `make_matrices()`. First let's call `cosinor()` by itself:

```{r cosinor}
tod = 1:24 # cyclic time of day variable
Z = cosinor(tod, period = c(24, 12)) # design matrix
Z = cbind(intercept = 1, Z)
head(Z, 2)
```

The cosinor function excepts a `period` argument which specifies the period of the trigonometric functions. As you can see, `period` can be a vector, leading to a larger basis expansion, i.e. more flexibility. If your model involves other covariates than time of day, say temperature (`temp`), it might be more convenient to use `make_matrices()` with a formula:


```{r cosinor_form}
data = data.frame(tod = rep(1:24, 2), 
                  temp = rnorm(48, 20, 5))
modmat = make_matrices(~ temp * cosinor(tod, 24), data)
Z = modmat$Z
head(Z, 2)
```

In both cases, the transition probability matrix can then be calculated using `tpm_g()` or `tpm_p()`:

```{r tpm}
# coefficient matrix
(beta = matrix(c(-2,-2, runif(2*(ncol(Z)-1))), nrow = 2))
# constructing t.p.m.s
Gamma = tpm_p(Z = Z, beta = beta) # not first arguments in tpm_p
Gamma = tpm_g(Z, beta) # but first arguments in tpm_g
```


<!-- ## Nonparametric modeling of the transition probalities -->

<!-- `Lcpp` also makes non-parametric modeling trivially easy. Here we model the transition probabilities using cyclic P-splines similar to @feldmann2023. We do so in first calculating the design matrix using `mgcv` which we can easily be handled by `tpm_p()`. -->

<!-- ### Building the cyclic spline design matrix -->

<!-- ```{r cSpline} -->
<!-- nk = 8 # number of basis functions -->
<!-- tod = 1:48 -->
<!-- L = 48 -->
<!-- k = L * 0:nk / nk # equidistant knots -->
<!-- Z = mgcv::cSplineDes(tod, k) ## cyclic spline design matrix -->

<!-- # plotting the B-Spline basis functions -->
<!-- plot(Z[,1], type = "l", lwd = 2, col = 1, bty = "n", -->
<!--      xlab = "time of day", ylab = "basis functions", ylim = c(0,0.8)) -->
<!-- for(i in 2:nk){ -->
<!--   lines(Z[,i], lwd = 2, col = i) -->
<!-- }  -->
<!-- ``` -->

<!-- ### Writing the negative log-likelihood function -->

<!-- We only need to make small changes to the likelihood function. Most importantly we use `tpm_p()` with the additional argument `Z`, which allows using a bespoke design matrix. In general, a penalty for the curvature should also be added, which is done in the last lines. -->

<!-- ```{r mllk2} -->
<!-- mllk_np = function(theta.star, x, z, Z, lambda){ -->
<!--   beta = matrix(theta.star[1:(2+2*nk)], nrow = 2) # nk params per off-diagonal element -->
<!--   Gamma = tpm_p(tod = 1:48, L = 48, beta = beta, Z = Z) # calculating all L tpms -->
<!--   delta = stationary_p(Gamma, t = z[1]) # periodically stationary HMM -->
<!--   mu = theta.star[2+2*nk + 1:2] -->
<!--   sigma = exp(theta.star[2+2*nk + 2 + 1:2]) -->
<!--   # calculate all state-dependent probabilities -->
<!--   allprobs = matrix(1, length(x), 2) -->
<!--   for(j in 1:2){ allprobs[,j] = stats::dnorm(x, mu[j], sigma[j]) } -->
<!--   # return negative for minimization -->
<!--   l = forward_p(delta, Gamma, allprobs, z) -->
<!--   # penalize curvature -->
<!--   penalty = sum(diff(beta[1,-1], differences = 2)^2)+ -->
<!--     sum(diff(beta[2,-1], differences = 2)^2) -->
<!--   return(-l + lambda*penalty) -->
<!-- } -->
<!-- ``` -->

<!-- ### Fitting a non-parametric HMM -->

<!-- ```{r model2, warning=FALSE} -->
<!-- theta.star = c(-1,-2, rep(0, 2*nk), # starting values state process -->
<!--                4, 14 ,log(3),log(5)) # starting values state-dependent process -->
<!-- s = Sys.time() -->
<!-- mod_np = nlm(mllk_np, theta.star, x = x, z = z, Z = Z, lambda = 0) -->
<!-- # in this case we don't seem to need a lot of penalization -->
<!-- Sys.time()-s -->
<!-- ``` -->

<!-- The model fit is still quite fast for non-parametric modeling. -->

<!-- ### Visualizing results -->

<!-- Again, we use `tpm_p()` and `stationary_p()` to tranform the unconstraint parameters to working parameters. -->

<!-- ```{r visualization2} -->
<!-- # transform parameters to working -->
<!-- beta_hat_np = matrix(mod_np$estimate[1:(2+2*nk)], nrow = 2) -->
<!-- Gamma_hat_np = tpm_p(tod = 1:48, L = 48, beta = beta_hat_np, Z = Z) -->
<!-- Delta_hat_np = stationary_p(Gamma_hat_np) -->

<!-- # comparing the two fits -->
<!-- plot(Delta_hat_np[,1], type = "l", lwd = 3, col = "purple", bty = "n",  -->
<!--      xlab = "time of day", ylab = "Pr(state 1)") -->
<!-- # parametric fit -->
<!-- lines(Delta_hat[,1], lwd = 3, col = color[1]) -->
<!-- ``` -->

## References