\documentclass{article}
% \VignetteIndexEntry{Guide to using moveHMM}
% \VignetteEngine{knitr::knitr}

\usepackage[utf8]{inputenc}
\usepackage[bf,font={small,sl}]{caption} % for pretty captions
\usepackage{natbib} % for the bibliography
\usepackage{amsmath} % for align, cases...
\usepackage{amsfonts} % for mathbb...
\usepackage{listings} % for lstlisting
\usepackage{color} % for lstlisting background color
\usepackage{booktabs} % pretty tables
\usepackage{url}
\usepackage[margin=1in]{geometry}
\usepackage{natbib}
\usepackage{hyperref}
\renewcommand{\baselinestretch}{1.3}
\usepackage{parskip}

\title{\textbf{moveHMM: an R package for the analysis of animal movement data}}
\author{Th\'eo Michelot, Roland Langrock \& Toby Patterson}

\begin{document}
\maketitle

<<init, echo=FALSE, message=FALSE>>=
library(MASS)
library(boot)
library(sp)
library(Rcpp)
library(moveHMM)
@

\setcounter{tocdepth}{2}    % entries down to \subsubsections in the TOC
\tableofcontents

\newpage
\section{Background}

\subsection{Introduction}

The analysis of animal movement data has become increasingly important in terrestrial and marine ecology. A substantial part of the literature on statistical modelling of animal movement data has focused on the intuitive approach of decomposing movement time series into distinct behavioural modes (a.k.a.\ bouts, states), via the use of so-called state-switching models. Some early references on this topic are \cite{blackwell1997}, \cite{morales2004}, and \cite{jonsen2005}. In cases where the position data are highly accurate (e.g., from GPS), and regular in time, hidden Markov models (HMMs) can be used to efficiently classify animal movement data into states. HMMs are increasingly popular in this field, due to their flexibility and to the associated very efficient recursive algorithms available for conducting statistical inference \citep{patterson2009, langrock2012, mcclintock2020}.

\texttt{moveHMM} is an R package which implements HMMs and associated tools (state decoding, model selection, model checking, etc.) tailored to animal movement modelling. Particular attention was paid to computational efficiency with the fitting algorithm implemented in C++. The computational speed makes it feasible to very large data sets --- e.g.\ tens of thousands of positions collected for each of a dozen individual animals --- on standard desktop PCs. The package also allows users to incorporate covariate data into their models, which is particularly useful when inferring the drivers of changes in behaviour. Our hope is that the \texttt{moveHMM} package will provide users who collect movement data with an interface to sophisticated and adequate methods for a statistical analysis of their data. The package is structured so as to allow the users to prepare their data for analysis, fit a variety of HMMs to their data, and perform diagnostics on these fitted models. The package is presented in \cite{michelot2016}, where its use is illustrated on simulated movement data of wild haggises.

\subsection{HMMs for animal movement}

For full details of the HMM approach in the context of animal movement modelling, the user should refer to the relevant primary publications (see, e.g., \citealp{patterson2009}, \citealp{langrock2012}, \citealp{zucchini2016}). Here, we briefly highlight the essential features of the HMM approach.

The standard HMM approach to model an individual animal's movement considers bivariate time series comprising the step length and the turning angle at each time point (see illustration in Figure \ref{stepsandturns}). The associated locations need to be {\em sampled at equally spaced points in time} (though missing data on an otherwise regular grid can easily be handled) and are assumed to be {\em observed with no or only negligible error}. The \texttt{moveHMM} package is restricted to such discrete time data and involves the assumption that the locations are observed with zero, or at least negligible error.

\begin{figure}[ht]
  \centering
  \includegraphics[width=\textwidth]{step_angle_figure}
  \caption{Illustration of step lengths and turning angles}\label{stepsandturns}
\end{figure}

At each time point, the parameters of the step length distribution (e.g., a gamma distribution) and the parameters of the turning angle distribution (e.g., a von Mises distribution) are determined by an underlying unobserved state. There are finitely many states which provide rough classifications of the movement (e.g.\ more active vs.\ less active), often interpreted as proxies for the animal's behavioural states (e.g.\ transiting vs.\ foraging). The sequence of states is assumed to be generated by a Markov chain, usually with a tendency of remaining in a state for some time before switching to another state. The corresponding state transition probabilities, as well as the parameters characterising the state-dependent distributions, are model parameters to be estimated. The number of states (movement modes) is unknown and has to be specified by the user. Typically with movement data one assumes a low number of states (say $\leq$ 4). For example, an animal may be foraging (low speed, high rates of turning) and transiting (high speeds, low rates of turning). Biologically interesting inference often involves modelling the state transition probabilities as functions of environmental covariates.

For a given set of model parameters, the likelihood of the data can be calculated using a recursive algorithm (the forward algorithm, cf.\ \citealp{patterson2017}), which in a very effective way considers all possible state sequences that might have given rise to the observed time series. This makes numerical maximization of the (log-)likelihood, and hence maximum likelihood estimation, feasible in most cases. Having estimated the model parameters (and examined useful diagnostics of model fit), the user can estimate the most likely sequence of behavioural states. We encourage the user to consult a good primary text such as \cite{zucchini2016} to familiarize themselves with the technical details of hidden Markov models.


\newpage
\section{Common challenges and other resources}

Here we briefly outline some common challenges for the analysis of animal movement data using hidden Markov models, and provide references to resources that discuss them. For a 1-hour introduction to analysing animal movement data with HMMs in R, see \href{https://www.youtube.com/watch?v=WELTpbB5BuU}{this webinar organised by the Ecological Forecasting Initiative}.

\subsection{Starting parameter values}

Model fitting in moveHMM requires the numerical optimisation of the likelihood function, which measures how plausible the observed data are given a set of parameter values. The optimiser requires initial parameter values, from where to start exploring the parameter space (to find the optimum), which is done in \texttt{fitHMM}. The choice of initial parameters should be based on exploratory data visualisation, and can be challenging in many applications, especially for complex models (e.g., with many states). The moveHMM vignette ``A short guide to choosing initial parameter values for the estimation in moveHMM'' discusses this challenge, and offers some guidelines to help with the choice of starting values.

\subsection{Choice of the number of states}

In HMMs, the number of states needs to be chosen prior to analysis, rather than estimated as part of model fitting. This choice can be difficult, in particular because standard model selection criteria (AIC, BIC) tend to select very large numbers of states, which are difficult to interpret. AIC behaves this way because adding states greatly increases the flexibility of the model to capture features in the data, but this usually comes at the cost of interpretability and generalisability. \cite{pohle2017} discussed this problem, and suggested that the number of states should be generally be informed by biological expertise and model checking, rather than by following model selection criteria.

\subsection{Noisy or irregular data}

Key assumptions of HMMs is that the data should be observed with negligible measurement error (to obtain reliable step lengths and turning angles), and at regular time intervals (although missing values on a regular grid can be dealt with). Small violations of these assumptions can be accommodated, e.g., GPS measurement error may be treated as negligible is it is small compared to the scale of movement steps. However, in cases with large errors and/or very irregular locations, the framework cannot be applied directly. One common approach has been to first filter and regularise such data using another model, and then apply the HMM approach to the output trajectories. The first step can for example be implemented with the state-space models implemented in the R packages crawl \citep{johnson2008} and foieGras \citep{jonsen2020}. This approach has the crucial flaw of ignoring the uncertainty associated with the preprocessed track. An alternative approach, suggested by \cite{mcclintock2017}, is to generate many plausible tracks from a state-space model, fit an HMM to each one of them, and then combine the results to obtain estimates that account for the uncertainty in the filtering step. This approach is implemented in the package momentuHMM, which extends moveHMM to more general HMMs \citep{mcclintock2018}.

\newpage
\section{Illustration of moveHMM workflow: elk movement analysis} \label{sec_application}

Before we provide a detailed description of the various features of the \texttt{moveHMM} package in the subsequent section, we illustrate a typical HMM-based analysis of movement data using the main functions of the package, via an example. We use the data from \cite{morales2004}, collected on four elk in Canada.

\subsection{Step 1: movement data preparation}

The input data need to have the correct format for subsequent processing and analysis. The data need to be provided as a \texttt{data.frame}, with two mandatory columns:
\begin{itemize}
\item Easting or longitude (default name: \texttt{x})
\item Northing or latitude (default name: \texttt{y})
\end{itemize}
It is possible to have a column ``ID'', which contains the identifiers of the observed animals. If no column named ``ID'' is provided, all the observations will be considered to belong to a single animal. Additional columns are considered as covariates. Note that, within this package, covariates need to have numerical values (rather than e.g. character values).

\subsubsection{Load and format the tracking data}

The elk data considered in \cite{morales2004} are loaded with the package, as the data frame \texttt{elk\_data}.

The data frame has four columns: ``ID'', ``Easting'', ``Northing'', and ``dist\_water''. The last one is the distance of the animal to water, which for illustration purposes we want to include in the model as a covariate.

<<trackData2,size='small'>>=
head(elk_data)
@

The easting and northing values are expressed in meters in the data, and we decide that we want to deal with distances in kilometers for the step lengths. To achieve this, we transform the coordinates into kilometers.
<<trackData4,size='small'>>=
elk_data$Easting <- elk_data$Easting/1000
elk_data$Northing <- elk_data$Northing/1000
@

\noindent As a result, this is what the data look like:
<<trackData5,size='small'>>=
head(elk_data)
@

\subsubsection{Use \texttt{prepData}}
The data are in the proper format, and can be processed using \texttt{prepData} to compute step lengths and angles. We choose the arguments carefully:
\begin{itemize}
\item \texttt{type} specifies whether the coordinates are easting/northing (\texttt{type="UTM"}) or longitude/latitude (\texttt{type="LL"}) values. The latter is the default, so we need to call the function with the argument \texttt{type="UTM"}, to indicate that UTM coordinates are provided.
\item \texttt{coordNames} are the names of the coordinates in the input data frame. The default is ``x'' and ``y'', so we need to call the function with the argument \texttt{coordNames=c("Easting","Northing")}.
\end{itemize}

\noindent The call to the function for these data is,

<<prepData,size='small'>>=
data <- prepData(elk_data,type="UTM",coordNames=c("Easting","Northing"))
@

The step lengths and turning angles are computed, and the returned object is a data frame.

<<data,size='small'>>=
head(data)
@

Note that the coordinates have been renamed ``x'' and ``y''. This makes the processing of the data simpler.

If the data set contains covariates which have missing values, then those are imputed using the closest non-missing value, by default the previous one if it is available. This is arbitrary and might not be appropriate in all situations.\\

It is also possible to print summary information about the data, using the function \texttt{summary}, e.g.
<<summary,size='small'>>=
summary(data)
@

\subsubsection{Use \texttt{plot.moveData}}
Once the data have been preprocessed, they can be plotted using the generic function \texttt{plot}. This displays maps of the animals' tracks, times series of the steps and angles, and histograms of the steps and angles. A few plotting options are available. These are described in the documentation. To plot all animals' tracks on a single map, we call:

<<plot_moveData1,size='small',eval=FALSE>>=
plot(data,compact=T)
@
<<plot_moveData2,echo=FALSE,results='hide'>>=
pdf(file="plot_moveData.pdf")
plot(data,compact=T,ask=FALSE)
dev.off()
@

The resulting map, and the steps and angles graphs for the first animal, are displayed in Figure \ref{moveData} (we omit the graphs for the three other animals, which are also displayed when using the above command).

\begin{figure}[ht]
  \includegraphics[width=0.49\textwidth,page=1]{plot_moveData}
  \includegraphics[width=0.49\textwidth,page=2]{plot_moveData}
  \caption{Map of the animals' tracks (left) -- each color represents an animal. Time series and histograms of the step lengths and turning angles for one individual, ``elk-115'' (right).}
  \label{moveData}
\end{figure}

The time series of step lengths is one way to check the data for outliers.

\subsection{Step 2: model specification and model fitting}
\subsubsection{fitHMM}
The function \texttt{fitHMM} is used to fit an HMM to the data. Its arguments are described in the documentation. Here are a few choices we make:
\begin{itemize}
\item \texttt{nbStates=2}, i.e. we fit a 2-state HMM to the data;
\item \texttt{beta0=NULL} and \texttt{delta0=NULL}, i.e.\ we use the default values for the initial values \texttt{beta0} and \texttt{delta0};
\item \texttt{formula=$\sim$dist\_water}, i.e. the transition probabilities are functions of the covariate ``dist\_water'';
\item \texttt{stepDist="gamma"}, to model the step lengths with the gamma distribution (note that it is the default, so we do not need to explicitely specify it);
\item \texttt{angleDist="vm"}, to model the turning angles with the von Mises distribution (default);
\item \texttt{angleMean=NULL}, because we want to estimate the mean of the angle distribution (default);
\item \texttt{stationary=FALSE}, as due to the covariates the process is not stationary (default).
\end{itemize}

We also need to specify initial values for the parameters of the state-dependent distributions, to be used by the optimization function. Note that this choice is crucial, and that \emph{the algorithm might not find the global optimum of the likelihood function if the initial parameters are poorly chosen}. The initial parameters should be specified in two vectors, \texttt{stepPar0} (for the step distribution) and \texttt{anglePar0} (for the angle distribution). The necessary parameters of each distribution are detailed in Section \ref{sec_distrib}.

Zero-inflation (as described in Section \ref{sec:zeroinflation}) must be included in the step length distribution if some steps are of length exactly zero (which is the case for the elk data). To do so, another parameter is added to the step distribution: its mass on zero.

Here, the initial values are chosen such that they correspond to the commonly observed pattern in two-state HMMs for animal movement data, with state 1 involving relatively short steps and many turnings (hence the choice of a small initial value for the mean of the gamma step length distribution and an initial value of pi for the mean turning angle) and state 2 involving longer steps and fewer turnings (hence the choice of a larger initial value for the mean of the gamma step length distribution and an initial value of 0 for the mean turning angle).

For numerical stability, we decide to standardize the covariate values before fitting the model (explanations in Section \ref{sec:instability}).

<<fitHMM,size='small',eval=FALSE>>=
## standardize covariate values
data$dist_water <-
    (data$dist_water-mean(data$dist_water))/sd(data$dist_water)

## initial parameters for gamma and von Mises distributions
mu0 <- c(0.1,1) # step mean (two parameters: one for each state)
sigma0 <- c(0.1,1) # step SD
zeromass0 <- c(0.1,0.05) # step zero-mass
stepPar0 <- c(mu0,sigma0,zeromass0)
angleMean0 <- c(pi,0) # angle mean
kappa0 <- c(1,1) # angle concentration
anglePar0 <- c(angleMean0,kappa0)

## call to fitting function
m <- fitHMM(data=data,nbStates=2,stepPar0=stepPar0,
            anglePar0=anglePar0,formula=~dist_water)
@

<<savemodels,echo=FALSE,eval=FALSE>>=
## code to fit and save the 2 and 3-state models loaded below
library(moveHMM)
trackData <- read.table("http://www.esapubs.org/archive/ecol/E085/072/elk_data.txt",
                        sep="\t",header=TRUE)[(1:735),c(1,2,3,7)]
colnames(trackData)[1] <- "ID"
colnames(trackData)[4] <- "dist_water"
trackData$Easting <- trackData$Easting/1000
trackData$Northing <- trackData$Northing/1000

data <- prepData(trackData,type="UTM",coordNames=c("Easting","Northing"))
data$dist_water <- (data$dist_water-mean(data$dist_water))/sd(data$dist_water)

## Fit a 2-state HMM
mu0 <- c(0.1,1)
sigma0 <- c(0.1,1)
zeromass0 <- c(0.05,0.01)
stepPar0 <- c(mu0,sigma0,zeromass0)
angleMean0 <- c(pi,0)
kappa0 <- c(1,1)
anglePar0 <- c(angleMean0,kappa0)

m <- fitHMM(data=data,nbStates=2,stepPar0=stepPar0,anglePar0=anglePar0,formula=~dist_water)

## Fit a 3-state HMM
mu0 <- c(0.1,0.5,3)
sigma0 <- c(0.05,0.5,1)
zeromass0 <- c(0.05,0.01,0.01)
stepPar0 <- c(mu0,sigma0,zeromass0)
angleMean0 <- c(pi,pi,0)
kappa0 <- c(1,1,1)
anglePar0 <- c(angleMean0,kappa0)

m3 <- fitHMM(data=data,nbStates=3,stepPar0=stepPar0,anglePar0=anglePar0,
             formula=~dist_water)

save(m,m3,file="models.RData")
@

<<fitHMM2,echo=FALSE,cache=TRUE>>=
load("models.RData")
@

The returned object, \texttt{m}, is of the class \texttt{moveHMM}. It can be printed in order to obtain the maximum likelihood estimates of all model parameters.

<<print_moveHMM,size='small'>>=
m
@

The argument \texttt{knownStates} of the function \texttt{fitHMM} makes it possible to set some values of the state process to fixed values, prior to fitting the model. This can be useful e.g.\ when the animal's behaviour is known for some time points, but we discourage users to take advantage of this option to make the states match their expectations (instead of letting the data speak for themselves).

\subsubsection{Dealing with numerical instability} \label{sec:instability}
As mentioned above, the numerical maximization routine might not identify the global maximum of the likelihood function, or even fail to converge altogether, for poorly chosen initial values of the parameters. In such a case, the optimization routine \texttt{nlm} might produce an error such as:
<<nlm-error, eval = FALSE, highlight = FALSE>>=
  Error in nlm(nLogLike, wpar, nbStates, bounds, parSize, data, stepDist,  :
  non-finite value supplied by 'nlm'
@

The best way to deal with such numerical problems is to test different sets of initial values, possibly chosen randomly. By comparing the resulting estimates for the different initial values used, one usually obtains a good feeling for any potential sensitivity of the numerical search to its chosen starting point. Note, however, that in any case there will usually be no certainty that the global maximum of the likelihood, i.e.\ the maximum likelihood estimate, has been identified.

During preliminary tests on the elk data, we noticed that, in this example, the numerical search is highly sensitive to the choice of the initial parameters \texttt{beta0}. This is due to the high values of the covariate: a small change in the associated regression coefficients can make a big difference in the likelihood function. In such cases, it is advisable to standardize the covariate values before fitting the model, for example by calling:
<<normalize,size='small',eval=FALSE>>=
data$dist_water <-
    (data$dist_water-mean(data$dist_water))/sd(data$dist_water)
@

This allows for greater numerical stability, with the convergence of the fitting function depending less on the choice of initial values. The value of the maximum log-likelihood is not affected by the standardization of the covariate values, only the maximum likelihood estimate of \texttt{beta} is.

\subsubsection{Confidence intervals}
Confidence intervals for the model parameters can be computed with the function \texttt{CI}, passing as an argument the object created by \texttt{fitHMM}. It is possible to give the significance level of the desired confidence interval as an argument, e.g.\ 0.99 for 99\% confidence intervals. By default, 95\% confidence intervals are returned.

Below we show the 95\% confidence intervals for the parameters of the 2-state model fitted to the elk data. \texttt{CI(m)\$stepPar} corresponds to the bounds of the confidence intervals for the step parameters (\texttt{CI(m)\$stepPar\$lower} and \texttt{CI(m)\$stepPar\$upper}). \texttt{CI(m)\$anglePar} and \texttt{CI(m)\$beta} are respectively the bounds of the confidence intervals for the angle parameters and the regression coefficients of the transition probabilities.

<<conf,size='small',warning=FALSE>>=
CI(m)
@

\noindent Note that a warning message is also output:
<<ci-warn, eval = FALSE, highlight = FALSE>>=
  Warning message:
  In CI(m) :
  Some of the parameter estimates seem to lie close to
  the boundaries of their parameter space. The associated
  CIs are probably unreliable (or might not be computable).
@

Here, this message refers to the zero-inflation parameter in the second state. Its estimate is very close to zero, the inferior boundary of its range (this parameter is in the interval [0,1]), and this causes the corresponding confidence interval to be unreliable. The function can sometimes fail to compute such a confidence interval, and returns \texttt{NA} instead, as in this example.

\subsection{Step 3: further inference and visualisation tools}
Various options are available for the class \texttt{moveHMM}, and here we explain how to use them in the elk example.

\subsubsection{Plot the model}
The fitted model can be plotted, using the generic function \texttt{plot}. A few graphical options are available and listed in the documentation. Here, we call:
<<plot_moveHMM,size='small',eval=FALSE>>=
plot(m, plotCI=TRUE)
@
<<plot_moveHMM2,echo=FALSE,results='hide'>>=
pdf(file="plot_moveHMM.pdf")
plot(m,ask=FALSE,plotCI=TRUE)
dev.off()
@

\noindent This outputs:
\begin{itemize}
\item an histogram of step lengths of all animals, with the fitted state-dependent densities,
\item an histogram of turning angles of all animals, with the fitted state-dependent densities,
\item plots of the transition probabilities as functions of the covariate considered,
\item a map of each animal's track, colored by states.
\end{itemize}

Figure \ref{moveHMM} displays those plots, but showing only one of the plotted maps, namely the one corresponding to the first animal, ``elk-115''. The state-dependent densities are weighted by the relative frequency of each state in the most probable state sequence (decoded with the Viterbi algorithm, see Section \ref{sec_statedec}). For example, if according to the most probable state sequence, one third of the observations is allocated to the first state, and two thirds to the second state, the plots of the densities in the first state are weighted with a factor 1/3, and in the second state with a factor 2/3.

\begin{figure}[!htb]
  \includegraphics[width=0.49\textwidth,page=1]{plot_moveHMM}
  \includegraphics[width=0.49\textwidth,page=2]{plot_moveHMM} \\
  \includegraphics[width=0.49\textwidth,page=3]{plot_moveHMM}
  \includegraphics[width=0.49\textwidth,page=4]{plot_moveHMM}
  \caption{Output of \texttt{plot.moveHMM}. Histogram of step lengths with fitted distributions (top-left), histogram of turning angles with fitted distributions (top-right), transition probabilities as functions of ``dist\_water'' with 95\% confidence intervals (bottom-left), and map of decoded track for the first animal (bottom-right).}
  \label{moveHMM}
\end{figure}

The first state (in orange on the plots) corresponds to short steps, and angles centered around $\pi$, and the second state (in blue on the plots) corresponds to longer steps, and angles centered around $0$.

The plots of the transition probabilities as function of the covariate indicate that animals tend to switch from the second state to the first state when they are far from water, whereas they stay in the second state when closer to water.

\subsubsection{State decoding} \label{sec_statedec}
Two functions can be used to decode the state process.

\paragraph{Viterbi algorithm} To globally decode the state process, the Viterbi algorithm is implemented in the function \texttt{viterbi}. This function outputs the most likely sequence of states to have generated the observation, under the fitted model. Below are the most probable states for the first 25 observations of the first individual:

<<states,size='small',cache=TRUE>>=
states <- viterbi(m)
states[1:25]
@

\paragraph{State probabilities} To get more accurate information on the state process, it is possible to compute the state probabilities for each observation, using \texttt{stateProbs}. This returns a matrix with as many columns as there are states in the model, and as many rows as there are observations (stacking all animals' observations). The elements of the matrix are defined as
\begin{center}
  \texttt{stateProbs(m)[t,j]} = $\Pr(S_t=j)$
\end{center}
where $\{S_t\}$ is the state process.\\

\noindent For example:
<<stateProbs,size='small',cache=TRUE>>=
sp <- stateProbs(m)
head(sp)
@

The state with highest probability according to \texttt{stateProbs} might not be the same as the state in the most probable sequence returned by the Viterbi algorithm. This is because the Viterbi algorithm performs ``global decoding'', whereas the state probabilities are ``local decoding''. For more details, see \cite{zucchini2016}.\\

The function \texttt{plotStates} can be used to visualize the results of \texttt{viterbi} and \texttt{stateProbs}. Figure \ref{fig:plotStates2} shows the plots of the most likely state sequence decoded by the Viterbi algorithm, as well as both columns of the matrix of state probabilities, for one individual, ``elk-115''. It was obtained with the following command:
<<plotStates,size='small',eval=FALSE>>=
plotStates(m,animals="elk-115")
@

<<plotStates2, fig.width='4in', fig.height='4in', out.width='4in', out.height='4in', fig.pos='ht', fig.align='center', fig.cap="Decoded states sequence (top row), and state probabilities of observations (middle and bottom rows) for elk-115", cache=TRUE, echo=FALSE, results='hide'>>=
plotStates(m,animals="elk-115",ask=FALSE)
@

\subsubsection{Stationary state probabilities}
For a transition probability matrix $\boldsymbol{\Gamma}$, the stationary distribution is the vector $\boldsymbol{\delta}$ that solves the equation $\boldsymbol\delta=\boldsymbol\delta\boldsymbol\Gamma$, subject to $\sum_{i=1}^N \delta_i = 1$ (see Section \ref{sec:stationarity} for more information). It reflects the long-term proportion of time the model spends in each state.

When the transition probabilities are time-varying (i.e.\ functions of covariates), the stationary distribution does not exist. However, for fixed values of the covariates, we can obtain one transition probability matrix, and thus one stationary distribution. The function \texttt{plotStationary} does this over a grid of values of each covariate, and plots the resulting stationary state probabilites. They can be interpreted as the long-term probabilities of being in each state at different values of the covariate.

<<plotStationary, fig.width='4in', fig.height='4in', out.width='4in', out.height='4in', fig.pos='htbp', fig.align='center', fig.cap='Output of \\texttt{plotStationary}. Stationary state probabilities, as functions of the distance to water, with 95\\% confidence intervals.'>>=
plotStationary(m, plotCI=TRUE)
@

\subsubsection{Model selection with AIC}
The generic method \texttt{AIC} is available to compare \texttt{moveHMM} models. For example, we now fit a 3-state HMM to the data, and want to compare the AICs of the 2-state and 3-state models.

<<AIC,size='small',eval=FALSE>>=
# initial parameters
mu0 <- c(0.1,0.5,3)
sigma0 <- c(0.05,0.5,1)
zeromass0 <- c(0.05,0.0001,0.0001)
stepPar0 <- c(mu0,sigma0,zeromass0)
angleMean0 <- c(pi,pi,0)
kappa0 <- c(1,1,1)
anglePar0 <- c(angleMean0,kappa0)

# fit the 3-state model
m3 <- fitHMM(data=data,nbStates=3,stepPar0=stepPar0,
             anglePar0=anglePar0,formula=~dist_water)
@

\noindent And, to compare them:
<<AIC2,size='small'>>=
AIC(m,m3)
@

In terms of AIC, the 3-state model is favoured over the 2-state model in this example.

\subsubsection{Model checking}
The pseudo-residuals (a.k.a.\ quantile residuals) of the model can be computed with \texttt{pseudoRes}. These follow a standard normal distribution if the fitted model is the true data-generating process. In other words, a deviation from normality indicates a lack of fit. For more theoretical background on pseudo-residuals, see \cite{zucchini2016}. The pseudo-residuals of the 2-state model fitted to the elk data are displayed in Figure \ref{fig:plotPR}. They can be computed and plotted with the following commands.
<<pseudoRes,size='small',eval=FALSE>>=
# compute the pseudo-residuals
pr <- pseudoRes(m)

# time series, qq-plots, and ACF of the pseudo-residuals
plotPR(m)
@

<<plotPR, fig.width='4in', fig.height='5in', out.width='4in', out.height='5in', fig.pos='ht!', fig.align='center', fig.cap="Time series, qq-plots, and autocorrelation functions of the pseudo-residuals of the 2-state model.", cache=TRUE, echo=FALSE, results='hide', message=FALSE>>=
plotPR(m)
@

If some steps are of length zero (i.e. if the step distribution is zero-inflated), the corresponding pseudo-residuals are plotted as segments on the qq-plot. It is the case for the smallest step pseudo-residual located in the bottom-left corner of the step qq-plot in Figure \ref{fig:plotPR}. The pseudo-residuals of discrete data are defined as segments, and in this case, the segments start in $-\infty$.

\subsection{Dealing with one-dimensional data}
It is sometimes of interest to model one-dimensional movement data, like e.g. dive data. This can be done in moveHMM by setting all values of the second coordinate to zero in the data, and use the option \texttt{angleDist="none"}.

In the case where one-dimensional data are provided, the plotting functions for the data and for the model will output plots of the first coordinate as a function of time, instead of a map of the track.

\newpage
\section{Package features}
In this section, we describe the global structure of the package, and then describe in more detail the main functions required to fit an HMM to movement data.

The package is articulated in terms of two S3 classes: \texttt{moveData} and \texttt{moveHMM}. The first extends the native R data frame, essentially gathering time series of the movement metrics of interest, namely the step lengths and turning angles, as well as the covariate values. A \texttt{moveHMM} object is a fitted model, which stores in particular the values of the MLE of the parameters.

To create a \texttt{moveData} object, the function \texttt{prepData} is called on the tracking data (track points coordinates). Then, the function \texttt{fitHMM} is called on the \texttt{moveData}, and returns a \texttt{moveHMM}.

Both classes can be used through their methods (e.g. \texttt{plot.moveData}, \texttt{AIC.moveHMM}), and a variety of other functions can be called on \texttt{moveHMM} objects. All functions are described in more detail in Section \ref{sec_main_functions} and their use is explained on an example in Section \ref{sec_application}. Figure \ref{struct} illustrates the links between the main components of the package.

\begin{figure}[ht]
  \centering
  \includegraphics[width=0.5\textwidth]{struct}
  \caption{Structure of the main components of the package. The blue boxes are S3 classes, and the green boxes are functions. The arrows indicate input and output of data.}
  \label{struct}
\end{figure}

Note we will occasionally omit the class to which a method belongs, if the context makes it clear. Besides, as illustrated in the example in Section \ref{sec_application}, it is not necessary to specify the class when calling the R function (e.g. calling \texttt{plot} on a \texttt{moveHMM} object automatically refers to \texttt{plot.moveHMM}).

\subsection{Model options}

\subsubsection{Distributions} \label{sec_distrib}
Here is the list of distributions included, with the names they have in the package.
\begin{itemize}
\item Step length: gamma (``gamma''), Weibull (``weibull''), exponential (``exp'') and log-normal (``lnorm'').
\item Turning angle: von Mises (``vm'') and wrapped-Cauchy (``wrpcauchy''). It is also possible to specify \texttt{angleDist="none"}, if the angles are not modelled.
\end{itemize}

The parameters depend on the distribution used. The gamma distribution expects the mean and standard deviation, and all other distributions expect the same parameters as the corresponding R density function, i.e.
\begin{center}
  \begin{tabular}{lcll}
    \toprule
    Distribution & & Parameters & \\
    \midrule
    gamma & & mean $\in (0,+\infty)$ & standard deviation $\in (0,+\infty)$\\
    Weibull & & shape $\in (0,+\infty) $ & scale $\in (0,+\infty)$ \\
    log-normal & & location $\in \mathbb{R}$ & scale $\in (0,+\infty)$\\
    exponential & & rate $\in (0,+\infty)$ & \\
    \midrule
    von Mises & & mean $\in (-\pi,\pi]$ & concentration $\in (0,+\infty)$ \\
    wrapped Cauchy & & mean $\in (-\pi,\pi]$ & concentration $\in (0,1)$\\
    \bottomrule
  \end{tabular}
\end{center}

For the gamma distribution, the link between the mean/standard deviation (expected by \texttt{fitHMM}) and shape/rate (expected by \texttt{dgamma}) is given by:
\begin{equation*}
\text{shape} = \dfrac{\text{mean}^2}{\text{SD}^2},
\qquad
\text{rate} = \dfrac{\text{mean}}{\text{SD}^2}
\end{equation*}


\subsubsection{Zero-inflation} \label{sec:zeroinflation}

If some steps are exactly equal to zero, then strictly positive distributions such as the gamma are inadequate. In such cases, zero-inflated distributions can be considered. A zero-inflated step length distribution simply assumes that there is a probability $z$ of observing a 0 and a probability of $1-z$ of observing a positive value distributed according to a standard positive distribution (e.g.\ a gamma). Within the package \texttt{moveHMM}, zero-inflation will automatically be included if there are zero steps. In that case the (state-dependent) values $z$ will be estimated, with the remaining positive distribution, weighted by $1-z$, specified as one of the available standard step length distributions, listed in Section \ref{sec_distrib}.

\subsubsection{Covariates}

In practice it is often of interest to model the state transition probabilities as functions of time-varying covariates. This can be done by assuming the Markov chain to be time-varying, with transition probability matrix $\boldsymbol{\Gamma}^{(t)}=\left( \gamma_{ij}^{(t)} \right)$, linking the transition probabilities to the covariate(s) via the multinomial logit link. In the general case of $N$ states,
\begin{equation*}\label{covtpm} \gamma_{ij}^{(t)}=\Pr \bigl(S_{t}=j\vert S_{t-1}=i \bigr)=\frac{\exp (\eta_{ij}) }{\sum_{k=1}^N\exp (\eta_{ik})},
\end{equation*}
where
$$ \eta_{ij} =
\begin{cases}
  \beta_{0}^{(ij)} + \sum_{l=1}^p \beta_{l}^{(ij)} w_{lt} & \text{ if } i\neq j, \\
  0 & \text{otherwise,}
\end{cases}
$$
for $i,j=1,\ldots,N$. Here $\{ S_t \}$ is the state process, $w_{lt}$ is the $l$-th covariate at time $t$ and $p$ is the number of covariates considered. The $\beta$ parameters directly affect the off-diagonal elements in $\boldsymbol{\Gamma}^{(t)}$ --- with an increase in the linear predictor $\eta_{ij}$ resulting in an increase in $\gamma_{ij}^{(t)}$ --- and hence also the diagonal entries due to the row constraints (with the entries in each row summing to one). Note in particular that we have to fix $\eta_{ii}=0$ for all $i$ since otherwise the model would be overparameterized (not identifiable).

Within \texttt{moveHMM}, the $\beta$ coefficients for the off-diagonal transition probabilities are stored in an $(p+1) \times (N\cdot(N-1))$ matrix. For example, for a 3-state HMM with two covariates, the matrix \texttt{beta} is
\begin{equation*}
  \begin{pmatrix}
    \beta_0^{(12)} & \beta_0^{(13)} & \beta_0^{(21)} & \beta_0^{(23)} & \beta_0^{(31)} & \beta_0^{(32)}\\
    \beta_1^{(12)} & \beta_1^{(13)} & \beta_1^{(21)} & \beta_1^{(23)} & \beta_1^{(31)} & \beta_1^{(32)}\\
    \beta_2^{(12)} & \beta_2^{(13)} & \beta_2^{(21)} & \beta_2^{(23)} & \beta_2^{(31)} & \beta_2^{(32)}\\
  \end{pmatrix}
\end{equation*}
Here the first row corresponds to the intercept terms and the other two rows to the slope coefficients associated with the two covariates. There are as many columns as there are off-diagonal entries in the $3 \times 3$ transition probability matrix, and that matrix is filled row-wise (i.e.\ column 1 in \texttt{beta} is linked to $\gamma_{12}^{(t)}$, column 2 is linked to $\gamma_{13}^{(t)}$, column 3 is linked to $\gamma_{21}^{(t)}$, etc.).\\

In practice, many movement models involve only two states, in which case the above equations boil down to
\begin{align*}
  \boldsymbol{\Gamma}^{(t)}  & =
                               \begin{pmatrix}
                                 \dfrac{1}{1+\exp\left(\beta_{0}^{(12)} + \sum_{l=1}^p \beta_{l}^{(12)} w_{lt}\right)} &
                                 \dfrac{\exp\left(\beta_{0}^{(12)} + \sum_{l=1}^p \beta_{l}^{(12)} w_{lt}\right)}{1+\exp\left(\beta_{0}^{(12)} + \sum_{l=1}^p \beta_{l}^{(12)} w_{lt}\right)} \\[20pt]
                                 \dfrac{\exp\left(\beta_{0}^{(21)} + \sum_{l=1}^p \beta_{l}^{(21)} w_{lt}\right)}{1+\exp\left(\beta_{0}^{(21)} + \sum_{l=1}^p \beta_{l}^{(21)} w_{lt}\right)} &
                                 \dfrac{1}{1+\exp\left(\beta_{0}^{(21)} + \sum_{l=1}^p \beta_{l}^{(21)} w_{lt}\right)}
                               \end{pmatrix} \\[20pt]
  = &
      \begin{pmatrix}
	1-\text{logit}^{-1} \left( \beta_{0}^{(12)} + \sum_{l=1}^p \beta_{l}^{(12)} w_{lt} \right) &
        \text{logit}^{-1} \left( \beta_{0}^{(12)} + \sum_{l=1}^p \beta_{l}^{(12)} w_{lt} \right) \\[10pt]
	\text{logit}^{-1} \left( \beta_{0}^{(21)} + \sum_{l=1}^p \beta_{l}^{(21)} w_{lt} \right)  &
        1- \text{logit}^{-1} \left( \beta_{0}^{(21)} + \sum_{l=1}^p \beta_{l}^{(21)} w_{lt} \right)
      \end{pmatrix}
\end{align*}
The inverse logit link function is applied in order to map the real-valued predictor onto the interval $[0,1]$ (with the above multinomial logit link representing a generalization of this approach to the case of $N>2$ states). In the case of two states, the matrix $\beta$ in \texttt{moveHMM} is structured as follows:
\begin{equation*}
  \begin{pmatrix}
    \beta_0^{(12)} & \beta_0^{(21)} \\
    \beta_1^{(12)} & \beta_1^{(21)} \\
    \vdots         &   \vdots       \\
    \beta_p^{(12)} & \beta_p^{(21)}
  \end{pmatrix}.
\end{equation*}


\subsubsection{Stationarity}
\label{sec:stationarity}

The function \texttt{fitHMM} includes the option of fitting a stationary model (using the option \texttt{stationary=TRUE}, with the default being \texttt{stationary=FALSE}). This is only possible if no covariates are incorporated into the model. (Otherwise the transition probabilities will be time-dependent, such that the Markov chain is non-homogeneous and in particular cannot be stationary.) When no covariates are considered and the option \texttt{stationary=TRUE} is selected, then the initial state distribution of the Markov chain will automatically be chosen as the stationary  distribution (a.k.a.\ steady-state distribution) implied by the estimated transition probability matrix (as opposed to being estimated when \texttt{stationary=FALSE}). This stationary distribution is the vector $\boldsymbol{\delta}$ that solves the equation $\boldsymbol{\delta}=\boldsymbol{\delta}\boldsymbol{\Gamma}$ subject to $\sum_{i=1}^N \delta_i=1$. In practice, this solution almost always exists.

\subsection{Main functions} \label{sec_main_functions}

\subsubsection{prepData}
Tracking data usually consist of time series of either easting-northing coordinates or longitude-latitude values. However, with the HMM approach the derived quantities step lengths and turning angles are modelled.

The function \texttt{prepData} computes the steps and angles from the coordinates. As input, this function takes an R data frame with columns ``x'' (either easting or longitude) and ``y'' (either northing or latitude). If the names of the coordinates columns are not ``x'' and ``y'', then the argument \texttt{coordNames} should specify them. If several animals were observed, there should also be a column ``ID'' which identifies the animal being observed. If there is no ``ID'' column, all observations will be considered to be associated with a single animal. All additional columns are considered as covariates. In addition to the data frame, \texttt{prepData} takes an argument \texttt{type}, which can either be ``LL'' (longitude-latitude, the default) or ``UTM''. The former indicates that the coordinates are longitude-latitude values, and the latter that they are easting-northing values.

To compute the step lengths, \texttt{prepData} calls the function \texttt{spDistN1} from the package \texttt{sp}. The step lengths are in the unit of the input if easting/northing are provided, and in kilometres if longitude/latitude are provided.

\texttt{prepData} outputs a data frame, with the same columns as the input, plus columns ``step'' and ``angle''. This object is of the class \texttt{moveData}, and can be plotted using the generic function \texttt{plot}.

\subsubsection{fitHMM}
Using the function \texttt{fitHMM}, an HMM can be fitted to an object of class \texttt{moveData}, via numerical maximum likelihood. The list of the arguments of \texttt{fitHMM} is detailed in the documentation. The maximum likelihood estimation is carried out using the R function \texttt{nlm}.

This function outputs a list of information about the model. Most elements of that list are only meant to be used by the \texttt{moveHMM} functions (see Sections \ref{sec_methods} and \ref{sec_functions}), but a few can be informative \textit{per se}:
\begin{itemize}
\item \texttt{mle} contains the estimates of the parameters of the model;
\item \texttt{mod} contains the output of the optimization function \texttt{nlm}, including \texttt{mod\$minimum} (minimum of the negative log-likelihood) and \texttt{mod\$hessian}, the Hessian of the negative log-likelihood function at its minimum.
\end{itemize}

\subsubsection{Generic methods} \label{sec_methods}
Methods (i.e.\ class functions) are available for both \texttt{moveData} and \texttt{moveHMM} objects, to operate on them. For details on the options see the documentation, and for an example of their use, see Section \ref{sec_application}.

\begin{itemize}
\item \texttt{plot.moveData} plots a few graphs to illustrate the data: a map of each animal's track, time series of the steps and angles, histograms of the steps and angles.
\item \texttt{summary.moveData} outputs some summary information about a \texttt{moveData} object: the number of animals, the number of observations for each animal, and quantiles of the covariate values.
\item \texttt{plot.moveHMM} plots a few graphs to illustrate the fitted model: a map of each animal's track, colored by states, plots of the estimated density functions, plots of the transition probabilities as functions of the covariates.
\item \texttt{print.moveHMM} prints the value of the maximum log-likelihood, and the maximum likelihood estimates of the parameters of the model.
\item \texttt{AIC.moveHMM} returns the AIC of one or several fitted models.
\end{itemize}

\subsubsection{Other operations in \texttt{moveHMM}} \label{sec_functions}
Other functions can be called on a \texttt{moveHMM} object, for further analysis.
\begin{itemize}
\item \texttt{CI} computes confidence intervals for the step length distribution parameters, for the turning angle distribution parameters, and for the regression coefficients of the transition probabilities.
\item \texttt{pseudoRes} computes the pseudo-residuals of the model. These can be used to assess the goodness of fit. If the model is the true data-generating process, then the pseudo-residuals follow a standard normal distribution \citep{zucchini2016}.
\item \texttt{stateProbs} computes the probabilities of the underlying Markov chain being in the different states, at each observation, under the fitted model.
\item \texttt{viterbi} computes the sequence of most probable states, under the fitted model, using the Viterbi algorithm \citep{zucchini2016}.
\item \texttt{plotStates} plots the most probable state sequence (as decoded with \texttt{viterbi}), and the state probabilities (as computed with \texttt{stateProbs}).
\item \texttt{plotPR} plots time series, qq-plots, and the sample autocorrelation (ACF) functions of the pseudo-residuals of the fitted model \citep{zucchini2016}. The qq-plots can be used to visually assess whether or not the pseudo-residuals are standard normally distributed. The points in the qq-plot will be close to the straight line if the model fits the data well. If the sample ACFs display a residual autocorrelation, then this is an indication that the model might not have captured all relevant correlation structure in the data.
\item \texttt{plotStationary} creates plots of the stationary state probabilities as functions of covariates, possibly with confidence intervals.
\end{itemize}

\subsubsection{simData}
The function \texttt{simData} simulates movement data from an HMM, given its parameters. The returned object is of the class \texttt{moveData}, and can be visualized using \texttt{plot}, or fitted using \texttt{fitHMM}. The arguments of \texttt{simData} are detailed in the documentation.

It is possible to call \texttt{simData} on a fitted model directly, to simulate data from it. This can be used to assess the fit, by checking that the simulated data display the same features as the real data.

\bibliographystyle{apalike}
\bibliography{refs}

\end{document}