---
title: "Anomaly Detection"
output: 
  rmarkdown::html_vignette:
    toc: true 
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{Anomaly Detection}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r setup, include=FALSE}
library(httptest2)
.mockPaths("../tests/mocks")
start_vignette(dir = "../tests/mocks")

original_options <- options("NIXTLA_API_KEY"="dummy_api_key", digits=7)

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>", 
  fig.width = 7, 
  fig.height = 4
)
```

```{r}
library(nixtlar)
```

## 1. Anomaly detection
Anomaly detection plays a crucial role in time series analysis and forecasting. Anomalies, also known as outliers, are unusual observations that don't follow the expected time series patterns. They can be caused by a variety of factors, including errors in the data collection process, unexpected events, or sudden changes in the patterns of the time series. Anomalies can provide critical information about a system, like a potential problem or malfunction. After identifying them, it is important to understand what caused them, and then decide whether to remove, replace, or keep them.

`TimeGPT` has a method for detecting anomalies, and users can call it from `nixtlar`. This vignette will explain how to do this. It assumes you have already set up your API key. If you haven't done this, please read the [Get Started](https://nixtla.github.io/nixtlar/articles/get-started.html) vignette first. 

## 2. Load data 
For this vignette, we'll use the electricity dataset included in `nixtlar`, which contains hourly prices from five different electricity markets. 

```{r}
df <- nixtlar::electricity
head(df)
```

## 3. Detect anomalies

To detect anomalies, use `nixtlar::nixtla_client_detect_anomalies`, which requires the following parameter:

- **df**: The time series data, provided as a data frame, tibble, or tsibble. It must include at least two columns: one for the timestamps and one for the observations. The default names for these columns are `ds` and `y`. If your column names are different, specify them with `time_col` and `target_col`, respectively. If you are working with multiple series, you must also include a column with unique identifiers. The default name for this column is `unique_id`; if different, specify it with `id_col`.
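
The column conventions above can be sketched with a small example. The data frame `raw` and its column names (`market`, `timestamp`, `price`) are hypothetical and exist only to show the mapping; the data is deliberately tiny and is not meant to be sent to the API.

```{r}
# Hypothetical data frame whose columns do not use the default names
raw <- data.frame(
  market = rep(c("A", "B"), each = 2),
  timestamp = rep(c("2024-01-01 00:00:00", "2024-01-01 01:00:00"), times = 2),
  price = c(10.2, 11.5, 20.1, 19.7)
)

# Option 1: rename the columns to the defaults (unique_id, ds, y)
renamed <- setNames(raw, c("unique_id", "ds", "y"))
names(renamed)

# Option 2 (not run here): keep the original names and pass them explicitly
# nixtlar::nixtla_client_detect_anomalies(
#   raw, id_col = "market", time_col = "timestamp", target_col = "price"
# )
```

Either approach should give the function the same information; renaming once up front can be convenient if the same data frame is passed to several `nixtlar` functions.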

```{r}
nixtla_client_anomalies <- nixtlar::nixtla_client_detect_anomalies(df) 
head(nixtla_client_anomalies)
```

The `anomaly_detection` method from `TimeGPT` evaluates each observation and uses a prediction interval to decide whether it is an anomaly. By default, `nixtlar::nixtla_client_detect_anomalies` uses a 99% prediction interval. Observations that fall outside this interval are considered anomalies and have a value of `True` in the `anomaly` column (`False` otherwise). To change the prediction interval, for example to 95%, use the argument `level=c(95)`. Only a single level is supported: if you pass several values, `nixtlar::nixtla_client_detect_anomalies` uses the largest one. 
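
To make the decision rule concrete, here is a minimal sketch in base R. It does not reproduce how `TimeGPT` computes its intervals; the values in `toy` and the column names `lo` and `hi` are made up for illustration, and only the final comparison mirrors the rule described above.

```{r}
# Toy values for illustration only; these are not TimeGPT outputs
toy <- data.frame(
  ds = as.POSIXct("2024-01-01 00:00:00", tz = "UTC") + 3600 * 0:4,
  y  = c(50, 52, 95, 49, 12),
  lo = c(40, 41, 42, 41, 40),  # hypothetical lower bound of the interval
  hi = c(60, 61, 62, 61, 60)   # hypothetical upper bound of the interval
)

# An observation is flagged when it falls outside [lo, hi]
toy$anomaly <- toy$y < toy$lo | toy$y > toy$hi
toy
```

Because a 95% interval is narrower than a 99% interval, lowering `level` can only flag the same observations or more, never fewer.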

## 4. Plot anomalies 
`nixtlar` includes a function, `nixtlar::nixtla_client_plot`, that plots the historical data together with the output of `nixtlar::nixtla_client_forecast`, `nixtlar::nixtla_client_historic`, `nixtlar::nixtla_client_detect_anomalies`, or `nixtlar::nixtla_client_cross_validation`. If you have long series, you can use `max_insample_length` to plot only the last N historical values (the forecast will always be plotted in full). 

When using `nixtlar::nixtla_client_plot` with the output of `nixtlar::nixtla_client_detect_anomalies`, set `plot_anomalies=TRUE` to plot the anomalies. 

```{r}
nixtlar::nixtla_client_plot(df, nixtla_client_anomalies, plot_anomalies = TRUE)
```

```{r, include=FALSE}
options(original_options)
end_vignette()
```