--- title: "Anomaly Detection" output: rmarkdown::html_vignette: toc: true toc_depth: 2 vignette: > %\VignetteIndexEntry{Anomaly Detection} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} library(httptest2) .mockPaths("../tests/mocks") start_vignette(dir = "../tests/mocks") original_options <- options("NIXTLA_API_KEY"="dummy_api_key", digits=7) knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 4 ) ``` ```{r} library(nixtlar) ``` ## 1. Anomaly detection Anomaly detection plays a crucial role in time series analysis and forecasting. Anomalies, also known as outliers, are unusual observations that don't follow the expected time series patterns. They can be caused by a variety of factors, including errors in the data collection process, unexpected events, or sudden changes in the patterns of the time series. Anomalies can provide critical information about a system, like a potential problem or malfunction. After identifying them, it is important to understand what caused them, and then decide whether to remove, replace, or keep them. `TimeGPT` has a method for detecting anomalies, and users can call it from `nixtlar`. This vignette will explain how to do this. It assumes you have already set up your API key. If you haven't done this, please read the [Get Started](https://nixtla.github.io/nixtlar/articles/get-started.html) vignette first. ## 2. Load data For this vignette, we'll use the electricity consumption dataset that is included in `nixtlar`, which contains the hourly prices of five different electricity markets. ```{r} df <- nixtlar::electricity head(df) ``` ## 3. Detect Anomalies To detect anomalies, use `nixtlar::nixtla_client_detect_anomalies`, which requires the following parameter: - **df**: The time series data, provided as a data frame, tibble, or tsibble. It must include at least two columns: one for the timestamps and one for the observations. The default names for these columns are `ds` and `y`. If your column names are different, specify them with `time_col` and `target_col`, respectively. If you are working with multiple series, you must also include a column with unique identifiers. The default name for this column is `unique_id`; if different, specify it with `id_col`. ```{r} nixtla_client_anomalies <- nixtlar::nixtla_client_detect_anomalies(df) head(nixtla_client_anomalies) ``` The `anomaly_detection` method from `TimeGPT` evaluates each observation and uses a prediction interval to determine if it is an anomaly or not. By default, `nixtlar::nixtla_client_detect_anomalies` uses a 99% prediction interval. Observations that fall outside this interval will be considered anomalies and will have a value of `True` in the `anomaly` column (`False` otherwise). To change the prediction interval, for example to 95%, use the argument `level=c(95)`. Keep in mind that multiple levels are not allowed, so when given several values, `nixtlar::nixtla_client_detect_anomalies` will use the maximum. ## 4. Plot anomalies `nixtlar` includes a function to plot the historical data and any output from `nixtlar::nixtla_client_forecast`, `nixtlar::nixtla_client_historic`, `nixtlar::nixtla_client_detect_anomalies` and `nixtlar::nixtla_client_cross_validation`. If you have long series, you can use `max_insample_length` to only plot the last N historical values (the forecast will always be plotted in full). When using `nixtlar::nixtla_client_plot` with the output of `nixtlar::nixtla_client_detect_anomalies`, set `plot_anomalies=TRUE` to plot the anomalies. ```{r} nixtlar::nixtla_client_plot(df, nixtla_client_anomalies, plot_anomalies = TRUE) ``` ```{r, include=FALSE} options(original_options) end_vignette() ```