Title: | Known Sub-Sequence Algorithm |
Version: | 0.0.1 |
Maintainer: | Iván Felipe Benavides <pipeben@gmail.com> |
Description: | Implements the Known Sub-Sequence Algorithm <doi:10.1016/j.aaf.2021.12.013>, which helps to automatically identify and validate the best method for missing data imputation in a time series. Supports the comparison of multiple state-of-the-art algorithms. |
License: | AGPL (≥ 3) |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.0 |
URL: | https://github.com/pipeben/kssa |
BugReports: | https://github.com/pipeben/kssa/issues |
Depends: | R (≥ 4.0) |
Suggests: | covr, testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
Imports: | magrittr, ggplot2, rlang, methods, forecast, imputeTS, stats, zoo, Metrics, dplyr, missMethods |
Date: | 2022-06-18 |
NeedsCompilation: | no |
Packaged: | 2022-06-18 20:12:46 UTC; Steve |
Author: | Iván Felipe Benavides |
Repository: | CRAN |
Date/Publication: | 2022-06-21 19:40:02 UTC |
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Arguments
lhs | A value or the magrittr placeholder. |
rhs | A function call using the magrittr semantics. |
Value
The result of calling rhs(lhs).
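As a small illustration of the Value statement above, the following sketch compares a piped call with its nested equivalent. The vector x is a made-up example; only magrittr's %>% and base R functions are used.

```r
library(magrittr)

# Hypothetical input vector, used only for illustration
x <- c(1, 4, 9)

# Piped form: x is passed as the first argument of sqrt()
piped <- x %>% sqrt()

# Equivalent nested form
nested <- sqrt(x)

# Pipes can be chained; each step receives the previous result
chained <- x %>% sum() %>% sqrt()

identical(piped, nested)
```

Chaining reads left to right, so x %>% sum() %>% sqrt() is the same computation as sqrt(sum(x)).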
get_imputations function
Description
Function to get imputations from methods compared by kssa
Usage
get_imputations(x_ts, methods = "all", seed = 1234)
Arguments
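x_ts | A ts object with missing data to be imputed. |
methods | A string or string vector indicating the imputation method or methods to apply. The default, "all", applies every available method; the examples below also use "seadec", "linear_i", and "locf". For further details on these imputation methods, please check the packages that implement them. |
seed | Numeric. Random seed; any number. |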
Value
A list of imputed time series with the selected methods
Examples
# Example 1: Get imputed values for airgap_na_ts with all methods
library("imputeTS")
library("kssa")
# Create 20% random missing data in tsAirgapComplete time series from imputeTS
airgap_na <- missMethods::delete_MCAR(as.data.frame(tsAirgapComplete), 0.2)
# Convert to time series object
airgap_na_ts <- ts(airgap_na, start = c(1959, 1), end = c(1997, 12), frequency = 12)
my_imputations <- get_imputations(airgap_na_ts, methods = "all")
# my_imputations contains the imputed time series with all methods.
# Access it and choose the one from the best method for your purposes
my_imputations$seadec
plot.ts(my_imputations$seadec)
# Example 2: Get imputed values for airgap_na_ts using only a subset of algorithms
library("imputeTS")
library("kssa")
# Create 20% random missing data in tsAirgapComplete time series from imputeTS
airgap_na <- missMethods::delete_MCAR(as.data.frame(tsAirgapComplete), 0.2)
# Convert to time series object
airgap_na_ts <- ts(airgap_na, start = c(1959, 1), end = c(1997, 12), frequency = 12)
my_imputations <- get_imputations(airgap_na_ts, methods = c("linear_i", "locf"))
# my_imputations contains the imputed time series with all applied
# methods (locf and linear interpolation).
# Access it and choose the one from the best method for your purposes
my_imputations$locf
plot.ts(my_imputations$locf)
kssa Algorithm
Description
Run the Known Sub-Sequence Algorithm to compare the performance of imputation methods on a time series of interest
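The general idea of validating imputation methods against known values can be illustrated with a short base R sketch. It is a simplified illustration under stated assumptions, not the package's implementation: the series truth is synthetic, the two imputation routines are hand-written stand-ins that only borrow the labels linear_i and locf, and the segments and iterations used by kssa are left out.

```r
set.seed(1234)

# Hypothetical complete series: trend plus a 12-step seasonal signal
n <- 48
truth <- 10 + 0.1 * seq_len(n) + 2 * sin(2 * pi * seq_len(n) / 12)

# Hide about 20% of the interior points, keeping both ends known
hidden <- sort(sample(2:(n - 1), size = round(0.2 * n)))
masked <- truth
masked[hidden] <- NA

# Stand-in 1: linear interpolation between the nearest known points
linear <- approx(seq_len(n), masked, xout = seq_len(n))$y

# Stand-in 2: last observation carried forward
locf <- masked
for (i in hidden) locf[i] <- locf[i - 1]

# Score each method only on the values that were hidden
rmse <- function(actual, imputed) sqrt(mean((actual - imputed)^2))
scores <- c(
  linear_i = rmse(truth[hidden], linear[hidden]),
  locf     = rmse(truth[hidden], locf[hidden])
)

# Lower error first
sort(scores)
```

Because the hidden values are known, each method can be scored directly. This is what makes a comparison possible on a series whose real gaps cannot be checked.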
Usage
kssa(
x_ts,
start_methods,
actual_methods,
segments = 5,
iterations = 10,
percentmd = 0.2,
seed = 1234
)
Arguments
x_ts | Time series object. |
start_methods | String vector. The method or methods used to start the algorithm. Accepts the same values as actual_methods. |
actual_methods | String or string vector. The imputation methods to be compared and validated. Use "all" to compare every available method; the examples below also use "linear_i" and "locf". For further details on these imputation methods, please check the packages that implement them. |
segments | Integer. Number of segments into which the time series is divided. |
iterations | Integer. Number of iterations to run. |
percentmd | Numeric. Proportion of missing data, e.g. 0.2 for 20%. Must match the true proportion of missing data in x_ts. |
seed | Numeric. Random seed. |
Value
A list of results to be plotted with function kssa_plot for easy interpretation.
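Since percentmd has to agree with the amount of missing data actually present in x_ts, it can help to measure that amount before calling kssa. A minimal check in base R, using a made-up series (the name toy_ts and its values are assumptions for illustration only):

```r
# Hypothetical series with two missing values out of ten
toy_ts <- ts(c(5, NA, 7, 8, NA, 10, 11, 12, 13, 14))

# Proportion of missing observations, on the same 0-1 scale as percentmd
prop_missing <- mean(is.na(toy_ts))

# Number of missing observations, for reference
n_missing <- sum(is.na(toy_ts))

prop_missing
```

The resulting value can then be passed as percentmd, rounded as appropriate for the series at hand.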
References
Benavides, I. F., Santacruz, M., Romero-Leiton, J. P., Barreto, C., & Selvaraj, J. J. (2022). Assessing methods for multiple imputation of systematic missing data in marine fisheries time series with a new validation algorithm. Aquaculture and Fisheries. doi:10.1016/j.aaf.2021.12.013
Examples
# Example 1: Compare all imputation methods
library("kssa")
library("imputeTS")
# Create 20% random missing data in tsAirgapComplete time series from imputeTS
airgap_na <- missMethods::delete_MCAR(as.data.frame(tsAirgapComplete), 0.2)
# Convert to time series object
airgap_na_ts <- ts(airgap_na, start = c(1959, 1), end = c(1997, 12), frequency = 12)
# Apply the kssa algorithm with 5 segments, 10 iterations, 20% of missing data,
# compare among all available methods in the package.
# Remember that percentmd must match
# the real percentage of missing data in the input time series
results_kssa <- kssa(airgap_na_ts,
start_methods = "all",
actual_methods = "all",
segments = 5,
iterations = 10,
percentmd = 0.2
)
# Print and check results
results_kssa
# For an easy interpretation of kssa results
# please use function kssa_plot
# Example 2: Compare only locf and linear imputation
library("kssa")
library("imputeTS")
# Create 20% random missing data in tsAirgapComplete time series from imputeTS
airgap_na <- missMethods::delete_MCAR(as.data.frame(tsAirgapComplete), 0.2)
# Convert to time series object
airgap_na_ts <- ts(airgap_na, start = c(1959, 1), end = c(1997, 12), frequency = 12)
# Apply the kssa algorithm with 5 segments, 10 iterations, 20% of missing data,
# compare among all applied methods (locf and linear interpolation).
# Remember that percentmd must match
# the real percentage of missing data in the input time series
results_kssa <- kssa(airgap_na_ts,
start_methods = c("locf", "linear_i"),
actual_methods = c("locf", "linear_i"),
segments = 5,
iterations = 10,
percentmd = 0.2
)
# Print and check results
results_kssa
# For an easy interpretation of kssa results
# please use function kssa_plot
kssa_plot function
Description
Function to plot the results of kssa for easy interpretation
Usage
kssa_plot(results, type, metric)
Arguments
results | An object with results produced by function kssa. |
type | A character value with the type of plot to show. It can be "summary" or "complete". |
metric | A character value with the performance metric to be plotted. It can be "rmse", "mase", "cor", or "smape". For further details on these metrics, please check package Metrics. |
Value
A plot of kssa results in which imputation methods are ordered from lower to higher (left to right) error.
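To make the metric names more concrete, the sketch below computes two of them by hand in base R. The vectors actual and imputed are made-up values, and the formulas are the standard definitions of RMSE and Pearson correlation written out directly rather than calls into the Metrics package.

```r
# Hypothetical known values and their imputed counterparts
actual  <- c(10, 12, 15, 11, 14)
imputed <- c(11, 12, 13, 11, 16)

# Root mean squared error: smaller values mean closer imputations
rmse_value <- sqrt(mean((actual - imputed)^2))

# Pearson correlation: larger values mean closer agreement
cor_value <- cor(actual, imputed)

c(rmse = rmse_value, cor = cor_value)
```

Note that the two measures point in opposite directions: a good method has a low RMSE but a high correlation.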
Examples
# Example 1: Plot the results from comparing all imputation methods
library("kssa")
library("imputeTS")
# Create 20% random missing data in tsAirgapComplete time series from imputeTS
airgap_na <- missMethods::delete_MCAR(as.data.frame(tsAirgapComplete), 0.2)
# Convert to time series object
airgap_na_ts <- ts(airgap_na, start = c(1959, 1), end = c(1997, 12), frequency = 12)
# Apply the kssa algorithm with 5 segments,
# 10 iterations, 20% of missing data, and
# compare among all available methods in the package.
# Remember that percentmd must match
# the real percentage of missing data in the input time series
results_kssa <- kssa(airgap_na_ts,
start_methods = "all",
actual_methods = "all",
segments = 5,
iterations = 10,
percentmd = 0.2
)
kssa_plot(results_kssa, type = "complete", metric = "rmse")
# Conclusion: Since kssa_plot is ordered from lower to
# higher error (left to right), method 'linear_i' is the best to
# impute missing data in airgap_na_ts. Notice that method 'locf' is the worst.
# To obtain imputations with the best method, or any method of preference,
# please use function get_imputations
# Example 2: Plot the results when only applying locf and linear interpolation
library("kssa")
library("imputeTS")
# Create 20% random missing data in tsAirgapComplete time series from imputeTS
airgap_na <- missMethods::delete_MCAR(as.data.frame(tsAirgapComplete), 0.2)
# Convert to time series object
airgap_na_ts <- ts(airgap_na, start = c(1959, 1), end = c(1997, 12), frequency = 12)
# Apply the kssa algorithm with 5 segments,
# 10 iterations, 20% of missing data, and compare among all
# applied methods (locf and linear interpolation).
# Remember that percentmd must match
# the real percentage of missing data in the input time series
results_kssa <- kssa(airgap_na_ts,
start_methods = c("linear_i", "locf"),
actual_methods = c("linear_i", "locf"),
segments = 5,
iterations = 10,
percentmd = 0.2
)
kssa_plot(results_kssa, type = "complete", metric = "rmse")