---
title: "Introduction to fdars"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to fdars}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  out.width = "100%"
)
```

## What is Functional Data Analysis?

Functional Data Analysis (FDA) is a branch of statistics that deals with data where each observation is a function, curve, or surface rather than a single number or vector. Examples include:

- Temperature curves recorded over a day
- Growth curves of children over time
- Spectrometric measurements across wavelengths
- Stock prices throughout trading hours

In FDA, we treat each curve as a single observation and develop methods to analyze collections of such curves.

## The fdars Package

**fdars** (Functional Data Analysis in Rust) provides a comprehensive toolkit for FDA with a high-performance Rust backend. Key features include:

- **Fast computation**: 10-200x speedups over pure R implementations
- **Comprehensive methods**: Depth functions, regression, clustering, outlier detection
- **Flexible metrics**: Multiple distance measures including DTW
- **2D support**: Analysis of surfaces in addition to curves

## Installation

```{r eval=FALSE}
# Install from GitHub
remotes::install_github("sipemu/fdars")
```

## Getting Started

```{r setup}
library(fdars)
library(ggplot2)
theme_set(theme_minimal())
```

### Creating Functional Data

The core data structure is the `fdata` class.
Create functional data from a matrix where rows are observations (curves) and columns are evaluation points:

```{r create-fdata}
# Generate example data: 20 curves evaluated at 100 points
set.seed(42)
n <- 20
m <- 100
t_grid <- seq(0, 1, length.out = m)

# Create curves: sine waves with random phase and noise
X <- matrix(0, n, m)
for (i in 1:n) {
  phase <- runif(1, 0, pi)
  X[i, ] <- sin(2 * pi * t_grid + phase) + rnorm(m, sd = 0.1)
}

# Create fdata object
fd <- fdata(X, argvals = t_grid)
fd
```

### Adding Identifiers and Metadata

You can attach identifiers and metadata (covariates) to functional data:

```{r metadata}
# Create metadata with covariates
meta <- data.frame(
  group = factor(rep(c("control", "treatment"), each = 10)),
  age = sample(20:60, n, replace = TRUE),
  response = rnorm(n)
)

# Create fdata with IDs and metadata
fd_meta <- fdata(X, argvals = t_grid,
                 id = paste0("patient_", 1:n),
                 metadata = meta)
fd_meta

# Access metadata
fd_meta$id[1:5]
head(fd_meta$metadata)
```

Metadata is preserved when subsetting:

```{r metadata-subset}
fd_sub <- fd_meta[1:5, ]
fd_sub$id
fd_sub$metadata
```

### Visualizing Functional Data

```{r plot-fdata}
plot(fd)
```

### Basic Operations

```{r basic-ops}
# Compute mean function
mean_curve <- mean(fd)

# Center the data
fd_centered <- fdata.cen(fd)

# Compute functional variance
variance <- var(fd)
```

### Subsetting

Select specific curves or evaluation points:

```{r subset}
# First 5 curves
fd_subset <- fd[1:5, ]

# Specific range of t values
fd_range <- fd[, t_grid >= 0.25 & t_grid <= 0.75]
```

## Key Functionality Overview

### Depth Functions

Depth measures how "central" a curve is within a sample.
Higher depth indicates a more typical curve:

```{r depth}
# Fraiman-Muniz depth
depths <- depth(fd, method = "FM")
head(depths)

# Find the median curve (deepest)
median_curve <- median(fd, method = "FM")
```

### Distance Metrics

Compute distances between curves using various metrics:

```{r distances}
# L2 (Euclidean) distance
dist_l2 <- metric.lp(fd)

# Dynamic Time Warping
dist_dtw <- metric.DTW(fd)
```

### Regression

Predict a scalar response from functional predictors:

```{r regression}
# Generate response
y <- rowMeans(X) + rnorm(n, sd = 0.1)

# Principal component regression
fit_pc <- fregre.pc(fd, y, ncomp = 3)
print(fit_pc)
```

### Clustering

Group curves into clusters:

```{r clustering}
# K-means clustering
km <- cluster.kmeans(fd, ncl = 2, seed = 123)
plot(km)
```

### Outlier Detection

Identify atypical curves:

```{r outliers}
# Add an outlier
X_out <- rbind(X, X[1, ] + 3)
fd_out <- fdata(X_out, argvals = t_grid)

# Detect outliers
out <- outliers.depth.pond(fd_out)
plot(out)
```

## Next Steps

Explore the other vignettes for detailed coverage of specific topics:

- **Covariance Functions**: Generate Gaussian process samples with various kernels
- **Depth Functions**: Comprehensive guide to functional depth measures
- **Distance Metrics**: Distance and semimetric functions
- **Regression**: Functional regression methods
- **Clustering**: Functional k-means and optimal k selection
- **Outlier Detection**: Methods for identifying atypical curves

## Performance

The Rust backend provides significant speedups for computationally intensive operations. For example, computing depth for 1000 curves:

```{r performance, eval=FALSE}
# Generate large dataset
X_large <- matrix(rnorm(1000 * 200), 1000, 200)
fd_large <- fdata(X_large)

# Depth computation is fast even for large datasets
system.time(depth(fd_large, method = "FM"))
#>    user  system elapsed
#>   0.045   0.000   0.045
```

## References

- Ramsay, J.O. and Silverman, B.W. (2005). *Functional Data Analysis*. Springer.
- Ferraty, F. and Vieu, P. (2006). *Nonparametric Functional Data Analysis*. Springer.
- Febrero-Bande, M. and Oviedo de la Fuente, M. (2012). Statistical Computing in Functional Data Analysis: The R Package fda.usc. *Journal of Statistical Software*, 51(4), 1-28.