---
title: "Introduction to fdars"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to fdars}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  out.width = "100%"
)
```

## What is Functional Data Analysis?

Functional Data Analysis (FDA) is a branch of statistics that deals with data
where each observation is a function, curve, or surface rather than a single
number or vector. Examples include:

- Temperature curves recorded over a day
- Growth curves of children over time
- Spectrometric measurements across wavelengths
- Stock prices throughout trading hours

In FDA, we treat each curve as a single observation and develop methods to
analyze collections of such curves.

## The fdars Package

**fdars** (Functional Data Analysis in Rust) provides a comprehensive toolkit
for FDA with a high-performance Rust backend. Key features include:

- **Fast computation**: 10-200x speedups over pure R implementations
- **Comprehensive methods**: Depth functions, regression, clustering, outlier detection
- **Flexible metrics**: Multiple distance measures including DTW
- **2D support**: Analysis of surfaces in addition to curves

## Installation

```{r eval=FALSE}
# Install from GitHub (requires the remotes package:
# install.packages("remotes"))
remotes::install_github("sipemu/fdars")
```

## Getting Started

```{r setup}
library(fdars)
library(ggplot2)
theme_set(theme_minimal())
```

### Creating Functional Data

The core data structure is the `fdata` class. Create functional data from a
matrix where rows are observations (curves) and columns are evaluation points:

```{r create-fdata}
# Generate example data: 20 curves evaluated at 100 points
set.seed(42)
n <- 20
m <- 100
t_grid <- seq(0, 1, length.out = m)

# Create curves: sine waves with random phase and noise
X <- matrix(0, n, m)
for (i in seq_len(n)) {
  phase <- runif(1, 0, pi)
  X[i, ] <- sin(2 * pi * t_grid + phase) + rnorm(m, sd = 0.1)
}

# Create fdata object
fd <- fdata(X, argvals = t_grid)
fd
```

### Adding Identifiers and Metadata

You can attach identifiers and metadata (covariates) to functional data:

```{r metadata}
# Create metadata with covariates
meta <- data.frame(
  group = factor(rep(c("control", "treatment"), each = 10)),
  age = sample(20:60, n, replace = TRUE),
  response = rnorm(n)
)

# Create fdata with IDs and metadata
fd_meta <- fdata(X, argvals = t_grid,
                 id = paste0("patient_", 1:n),
                 metadata = meta)
fd_meta

# Access metadata
fd_meta$id[1:5]
head(fd_meta$metadata)
```

Metadata is preserved when subsetting:

```{r metadata-subset}
fd_sub <- fd_meta[1:5, ]
fd_sub$id
fd_sub$metadata
```

### Visualizing Functional Data

```{r plot-fdata}
plot(fd)
```

### Basic Operations

```{r basic-ops}
# Compute mean function
mean_curve <- mean(fd)

# Center the data
fd_centered <- fdata.cen(fd)

# Compute functional variance
variance <- var(fd)
```
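
To make these summaries concrete, the sketch below computes their pointwise analogues on a plain matrix with base R. It assumes that `mean()`, `fdata.cen()`, and `var()` work pointwise across curves at each evaluation point; that is an assumption about fdars rather than something shown here, and the names `t_demo` and `X_demo` are purely illustrative.

```{r pointwise-sketch}
# Illustrative data: 4 deterministic curves on 5 evaluation points
t_demo <- seq(0, 1, length.out = 5)
X_demo <- outer(1:4, t_demo)  # curve i is the straight line i * t

# Pointwise mean: one value per evaluation point
mean_demo <- colMeans(X_demo)

# Centering: subtract the pointwise mean from every curve
X_centered <- sweep(X_demo, 2, mean_demo)

# Pointwise sample variance across curves
var_demo <- apply(X_demo, 2, stats::var)

mean_demo
colMeans(X_centered)  # numerically zero at every evaluation point
```

After centering, every column averages to zero, which is the matrix-level counterpart of subtracting the mean function from each curve.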

### Subsetting

Select specific curves or evaluation points:

```{r subset}
# First 5 curves
fd_subset <- fd[1:5, ]

# Specific range of t values
fd_range <- fd[, t_grid >= 0.25 & t_grid <= 0.75]
```

## Key Functionality Overview

### Depth Functions

Depth measures how "central" a curve is within a sample. Higher depth indicates
a more typical curve:

```{r depth}
# Fraiman-Muniz depth
depths <- depth(fd, method = "FM")
head(depths)

# Find the median curve (deepest)
median_curve <- median(fd, method = "FM")
```
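
One common formulation of the Fraiman-Muniz depth averages the univariate depth `1 - |1/2 - F(x(t))|` over the evaluation points, where `F` is the empirical distribution function of the sample at `t`. The base R sketch below implements that formulation on a plain matrix. The helper `fm_depth_sketch()` is hypothetical, and the values returned by `depth(fd, method = "FM")` may differ in scaling, tie handling, or integration weights.

```{r fm-depth-sketch}
# Hypothetical helper, not part of fdars: Fraiman-Muniz-style depth
# for a matrix with one curve per row.
fm_depth_sketch <- function(X) {
  # Empirical CDF value of each curve at each evaluation point
  Fn <- apply(X, 2, function(col) rank(col, ties.method = "max") / nrow(X))
  # Univariate depth at each point, averaged over an equally spaced grid
  rowMeans(1 - abs(0.5 - Fn))
}

# Five vertically shifted copies of the same curve
t_demo <- seq(0, 1, length.out = 50)
X_demo <- t(sapply(-2:2, function(shift) sin(2 * pi * t_demo) + shift))

d_demo <- fm_depth_sketch(X_demo)
d_demo
```

The outermost curves receive the lowest values. Because the empirical distribution function counts observations at or below a point, this formulation is not perfectly symmetric between the top and bottom of the sample.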

### Distance Metrics

Compute distances between curves using various metrics:
```{r distances}
# L2 (Euclidean) distance
dist_l2 <- metric.lp(fd)

# Dynamic Time Warping
dist_dtw <- metric.DTW(fd)
```
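
As a rough guide to what these two metrics measure, the sketch below implements textbook versions for a single pair of curves in base R. The helpers `l2_dist_sketch()` and `dtw_dist_sketch()` are hypothetical; the quadrature rule, local cost, and absence of a warping window are assumptions, so results need not match `metric.lp()` or `metric.DTW()` exactly.

```{r distance-sketch}
# Hypothetical helpers, not part of fdars.

# L2 distance: square root of the integrated squared difference,
# with the integral approximated by the trapezoidal rule.
l2_dist_sketch <- function(f, g, grid) {
  sq <- (f - g)^2
  sqrt(sum(diff(grid) * (head(sq, -1) + tail(sq, -1)) / 2))
}

# Classic DTW: cheapest monotone alignment of two sequences, using the
# absolute difference as local cost and no window constraint.
dtw_dist_sketch <- function(a, b) {
  D <- matrix(Inf, length(a) + 1, length(b) + 1)
  D[1, 1] <- 0
  for (i in seq_along(a)) {
    for (j in seq_along(b)) {
      cost <- abs(a[i] - b[j])
      D[i + 1, j + 1] <- cost + min(D[i, j + 1], D[i + 1, j], D[i, j])
    }
  }
  D[length(a) + 1, length(b) + 1]
}

# Two constant curves separated by a gap of 1 on [0, 1]
t_demo <- seq(0, 1, length.out = 101)
l2_demo <- l2_dist_sketch(rep(1, 101), rep(0, 101), t_demo)
l2_demo

# The same shape traversed at a different speed
dtw_demo <- dtw_dist_sketch(c(0, 1, 2), c(0, 1, 1, 2))
dtw_demo
```

The second call illustrates why DTW is useful for curves that differ mainly in timing: repeating a value stretches the sequence without adding alignment cost.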

### Regression

Predict a scalar response from functional predictors:

```{r regression}
# Generate response
y <- rowMeans(X) + rnorm(n, sd = 0.1)

# Principal component regression
fit_pc <- fregre.pc(fd, y, ncomp = 3)
print(fit_pc)
```
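
Principal component regression is usually described as two steps: project the curves onto a few leading principal components, then regress the response on the resulting scores. The sketch below shows those steps with base R's `prcomp()` and `lm()` on a plain matrix. It illustrates the general technique under the assumption of an equally spaced grid and is not a description of how `fregre.pc()` is implemented; all `*_demo` names are illustrative.

```{r pcr-sketch}
# Deterministic illustrative data: 10 curves built from three basis shapes
t_demo <- seq(0, 1, length.out = 50)
i_demo <- 1:10
X_demo <- outer(seq(0.5, 1.5, length.out = 10), sin(2 * pi * t_demo)) +
  outer(cos(i_demo), cos(2 * pi * t_demo)) +
  outer(i_demo^2 / 100, t_demo)
y_demo <- rowMeans(X_demo)  # a response that is linear in the curves

# Step 1: principal components of the centered curves
pca_demo <- prcomp(X_demo, center = TRUE, scale. = FALSE)
scores_demo <- pca_demo$x[, 1:3]

# Step 2: ordinary least squares of the response on the scores
fit_demo <- lm(y_demo ~ scores_demo)
coef(fit_demo)
```

Because these curves span only three directions and the response is a linear functional of them, three components reproduce it up to rounding error. With noisy data the number of components becomes a tuning choice.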

### Clustering

Group curves into clusters:

```{r clustering}
# K-means clustering
km <- cluster.kmeans(fd, ncl = 2, seed = 123)
plot(km)
```
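
When curves share an evaluation grid, each discretized curve can be treated as a point in m-dimensional space, so ordinary k-means applies directly. The sketch below uses base R's `kmeans()` on a plain matrix to show the idea. It assumes Euclidean distance on the raw grid values and may differ from `cluster.kmeans()` in metric and initialization.

```{r kmeans-sketch}
# Two deterministic groups of curves: three near sin, three near -sin
t_demo <- seq(0, 1, length.out = 50)
offsets_demo <- c(-0.1, 0, 0.1)
X_demo <- rbind(
  t(sapply(offsets_demo, function(o) sin(2 * pi * t_demo) + o)),
  t(sapply(offsets_demo, function(o) -sin(2 * pi * t_demo) + o))
)

# Explicit starting centers (curves 1 and 6) keep the result reproducible
# without drawing random starting points.
km_demo <- kmeans(X_demo, centers = X_demo[c(1, 6), ])
km_demo$cluster
```

Each row of `km_demo$centers` is itself a curve on the same grid and can be plotted as the prototype of its cluster.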

### Outlier Detection

Identify atypical curves:

```{r outliers}
# Add an outlier
X_out <- rbind(X, X[1, ] + 3)
fd_out <- fdata(X_out, argvals = t_grid)

# Detect outliers
out <- outliers.depth.pond(fd_out)
plot(out)
```
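
The idea behind depth-based outlier detection is that atypical curves sit at the edge of the sample and therefore receive low depth. The sketch below illustrates only that idea in base R, using a Fraiman-Muniz-style depth and a fixed cutoff. Both the helper `depth_sketch()` and the cutoff value are hypothetical simplifications and do not reproduce the procedure used by `outliers.depth.pond()`.

```{r depth-outlier-sketch}
# Hypothetical helper: Fraiman-Muniz-style depth for a matrix of curves
depth_sketch <- function(X) {
  Fn <- apply(X, 2, function(col) rank(col, ties.method = "max") / nrow(X))
  rowMeans(1 - abs(0.5 - Fn))
}

# Nine similar curves plus one shifted far above the rest
t_demo <- seq(0, 1, length.out = 50)
X_demo <- t(sapply(seq(-0.4, 0.4, by = 0.1),
                   function(shift) sin(2 * pi * t_demo) + shift))
X_demo <- rbind(X_demo, sin(2 * pi * t_demo) + 3)

# Flag curves whose depth does not exceed an illustrative cutoff
d_demo <- depth_sketch(X_demo)
cutoff_demo <- 0.55
flagged_demo <- which(d_demo <= cutoff_demo)
flagged_demo
```

In practice a fixed cutoff is rarely appropriate. It is typically chosen from the data, for example by resampling, so that ordinary curves at the edge of the sample are not flagged.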

## Next Steps

Explore the other vignettes for detailed coverage of specific topics:

- **Covariance Functions**: Generate Gaussian process samples with various kernels
- **Depth Functions**: Comprehensive guide to functional depth measures
- **Distance Metrics**: Distance and semimetric functions
- **Regression**: Functional regression methods
- **Clustering**: Functional k-means and optimal k selection
- **Outlier Detection**: Methods for identifying atypical curves

## Performance

The Rust backend speeds up computationally intensive operations. For example,
computing depth for 1000 curves evaluated at 200 points (the timing shown is
illustrative and will vary by machine):

```{r performance, eval=FALSE}
# Generate large dataset
X_large <- matrix(rnorm(1000 * 200), 1000, 200)
fd_large <- fdata(X_large)

# Depth computation is fast even for large datasets
system.time(depth(fd_large, method = "FM"))
#>    user  system elapsed
#>   0.045   0.000   0.045
```

## References

- Ramsay, J.O. and Silverman, B.W. (2005). *Functional Data Analysis*. Springer.
- Ferraty, F. and Vieu, P. (2006). *Nonparametric Functional Data Analysis*. Springer.
- Febrero-Bande, M. and Oviedo de la Fuente, M. (2012). Statistical Computing in Functional Data Analysis: The R Package fda.usc. *Journal of Statistical Software*, 51(4), 1-28.