The SignalY package provides a comprehensive framework for signal extraction from panel data, integrating multiple complementary methodologies:
The package distinguishes between latent structure (the underlying data-generating process) and phenomenological dynamics (observed variability), providing automated technical interpretation of results.
# Install from CRAN (when available)
install.packages("SignalY")
# Install development version from GitHub
# remotes::install_github("username/SignalY")
# For Bayesian methods (Horseshoe, HP-GC), install cmdstanr:
install.packages("cmdstanr", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))
cmdstanr::install_cmdstan()library(SignalY)
# Generate example data
set.seed(42)
n <- 100
p <- 20
# Create correlated predictors with factor structure
factors <- matrix(rnorm(n * 2), n, 2)
loadings <- matrix(runif(p * 2, -1, 1), p, 2)
X <- factors %*% t(loadings) + matrix(rnorm(n * p, 0, 0.5), n, p)
colnames(X) <- paste0("X", 1:p)
# True signal depends on 5 predictors
true_beta <- c(rep(1, 5), rep(0, 15))
Y <- X %*% true_beta + rnorm(n, 0, 0.5)
# Combine into data frame
data <- data.frame(Y = Y, X)
# Run comprehensive analysis
result <- signal_analysis(
data = data,
y_formula = "Y",
methods = "all",
verbose = TRUE
)
# View results
print(result)
summary(result)
plot(result)SignalY operationalizes the distinction between latent structure and phenomenological dynamics:
This distinction maps onto statistical concepts:
| Concept | Statistical Implementation |
|---|---|
| Signal | Trend component, common factors |
| Noise | Idiosyncratic shocks, measurement error |
| Sparsity | Few variables carry information |
| Persistence | Unit roots, long memory |
| Question | Recommended Method |
|---|---|
| What are the dominant frequencies? | Wavelets, EMD |
| Which predictors matter? | Horseshoe regression |
| Is there common factor structure? | PCA, DFM |
| Is the series stationary? | Unit root battery |
| What’s the trend vs cycle? | HP-GC filter |
# Apply wavelet decomposition
Y <- sin(seq(0, 4*pi, length.out = 128)) + rnorm(128, 0, 0.3)
wav_result <- filter_wavelet(
x = Y,
filter = "la8", # Daubechies least asymmetric, 8 vanishing moments
levels = c(3, 4), # Combine D3 + D4 (business cycle frequencies)
first_diff = FALSE
)
# Examine results
names(wav_result)
# [1] "combined" "smooth" "details" "level_variance" "filter" "levels"
# Combined signal captures 8-32 period cycles (for annual data)
plot(wav_result$combined, type = "l", main = "Wavelet-Filtered Signal")The wavelet filter choice matters:
"la8": Least asymmetric, 8 vanishing moments
(recommended for economic data)"d4": Daubechies 4, more compact support"haar": Simplest, for discontinuous signalsLevel selection corresponds to periodicities:
| Level | Period Range (annual data) |
|---|---|
| D1 | 2-4 years |
| D2 | 4-8 years |
| D3 | 8-16 years |
| D4 | 16-32 years |
| D5 | 32-64 years |
EMD is data-adaptive, requiring no basis function specification:
emd_result <- filter_emd(
x = Y,
max_imf = 10,
boundary = "wave"
)
# IMFs are ordered from highest to lowest frequency
matplot(emd_result$imf[, 1:3], type = "l",
main = "First 3 Intrinsic Mode Functions")
# Residue captures the trend
plot(emd_result$residue, type = "l", main = "EMD Residue (Trend)")The Grant-Chan embedded HP filter provides:
# Requires cmdstanr
hpgc_result <- filter_hpgc(
x = Y,
lambda = NULL, # Auto-select via DIC
n_chains = 4,
n_iter = 2000
)
# Compare to standard HP filter
plot(Y, type = "l", col = "gray")
lines(hpgc_result$trend, col = "blue", lwd = 2)
legend("topright", c("Original", "Bayesian Trend"),
col = c("gray", "blue"), lwd = c(1, 2))The Horseshoe prior provides:
# Fit Horseshoe regression
hs_fit <- fit_horseshoe(
y = Y,
X = X,
p0 = 5, # Expected non-zeros (can be NULL for auto)
n_chains = 4,
n_iter = 2000,
n_warmup = 1000,
adapt_delta = 0.95, # Target acceptance (increase if divergences)
use_qr = TRUE # QR decomposition for multicollinearity
)
# Examine shrinkage
print(hs_fit)
# Key outputs:
# - beta: Coefficient estimates
# - kappa: Shrinkage factors (0 = no shrinkage, 1 = full shrinkage)
# - m_eff: Effective number of non-zeros
# Select variables by shrinkage threshold
selection <- select_by_shrinkage(hs_fit, threshold = 0.5)
which(selection$selected) # Indices of selected predictorsThe shrinkage factor \(\kappa_j\) quantifies how much coefficient \(j\) is pulled toward zero:
\[\kappa_j = \frac{1}{1 + \tau^2 \lambda_j^2 / c^2}\]
| \(\kappa\) | Interpretation |
|---|---|
| 0.0-0.1 | Strong signal, minimal shrinkage |
| 0.1-0.3 | Moderate signal |
| 0.3-0.5 | Weak signal |
| 0.5-0.7 | Likely noise |
| 0.7-1.0 | Strong shrinkage, essentially zero |
pca_result <- pca_bootstrap(
X = X,
n_components = NULL, # Auto-select
rotation = "varimax", # Or "none", "oblimin"
n_boot = 1000,
block_length = NULL, # Auto: sqrt(T)
significance_level = 0.05
)
# Key outputs
pca_result$variance_explained # Proportion by component
pca_result$loadings # Variable loadings
pca_result$entropy # Entropy of loadings (concentration measure)
pca_result$loadings_significant # Bootstrap significance
# Interpretation
# Low entropy = concentrated loadings (few variables dominate)
# High entropy = diffuse loadings (many variables contribute)DFMs capture dynamic common factor structure:
dfm_result <- estimate_dfm(
X = X,
n_factors = NULL, # Auto-select via Bai-Ng IC
max_factors = 10,
var_lags = 1, # VAR lags for factor dynamics
ic_criterion = "bai_ng_2"
)
# Key outputs
dfm_result$n_factors # Optimal number
dfm_result$factors # Estimated factors (T x r)
dfm_result$loadings # Factor loadings (p x r)
dfm_result$variance_explained
dfm_result$var_coefficients # VAR transition matrixur_result <- test_unit_root(
x = Y,
tests = c("adf", "ers", "kpss", "pp")
)
# Synthesized conclusion
ur_result$synthesis
# $conclusion: "stationary", "trend_stationary", "difference_stationary", or "inconclusive"
# $confidence: "high", "medium", "low"
# $evidence: Detailed reasoning
# Individual test results
ur_result$tests$adf_none
ur_result$tests$ers_dfgls| Conclusion | Meaning | Implications |
|---|---|---|
| Stationary | Mean-reverting, transient shocks | Use levels in regression |
| Trend-stationary | Deterministic trend + stationary | Detrend before analysis |
| Difference-stationary | Stochastic trend (unit root) | First-difference or cointegration |
| Inconclusive | Mixed evidence | Use theory, consider both |
signal_analysis() orchestrates all methods:
result <- signal_analysis(
data = data,
y_formula = "Y", # Or formula: Y ~ X1 + X2 + X3
time_var = NULL, # Time variable name
group_var = NULL, # Panel group variable
methods = "all", # Or c("wavelet", "horseshoe", "pca")
# Method-specific configuration
filter_config = list(
wavelet_filter = "la8",
wavelet_levels = c(3, 4),
hpgc_n_chains = 4
),
horseshoe_config = list(
p0 = NULL, # Auto-calibrate
n_chains = 4,
adapt_delta = 0.95,
kappa_threshold = 0.5
),
pca_config = list(
rotation = "none",
n_boot = 1000
),
dfm_config = list(
ic_criterion = "bai_ng_2"
),
# General options
na_action = "interpolate", # Or "omit", "fail"
standardize = TRUE,
first_difference = FALSE,
verbose = TRUE,
seed = 42
)
# Access components
result$filters$wavelet
result$horseshoe
result$pca
result$dfm
result$unitroot
result$interpretationThe interpretation component synthesizes results:
result$interpretation$signal_characteristics
# $smoothness: Variance of second differences
# $smoothness_interpretation: "Very smooth", "Moderately volatile", etc.
# $trend_share: Proportion of variance from trend
result$interpretation$variable_selection
# $sparsity_ratio: Proportion shrunk to zero
# $n_selected: Number of selected predictors
# $top_predictors: Data frame of top variables
result$interpretation$factor_structure
# $pc1_entropy: Shannon entropy of PC1 loadings
# $topology_interpretation: "Concentrated", "Diffuse", etc.
result$interpretation$persistence
# $conclusion: Stationarity type
# $interpretation: Plain-language description
result$interpretation$overall_summary
# Combined narrative synthesisNot every analysis needs every method:
Statistical significance ≠ theoretical validity:
| Method | Complexity | Memory | Time (n=1000, p=100) |
|---|---|---|---|
| Wavelets | O(n log n) | Low | < 1 sec |
| EMD | O(n × IMFs) | Low | 1-5 sec |
| HP-GC | O(n × iter × chains) | Medium | 30-60 sec |
| Horseshoe | O(n × p × iter × chains) | High | 1-5 min |
| PCA | O(n × p²) | Medium | < 1 sec |
| DFM | O(n × p × factors) | Medium | 1-10 sec |
Stan-based methods (Horseshoe, HP-GC) automatically parallelize across chains.
Piironen, J., & Vehtari, A. (2017). Sparsity information and regularization in the horseshoe and other shrinkage priors. Electronic Journal of Statistics, 11(2), 5018-5051.
Bai, J., & Ng, S. (2002). Determining the Number of Factors in Approximate Factor Models. Econometrica, 70(1), 191-221.
Grant, A. L., & Chan, J. C. C. (2017). A Bayesian Model Comparison for Trend-Cycle Decompositions of Output. Journal of Money, Credit and Banking, 49(2-3), 525-552.
Percival, D. B., & Walden, A. T. (2000). Wavelet Methods for Time Series Analysis. Cambridge University Press.
Huang, N. E., et al. (1998). The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proceedings of the Royal Society A, 454, 903-995.