--- title: "climatestatsr: A Comprehensive Guide to Statistical Tools for Climate Change Analysis" author: "Sadikul Islam" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: true toc_depth: 4 number_sections: true fig_width: 7 fig_height: 4.5 vignette: > %\VignetteIndexEntry{climatestatsr: A Comprehensive Guide} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.align = "center", warning = FALSE, message = FALSE ) set.seed(2024) library(climatestatsr) ``` --- # Introduction The **climatestatsr** package provides a self-contained, dependency-light collection of statistical functions for climate change research. All methods operate on standard R vectors, matrices, and data frames and require only base-R `stats`, `graphics`, `grDevices`, and `utils`. ## Package structure | Family | Functions | |---|---| | Temporal analysis | `mk_test`, `sens_slope`, `change_point_detection`, `seasonal_decompose_climate`, `rolling_trend`, `temporal_homogeneity`, `trend_significance`, `autocorrelation_climate` | | Spatial analysis | `morans_i`, `hot_cold_spots`, `spatial_interpolate`, `spatial_trend_field`, `cluster_climate_zones`, `spatial_anomaly`, `elevation_lapse_rate` | | Extreme events | `fit_gev`, `rgev_sim`, `return_period`, `peaks_over_threshold`, `heat_wave_detection`, `cold_spell_detection`, `drought_spell`, `extreme_value_index` | | Climate indices | `spi`, `spei`, `pdsi_simple`, `heat_index`, `wind_chill`, `frost_days`, `growing_degree_days`, `diurnal_temp_range` | | Detection & attribution | `detection_attribution`, `fingerprint_analysis`, `optimal_fingerprint` | | Utilities | `fill_gaps_climate`, `homogenize_series`, `aggregate_climate`, `anomaly_baseline`, `standardize_climate`, `climate_summary` | ## Installation ```r install.packages("climatestatsr") ``` --- # Temporal Analysis Temporal analysis functions test for and quantify monotonic trends, abrupt shifts, and seasonal structure in climate time series. ## Mann-Kendall Trend Test — `mk_test()` The **Mann-Kendall test** (Mann 1945; Kendall 1975) is the standard non-parametric method for detecting monotonic trends in hydro-climatic records. It is rank-based and therefore robust to non-normality and outliers. The optional pre-whitening step (Yue and Wang 2002) removes AR(1) autocorrelation before computing the test statistic. ```{r mk_basic} # Simulate 50 years of warming at 0.025 °C/year years <- 1971:2020 temp <- 14.0 + 0.025 * seq_len(50) + stats::rnorm(50, 0, 0.4) result <- mk_test(temp) print(result) ``` ```{r mk_plot, fig.cap="Mann-Kendall result: time series with Sen slope (left) and value distribution (right)."} plot(result) ``` **Interpreting the output:** - `Z` — standardised statistic; |Z| > 1.96 indicates significance at α = 0.05 - `tau` — Kendall's rank correlation; ranges from −1 (monotone decrease) to +1 (monotone increase) - `p.value` — two-tailed probability under H₀ of no trend - `trend` — plain-language conclusion ```{r mk_prewhiten} # Apply AR(1) pre-whitening for autocorrelated series ar_temp <- as.numeric(stats::arima.sim(list(ar = 0.65), n = 60)) + seq(0, 3, length.out = 60) + 14 mk_pw <- mk_test(ar_temp, prewhiten = TRUE) cat("Pre-whitened MK: Z =", round(mk_pw$statistic, 3), " p =", round(mk_pw$p.value, 4), "\n") ``` ## Sen's Slope Estimator — `sens_slope()` **Sen's slope** (Sen 1968) is the median of all pairwise slopes between observations. It provides a robust estimate of trend magnitude unaffected by outliers. ```{r sens} ss <- sens_slope(years, temp) cat(sprintf( "Sen's slope : %+.4f °C/year\n", ss$slope)) cat(sprintf( "Rate/decade : %+.3f °C\n", ss$slope_decade)) cat(sprintf( "95%% CI : [%.4f, %.4f]\n", ss$slope_ci["lower"], ss$slope_ci["upper"])) ``` ```{r sens_plot, fig.cap="Observed temperature with Sen slope (red dashed) and 95% CI band."} plot(years, temp, pch = 16, cex = 0.7, col = "steelblue", xlab = "Year", ylab = "Temperature (°C)", main = "Annual Mean Temperature with Sen's Slope") abline(a = ss$intercept, b = ss$slope, col = "firebrick", lwd = 2, lty = 2) legend("topleft", legend = c("Observed", "Sen slope"), col = c("steelblue", "firebrick"), pch = c(16, NA), lty = c(NA, 2), lwd = c(NA, 2), bty = "n") ``` ## Change-Point Detection — `change_point_detection()` Two methods are available: - **Pettitt's test** (Pettitt 1979) — rank-based, detects a single mean shift. - **CUSUM** — cumulative-sum method with bootstrap p-value. ```{r cp} # Simulate a step change at observation 30 x <- c(stats::rnorm(30, mean = 14.0, sd = 0.5), stats::rnorm(30, mean = 15.5, sd = 0.5)) cp <- change_point_detection(x, method = "pettitt") cat(sprintf("Change point at index : %d\n", cp$change_point)) cat(sprintf("Mean before shift : %.2f °C\n", cp$mean_before)) cat(sprintf("Mean after shift : %.2f °C\n", cp$mean_after)) cat(sprintf("Significant (α=0.05) : %s\n", cp$significant)) ``` ```{r cp_plot, fig.cap="CUSUM series — the peak identifies the most probable break point."} cp_cusum <- change_point_detection(x, method = "cusum") plot(cp_cusum$cusum, type = "l", col = "steelblue", lwd = 2, xlab = "Index", ylab = "CUSUM", main = "CUSUM Change-Point Detection") abline(v = cp_cusum$change_point, col = "firebrick", lty = 2, lwd = 2) abline(h = 0, lty = 3, col = "gray50") legend("topleft", legend = c("CUSUM", sprintf("Break at %d", cp_cusum$change_point)), col = c("steelblue", "firebrick"), lty = c(1, 2), lwd = 2, bty = "n") ``` ## Seasonal Decomposition — `seasonal_decompose_climate()` Decomposes monthly or quarterly series into **trend**, **seasonal**, and **remainder** components using STL (Loess-based) or classical additive decomposition. ```{r decomp_data} n <- 360 # 30 years of monthly data t_idx <- seq_len(n) temp_m <- 15 + 0.003 * t_idx + 8 * sin(2 * pi * t_idx / 12) + stats::rnorm(n, 0, 0.5) ``` ```{r decomp_plot, fig.height=6, fig.cap="STL decomposition: original, trend, seasonal component, and remainder."} dc <- seasonal_decompose_climate(temp_m, frequency = 12, method = "stl") plot(dc) ``` The extracted trend can itself be tested for significance: ```{r decomp_trend} trend_mk <- mk_test(dc$trend[!is.na(dc$trend)]) cat("Trend component MK test: tau =", round(trend_mk$tau, 3), " p =", round(trend_mk$p.value, 4), "\n") ``` ## Rolling Trend Analysis — `rolling_trend()` Applies Sen's slope over a moving window to reveal **periods of accelerating or decelerating warming**. ```{r rolling, fig.cap="20-year rolling Sen slope: warming accelerated after index ~40."} temp2 <- 13.5 + c(0.01 * seq_len(50), 0.04 * seq_len(41)) + stats::rnorm(91, 0, 0.4) rt <- rolling_trend(temp2, window = 20, step = 2) plot(rt$mid_index, rt$slope_decade, type = "l", col = "steelblue", lwd = 2, xlab = "Mid-window index", ylab = "Trend (°C per decade)", main = "Rolling 20-Year Sen Slope") abline(h = 0, lty = 2, col = "gray50") polygon(c(rt$mid_index, rev(rt$mid_index)), c(rt$slope_decade * 1.15, rev(rt$slope_decade * 0.85)), col = adjustcolor("steelblue", 0.15), border = NA) ``` ## SNHT Homogeneity Test — `temporal_homogeneity()` The **Standard Normal Homogeneity Test** (Alexandersson 1986) detects inhomogeneities caused by station moves, instrument changes, or observer changes. ```{r snht} # Inhomogeneous series: +0.8 °C offset after observation 40 x_inh <- c(stats::rnorm(40, 0, 1), stats::rnorm(40, 0.8, 1)) snht <- temporal_homogeneity(x_inh) cat(sprintf("T0 statistic : %.2f (critical ≈ %.2f)\n", snht$T0, snht$critical)) cat(sprintf("Break at index: %d\n", snht$break_index)) cat(sprintf("Significant : %s\n", snht$significant)) ``` ```{r snht_plot, fig.cap="SNHT statistic series — peak marks the inhomogeneity break."} plot(snht$T_series, type = "l", col = "steelblue", lwd = 1.5, xlab = "Index", ylab = "T statistic", main = "SNHT — Homogeneity Test") abline(v = snht$break_index, col = "firebrick", lty = 2, lwd = 2) abline(h = snht$critical, col = "orange", lty = 3, lwd = 1.5) legend("topright", legend = c("T statistic", "Break point", "Critical value"), col = c("steelblue", "firebrick", "orange"), lty = c(1, 2, 3), lwd = 2, bty = "n") ``` ## Multiple-Station Trend Significance — `trend_significance()` Runs Mann-Kendall on every column of a matrix and applies **FDR** or **Bonferroni** correction for simultaneous testing. ```{r trendsig} mat <- matrix(stats::rnorm(50 * 30), nrow = 50) # impose trend in 10 stations mat[, 1:10] <- mat[, 1:10] + outer(seq_len(50), rep(0.05, 10)) ts_res <- trend_significance(mat, correction = "fdr") cat("Stations with significant trend:\n") print(table(ts_res$trend)) ``` ## Autocorrelation Analysis — `autocorrelation_climate()` Reports ACF, PACF and the Ljung-Box test for residual autocorrelation — important before applying parametric models. ```{r acf, fig.cap="ACF and PACF of an AR(1) climate series (ρ ≈ 0.7)."} ar_series <- as.numeric(stats::arima.sim(list(ar = 0.7), n = 100)) + 15 ac <- autocorrelation_climate(ar_series, max.lag = 20, plot = TRUE) cat("AR(1) sample coefficient:", round(ac$ar1, 3), "\n") cat("Ljung-Box p-value :", round(ac$ljung_box$p.value, 4), "\n") ``` --- # Spatial Analysis ## Moran's I — `morans_i()` **Global Moran's I** (Moran 1950) measures whether similar values cluster spatially (I > 0) or disperse (I < 0). ```{r moran} set.seed(42) n <- 50 coords <- data.frame(lon = stats::runif(n, -10, 10), lat = stats::runif(n, 40, 60)) # Temperature decreases with latitude → positive autocorrelation x_sp <- 26 - 0.35 * coords$lat + stats::rnorm(n, 0, 0.8) mi <- morans_i(x_sp, coords, n_perm = 499) cat(sprintf("Moran's I = %.4f\n", mi$I)) cat(sprintf("Z-score = %.3f (p = %.4f)\n", mi$Z, mi$p.value)) cat(mi$interpretation, "\n") ``` ```{r moran_plot, fig.cap="Scatter plot of spatial values coloured by latitude gradient."} col_ramp <- colorRampPalette(c("steelblue", "white", "firebrick"))(50) val_rank <- rank(x_sp) plot(coords$lon, coords$lat, pch = 21, cex = 1.5, bg = col_ramp[val_rank], xlab = "Longitude", ylab = "Latitude", main = sprintf("Spatial Field (Moran's I = %.3f)", mi$I)) ``` ## Hot-Spot and Cold-Spot Detection — `hot_cold_spots()` The **Getis-Ord Gi*** statistic (Getis and Ord 1992) identifies locations where high (hot spot) or low (cold spot) values cluster locally. ```{r hotspot} set.seed(5) vals <- ifelse(coords$lon > 4, stats::rnorm(n, 30, 2), stats::rnorm(n, 18, 2)) hs <- hot_cold_spots(vals, coords, dist_threshold = 5) print(table(hs$classification)) ``` ```{r hotspot_plot, fig.cap="Getis-Ord Gi* classification: hot spots (east), cold spots (west)."} cols <- c("hot spot" = "firebrick", "cold spot" = "steelblue", "not significant"= "gray80") plot(hs$lon, hs$lat, pch = 21, cex = 1.6, bg = cols[hs$classification], xlab = "Longitude", ylab = "Latitude", main = "Getis-Ord Gi* Hot/Cold Spots") legend("topright", legend = names(cols), pt.bg = cols, pch = 21, pt.cex = 1.4, bty = "n") ``` ## Spatial Interpolation — `spatial_interpolate()` Interpolates station values onto a regular grid using **IDW** or **loess spline**. ```{r interp, fig.cap="IDW interpolation from 25 stations to a 20×20 grid."} set.seed(7) obs <- matrix(stats::runif(50), ncol = 2, dimnames = list(NULL, c("lon","lat"))) vals_obs <- sin(obs[,"lon"] * 3) + cos(obs[,"lat"] * 3) grd <- as.matrix(expand.grid( lon = seq(0.05, 0.95, length.out = 20), lat = seq(0.05, 0.95, length.out = 20))) pred <- spatial_interpolate(obs, vals_obs, grd, method = "idw") image(matrix(pred, 20, 20), col = colorRampPalette(c("steelblue","white","firebrick"))(64), main = "IDW Interpolated Climate Field", xlab = "Longitude", ylab = "Latitude") points((obs[,"lon"] - 0.05) / 0.9, (obs[,"lat"] - 0.05) / 0.9, pch = 3, cex = 0.8) ``` ## Spatial Trend Field — `spatial_trend_field()` Computes **per-cell Mann-Kendall + Sen slope** across a space-time matrix — essential for mapping where warming is occurring. ```{r stf} set.seed(9) mat_st <- matrix(stats::rnorm(50 * 200), nrow = 50) mat_st[, 101:200] <- mat_st[, 101:200] + outer(seq_len(50), rep(0.04, 100)) stf <- spatial_trend_field(mat_st) cat(sprintf("Cells with significant trend: %d / %d (%.0f%%)\n", sum(stf$p.value < 0.05, na.rm = TRUE), nrow(stf), 100 * mean(stf$p.value < 0.05, na.rm = TRUE))) ``` ```{r stf_plot, fig.cap="Fraction of significant trend cells by group."} stf$group <- ifelse(seq_len(nrow(stf)) <= 100, "No trend", "Trend imposed") sig_frac <- tapply(stf$p.value < 0.05, stf$group, mean, na.rm = TRUE) barplot(sig_frac * 100, col = c("gray70", "firebrick"), ylab = "% Significant (α=0.05)", main = "Spatial Trend Field Results", border = NA) ``` ## Climate Zone Classification — `cluster_climate_zones()` K-means clustering on multi-variable climate normals delineates coherent climate regions. ```{r cluster} set.seed(1) clim_mat <- matrix( c(stats::rnorm(200, rep(c(5, 15, 25, 10, 20), each = 40), 2), stats::rnorm(200, rep(c(1200, 600, 200, 900, 400), each = 40), 80)), ncol = 2, dimnames = list(NULL, c("temp", "precip"))) cz <- cluster_climate_zones(clim_mat, k = 5) cat("Cluster sizes:\n") print(table(cz$cluster)) ``` ```{r cluster_plot, fig.cap="Five K-means climate zones in temperature–precipitation space."} pal <- c("#e41a1c","#377eb8","#4daf4a","#984ea3","#ff7f00") plot(clim_mat[, "temp"], clim_mat[, "precip"], pch = 21, cex = 0.9, bg = pal[cz$cluster], xlab = "Mean Temperature (°C)", ylab = "Annual Precipitation (mm)", main = "K-Means Climate Zone Classification") legend("topright", legend = paste("Zone", 1:5), pt.bg = pal, pch = 21, pt.cex = 1.2, bty = "n") ``` ## Elevation Lapse Rate — `elevation_lapse_rate()` Estimates the temperature lapse rate from station data — essential for statistical downscaling. ```{r lapse, fig.cap="Temperature vs elevation with fitted lapse rate."} elev <- seq(100, 3000, by = 100) temp_lapse <- 25 - 0.0065 * elev + stats::rnorm(30, 0, 0.4) lr <- elevation_lapse_rate(temp_lapse, elev) plot(elev, temp_lapse, pch = 16, cex = 0.8, col = "steelblue", xlab = "Elevation (m a.s.l.)", ylab = "Temperature (°C)", main = sprintf("Environmental Lapse Rate: %.2f °C / 1000 m", lr$lapse_rate_per_1000m)) lines(lr$data$elevation, lr$data$fitted, col = "firebrick", lwd = 2) legend("topright", legend = c("Station data", sprintf("Lapse rate = %.3f °C/1000 m (R² = %.3f)", lr$lapse_rate_per_1000m, lr$r_squared)), col = c("steelblue", "firebrick"), pch = c(16, NA), lty = c(NA, 1), lwd = c(NA, 2), bty = "n") ``` --- # Extreme Event Analysis ## GEV Distribution — `fit_gev()` and `rgev_sim()` The **Generalised Extreme Value** distribution (Coles 2001) is the limiting distribution for block-maxima. Three shape families are unified: Gumbel (ξ = 0), Fréchet (ξ > 0, heavy tail), Weibull (ξ < 0, bounded). ```{r gev} set.seed(10) ann_max <- rgev_sim(60, mu = 34, sigma = 4.5, xi = 0.12) gev <- fit_gev(ann_max) print(gev) ``` ## Return Levels — `return_period()` ```{r rl} rp <- return_period(gev, c(2, 5, 10, 20, 50, 100, 200)) print(rp) ``` ```{r rl_plot, fig.cap="GEV return level curve with 95% delta-method confidence band."} with(rp, { plot(T, level, type = "b", log = "x", pch = 16, col = "steelblue", xlab = "Return period (years)", ylab = "Temperature (°C)", main = "GEV Return Level Curve", ylim = range(c(lower, upper), na.rm = TRUE)) polygon(c(T, rev(T)), c(upper, rev(lower)), col = adjustcolor("steelblue", 0.18), border = NA) lines(T, lower, lty = 2, col = "steelblue") lines(T, upper, lty = 2, col = "steelblue") abline(v = c(10, 100), lty = 3, col = "gray60") }) ``` ## Peaks-Over-Threshold — `peaks_over_threshold()` The **POT / GPD** approach (Davison and Smith 1990) uses all exceedances above a threshold rather than only one value per year, yielding more efficient estimates for long return periods. ```{r pot} set.seed(11) daily_p <- stats::rexp(365 * 30, rate = 1 / 6) pot <- peaks_over_threshold(daily_p, threshold = 22, n_years = 30, return_periods = c(10, 50, 100, 200)) cat(sprintf("Threshold : %.0f mm\n", pot$threshold)) cat(sprintf("Peaks retained: %d\n", pot$n_excess)) cat(sprintf("GPD shape (ξ) : %.4f\n", pot$xi)) cat(sprintf("GPD scale (σ) : %.4f\n", pot$sigma)) cat("\nReturn levels:\n") print(pot$return_levels) ``` ## Heat Wave Detection — `heat_wave_detection()` ```{r heatwave} dates <- seq(as.Date("2000-01-01"), by = "day", length.out = 365 * 15) doy <- as.integer(format(dates, "%j")) tmax <- 28 + 10 * sin(2 * pi * doy / 365) + stats::rnorm(length(dates), 0, 2.5) hw <- heat_wave_detection(tmax, dates, threshold = "p95", min_days = 3) cat(sprintf("Heat wave events over 15 years : %d\n", nrow(hw))) cat(sprintf("Mean duration : %.1f days\n", mean(hw$duration))) cat(sprintf("Maximum peak temperature : %.1f °C\n", max(hw$peak_temp))) ``` ```{r heatwave_plot, fig.cap="Annual heat wave count and mean duration over 15 years."} hw$year <- as.integer(format(hw$start_date, "%Y")) ann_hw <- tapply(hw$duration, hw$year, length) barplot(ann_hw, col = "firebrick", border = NA, xlab = "Year", ylab = "Number of events", main = "Annual Heat Wave Frequency (p95 threshold, ≥3 days)") ``` ## Cold Spell Detection — `cold_spell_detection()` ```{r coldspell} cs <- cold_spell_detection( tmin = tmax - stats::rnorm(length(tmax), 12, 1), dates = dates, threshold = "p05", min_days = 3) cat(sprintf("Cold spell events: %d\n", nrow(cs))) if (nrow(cs) > 0) { cat(sprintf("Mean duration : %.1f days\n", mean(cs$duration))) } ``` ## Drought Spell Detection — `drought_spell()` ```{r drought} set.seed(12) spi_vals <- stats::rnorm(360) dates_m <- seq(as.Date("1990-01-01"), by = "month", length.out = 360) droughts <- drought_spell(spi_vals, dates_m, threshold = -1.0, min_duration = 2) cat(sprintf("Drought spells detected : %d\n", nrow(droughts))) cat(sprintf("Mean duration : %.1f months\n", mean(droughts$duration))) cat(sprintf("Maximum severity : %.2f\n", max(droughts$severity))) ``` ## Hill Tail-Index Estimator — `extreme_value_index()` The **Hill estimator** (Hill 1975) characterises how heavy-tailed a distribution is — important for wind speed, flood peaks, and other power-law climate variables. ```{r hill, fig.cap="Hill stability plot — plateau indicates a good choice of k."} set.seed(13) ws <- abs(stats::rnorm(500, 10, 4))^1.5 ev <- extreme_value_index(ws, k = 30) cat(sprintf("Hill tail index (k=%d): %.4f\n", ev$k_used, ev$hill_index)) plot(ev$hill_plot$k, ev$hill_plot$hill, type = "l", col = "steelblue", lwd = 1.5, xlab = "k (number of order statistics)", ylab = "Hill estimate", main = "Hill Stability Plot") abline(v = ev$k_used, col = "firebrick", lty = 2) ``` --- # Climate Indices ## Standardised Precipitation Index — `spi()` The **SPI** (McKee et al. 1993) standardises accumulated precipitation at any time scale (1 to 48+ months) against a gamma reference distribution. Values below −1.0 indicate moderate drought; below −2.0 indicate extreme drought. ```{r spi_data} precip <- stats::rgamma(360, shape = 2, scale = 35) # 30 years spi3 <- spi(precip, scale = 3) spi12 <- spi(precip, scale = 12) cat(sprintf("SPI-3 — mean: %.3f SD: %.3f\n", mean(spi3, na.rm = TRUE), stats::sd(spi3, na.rm = TRUE))) cat(sprintf("SPI-12 — mean: %.3f SD: %.3f\n", mean(spi12, na.rm = TRUE), stats::sd(spi12, na.rm = TRUE))) ``` ```{r spi_plot, fig.height=5, fig.cap="SPI-3 (top) and SPI-12 (bottom) drought indices over 30 years."} old_par <- graphics::par(mfrow = c(2, 1), mar = c(3, 4, 2, 1)) col3 <- ifelse(!is.na(spi3) & spi3 >= 0, "steelblue", "firebrick") graphics::plot(spi3, type = "h", col = col3, xlab = "", ylab = "SPI-3", main = "Standardised Precipitation Index") graphics::abline(h = c(-1, 1), lty = 2, col = "gray40") col12 <- ifelse(!is.na(spi12) & spi12 >= 0, "steelblue", "firebrick") graphics::plot(spi12, type = "h", col = col12, xlab = "Month", ylab = "SPI-12") graphics::abline(h = c(-1, 1), lty = 2, col = "gray40") graphics::par(old_par) ``` ## SPEI — `spei()` The **SPEI** (Vicente-Serrano et al. 2010) fits a log-logistic distribution to the climatic water balance (P − PET), making it sensitive to temperature- driven evapotranspiration increases — a key advantage under climate change. ```{r spei_calc} set.seed(14) tmin_m <- abs(stats::rnorm(360, 8, 3)) + 2 tmax_m <- tmin_m + stats::runif(360, 7, 14) pr_m <- stats::rgamma(360, 5, 0.06) sp6 <- spei(precip = pr_m, tmin = tmin_m, tmax = tmax_m, lat = 45, scale = 6) cat(sprintf("SPEI-6 — mean: %.3f SD: %.3f\n", mean(sp6, na.rm = TRUE), stats::sd(sp6, na.rm = TRUE))) ``` ```{r spei_plot, fig.cap="SPEI-6 — the index accounts for evapotranspiration demand."} col_sp <- ifelse(!is.na(sp6) & sp6 >= 0, "steelblue", "firebrick") graphics::plot(sp6, type = "h", col = col_sp, xlab = "Month", ylab = "SPEI-6", main = "SPEI-6 (Hargreaves PET, lat = 45°)") graphics::abline(h = c(-1, 1), lty = 2, col = "gray40") ``` ## Simplified PDSI — `pdsi_simple()` The **Palmer Drought Severity Index** integrates precipitation deficit and evapotranspiration over multiple months, making it sensitive to slowly evolving droughts. ```{r pdsi, fig.cap="Simplified PDSI — values below −2 indicate severe drought."} doy_m <- rep(1:12, 30) temp_p <- 10 + 12 * sin(pi * doy_m / 6) + stats::rnorm(360, 0, 1) pr_p <- pmax(0, 50 + 20 * cos(pi * doy_m / 6) + stats::rnorm(360, 0, 15)) pdsi_vals <- pdsi_simple(temp_p, pr_p, lat = 40) graphics::plot(pdsi_vals, type = "l", col = "steelblue", xlab = "Month", ylab = "PDSI", main = "Simplified Palmer Drought Severity Index") graphics::abline(h = c(-2, 2), lty = 2, col = c("firebrick","steelblue")) graphics::abline(h = 0, lty = 3, col = "gray50") ``` ## Heat Index — `heat_index()` The **Heat Index** (Rothfusz 1990; Steadman 1979) converts temperature and relative humidity to an apparent "feels-like" temperature. ```{r hi, fig.cap="Heat index surface: apparent temperature rises steeply with humidity."} temp_grid <- seq(25, 45, by = 2) rh_grid <- seq(30, 100, by = 5) hi_mat <- outer(temp_grid, rh_grid, FUN = function(t, r) heat_index(t, r)) image(temp_grid, rh_grid, hi_mat, col = colorRampPalette(c("lightyellow","orange","firebrick"))(64), xlab = "Air temperature (°C)", ylab = "Relative humidity (%)", main = "Heat Index (°C)") contour(temp_grid, rh_grid, hi_mat, levels = c(27, 32, 41, 54), add = TRUE, col = "white") ``` ## Wind Chill — `wind_chill()` ```{r wc, fig.cap="Wind chill surface: perceived temperature can be far below air temperature."} temp_wc <- seq(-30, 5, by = 2) wind_wc <- seq(5, 80, by = 5) wc_mat <- outer(temp_wc, wind_wc, FUN = wind_chill) image(temp_wc, wind_wc, wc_mat, col = colorRampPalette(c("navy","steelblue","white"))(64), xlab = "Air temperature (°C)", ylab = "Wind speed (km/h)", main = "Wind Chill Temperature (°C)") contour(temp_wc, wind_wc, wc_mat, levels = c(-40, -30, -20, -10), add = TRUE, col = "gray20") ``` ## Frost Days — `frost_days()` ```{r frost, fig.cap="Annual frost day count simulated over 10 years."} dates_d <- seq(as.Date("2010-01-01"), by = "day", length.out = 365 * 10) doy_d <- as.integer(format(dates_d, "%j")) tmin_d <- 5 - 15 * sin(2 * pi * doy_d / 365) + stats::rnorm(length(dates_d), 0, 2) fd <- frost_days(tmin_d, dates_d, by = "year") barplot(fd, col = "steelblue", border = NA, xlab = "Year", ylab = "Frost days", main = "Annual Frost Day Count (Tmin < 0 °C)") ``` ## Growing Degree Days — `growing_degree_days()` ```{r gdd, fig.cap="Cumulative GDD accumulation through the growing season."} dates_g <- seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day") doy_g <- as.integer(format(dates_g, "%j")) tmax_g <- 22 + 12 * sin(2 * pi * doy_g / 365) tmin_g <- 10 + 8 * sin(2 * pi * doy_g / 365) gdd_cum <- growing_degree_days(tmax_g, tmin_g, base_temp = 10, cumulative = TRUE) plot(gdd_cum, type = "l", col = "darkgreen", lwd = 2, xlab = "Day of year", ylab = "Cumulative GDD (base 10 °C)", main = "Growing Degree Days 2020") abline(v = 91, lty = 2, col = "gray60") # ~April 1 abline(v = 274, lty = 2, col = "gray60") # ~Oct 1 text(91, max(gdd_cum) * 0.05, "Apr", col = "gray40", adj = 0) text(274, max(gdd_cum) * 0.05, "Oct", col = "gray40", adj = 1) ``` ## Diurnal Temperature Range — `diurnal_temp_range()` ```{r dtr, fig.cap="Monthly mean DTR — a proxy for cloudiness and land-surface change."} dtr <- diurnal_temp_range(tmax_g, tmin_g, dates_g, by = "month") barplot(dtr, col = "steelblue", border = NA, names.arg = month.abb, xlab = "Month", ylab = "DTR (°C)", main = "Mean Diurnal Temperature Range by Month") ``` --- # Detection and Attribution ## Signal Detection — `detection_attribution()` A projection-based approach that tests whether the observed climate change is distinguishable from natural internal variability (Hasselmann 1979; Allen and Tett 1999). ```{r detect} set.seed(20) obs_anom <- cumsum(stats::rnorm(50, 0.03, 0.12)) nat_ens <- matrix(stats::rnorm(50 * 30, 0, 0.45), ncol = 30) forc_sig <- seq(0, 1.5, length.out = 50) da <- detection_attribution(obs_anom, nat_ens, forc_sig) cat(sprintf("Signal detected : %s\n", da$detected)) cat(sprintf("Z-score : %.3f\n", da$z_score)) cat(sprintf("p-value : %.4f\n", da$p.value)) cat(sprintf("Attribution fraction: %.1f%%\n", da$attribution_fraction * 100)) ``` ```{r detect_plot, fig.cap="Observed projection vs natural ensemble distribution — signal clearly separated."} hist(da$projection_natural, col = "steelblue", border = "white", breaks = 15, xlab = "Projection onto forced signal (Pearson r)", main = "Detection Test: Observed vs Natural Variability") abline(v = da$projection_observed, col = "firebrick", lwd = 3) legend("topright", legend = c("Natural ensemble", sprintf("Observed (r = %.3f)", da$projection_observed)), fill = c("steelblue", NA), border = c("white", NA), lty = c(NA, 1), col = c(NA, "firebrick"), lwd = c(NA, 3), bty = "n") ``` ## EOF Fingerprint Analysis — `fingerprint_analysis()` Extracts leading **Empirical Orthogonal Functions** (Lorenz 1956; von Storch and Zwiers 1999) to identify dominant spatial modes of climate variability or change. ```{r eof} set.seed(21) mat_eof <- matrix(stats::rnorm(60 * 200), nrow = 60) # Inject a dominant warming pattern in first 80 locations mat_eof[, 1:80] <- mat_eof[, 1:80] + outer(seq(0, 2, length.out = 60), rep(1, 80)) fp <- fingerprint_analysis(mat_eof, n_eof = 3) cat(sprintf("EOF1 explains: %.1f%% of variance\n", fp$var_explained[1] * 100)) cat(sprintf("EOF2 explains: %.1f%% of variance\n", fp$var_explained[2] * 100)) cat(sprintf("Cumulative (3 EOFs): %.1f%%\n", fp$cumvar[3] * 100)) ``` ```{r eof_plot, fig.cap="Leading PC time series — EOF1 captures the forced warming trend."} matplot(fp$pc[, 1:3], type = "l", lty = 1, lwd = 1.5, col = c("firebrick", "steelblue", "darkgreen"), xlab = "Time step", ylab = "PC score", main = "Leading EOF Principal Components") legend("topleft", legend = sprintf("PC%d (%.1f%%)", 1:3, fp$var_explained * 100), col = c("firebrick", "steelblue", "darkgreen"), lty = 1, lwd = 1.5, bty = "n") ``` ## Optimal Fingerprint Regression — `optimal_fingerprint()` Estimates **scaling factors** for anthropogenic (ANT) and natural (NAT) signals using GLS regression (Allen and Tett 1999; Hegerl and Zwiers 2011). ```{r ofp} set.seed(22) obs_c <- cumsum(stats::rnorm(50, 0.03, 0.10)) all_c <- cumsum(stats::rnorm(50, 0.028, 0.05)) + cumsum(stats::rnorm(50, 0.005, 0.03)) nat_c <- cumsum(stats::rnorm(50, 0, 0.12)) ofp <- optimal_fingerprint(obs_c, all_c, nat_c) cat(sprintf("ANT scaling factor: %.3f\n", ofp$beta_all)) cat(sprintf("NAT scaling factor: %.3f\n", ofp$beta_nat)) cat(sprintf("Residual variance : %.4f\n", ofp$residual_variance)) ``` --- # Data Quality and Utilities ## Gap Filling — `fill_gaps_climate()` Three methods: **linear interpolation**, **monthly climatology**, and **reference-station regression**. ```{r gaps} x_gaps <- c(10.2, NA, NA, NA, 14.0, 15.1, NA, 17.3, 18.0) cat("Original :", x_gaps, "\n") cat("Filled :", round(fill_gaps_climate(x_gaps), 2), "\n") ``` ## Homogenisation — `homogenize_series()` Corrects a detected SNHT break by shifting the earlier segment to match the mean of the later segment. ```{r homog} set.seed(25) x_inh <- c(stats::rnorm(40, 14.0, 0.5), stats::rnorm(40, 15.8, 0.5)) x_hom <- homogenize_series(x_inh) cat(sprintf("Before adjustment — mean of segment 1: %.2f\n", mean(x_inh[1:40]))) cat(sprintf("After adjustment — mean of segment 1: %.2f\n", mean(x_hom[1:40]))) cat(sprintf("Mean of segment 2 (reference) : %.2f\n", mean(x_hom[41:80]))) ``` ## Temporal Aggregation — `aggregate_climate()` ```{r agg} dates_agg <- seq(as.Date("2000-01-01"), by = "day", length.out = 365 * 5) temp_agg <- stats::rnorm(length(dates_agg), 15, 6) ann <- aggregate_climate(temp_agg, dates_agg, to = "annual") seas <- aggregate_climate(temp_agg, dates_agg, to = "seasonal") cat("Annual means:\n"); print(ann) ``` ## Anomaly Calculation — `anomaly_baseline()` ```{r anom, fig.cap="Temperature anomalies relative to 1961–1990 baseline."} yr <- 1950:2020 temp_b <- 13.5 + 0.022 * seq_len(71) + stats::rnorm(71, 0, 0.45) anom <- anomaly_baseline(temp_b, yr, 1961, 1990) plot(yr, anom, type = "l", col = "steelblue", lwd = 1.5, xlab = "Year", ylab = "Anomaly (°C)", main = "Temperature Anomalies (1961–1990 baseline)") abline(h = 0, lty = 2, col = "gray50") lines(stats::lowess(yr, anom, f = 0.3), col = "firebrick", lwd = 2.5) legend("topleft", legend = c("Annual anomaly", "LOWESS smoother"), col = c("steelblue", "firebrick"), lty = 1, lwd = c(1.5, 2.5), bty = "n") ``` ## Standardisation — `standardize_climate()` ```{r std} x_raw <- stats::rnorm(120, mean = 18, sd = 5) z <- standardize_climate(x_raw) cat(sprintf("Raw — mean: %.2f SD: %.2f\n", mean(x_raw), stats::sd(x_raw))) cat(sprintf("Std — mean: %.6f SD: %.6f\n", mean(z), stats::sd(z))) ``` ## Comprehensive Summary — `climate_summary()` ```{r csummary} temp_long <- 13.5 + 0.022 * seq_len(71) + stats::rnorm(71, 0, 0.45) res <- climate_summary(temp_long, variable_name = "Annual Mean Temperature (°C)") ``` --- # References Allen, M. R. and Tett, S. F. B. (1999). Checking for model consistency in optimal fingerprinting. *Climate Dynamics* **15**(6), 419–434. [doi:10.1007/s003820050291](https://doi.org/10.1007/s003820050291) Alexandersson, H. (1986). A homogeneity test applied to precipitation data. *International Journal of Climatology* **6**(6), 661–675. [doi:10.1002/joc.3370060607](https://doi.org/10.1002/joc.3370060607) Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate. *Journal of the Royal Statistical Society: Series B* **57**(1), 289–300. [doi:10.1111/j.2517-6161.1995.tb02031.x](https://doi.org/10.1111/j.2517-6161.1995.tb02031.x) Coles, S. (2001). *An Introduction to Statistical Modeling of Extreme Values*. Springer-Verlag, London. [doi:10.1007/978-1-4471-3675-0](https://doi.org/10.1007/978-1-4471-3675-0) Davison, A. C. and Smith, R. L. (1990). Models for exceedances over high thresholds. *Journal of the Royal Statistical Society: Series B* **52**(3), 393–442. [doi:10.1111/j.2517-6161.1990.tb01796.x](https://doi.org/10.1111/j.2517-6161.1990.tb01796.x) Getis, A. and Ord, J. K. (1992). The analysis of spatial association by use of distance statistics. *Geographical Analysis* **24**(3), 189–206. [doi:10.1111/j.1538-4632.1992.tb00261.x](https://doi.org/10.1111/j.1538-4632.1992.tb00261.x) Hegerl, G. C. and Zwiers, F. (2011). Use of models in detection and attribution of climate change. *Wiley Interdisciplinary Reviews: Climate Change* **2**(4), 570–591. [doi:10.1002/wcc.121](https://doi.org/10.1002/wcc.121) Hill, B. M. (1975). A simple general approach to inference about the tail of a distribution. *Annals of Statistics* **3**(5), 1163–1174. [doi:10.1214/aos/1176343247](https://doi.org/10.1214/aos/1176343247) Mann, H. B. (1945). Nonparametric tests against trend. *Econometrica* **13**(3), 245–259. [doi:10.2307/1907187](https://doi.org/10.2307/1907187) McKee, T. B., Doesken, N. J. and Kleist, J. (1993). The relationship of drought frequency and duration to time scales. *Proceedings of the 8th Conference on Applied Climatology*, 179–184. American Meteorological Society. Moran, P. A. P. (1950). Notes on continuous stochastic phenomena. *Biometrika* **37**(1), 17–23. [doi:10.2307/2332142](https://doi.org/10.2307/2332142) Pettitt, A. N. (1979). A nonparametric approach to the change-point problem. *Applied Statistics* **28**(2), 126–135. [doi:10.2307/2346729](https://doi.org/10.2307/2346729) Sen, P. K. (1968). Estimates of the regression coefficient based on Kendall's tau. *Journal of the American Statistical Association* **63**(324), 1379–1389. [doi:10.2307/2285891](https://doi.org/10.2307/2285891) Steadman, R. G. (1979). The assessment of sultriness. Part I. *Journal of Applied Meteorology* **18**(7), 861–873. [doi:10.1175/1520-0450(1979)018%3C0861:TAOSPI%3E2.0.CO;2](https://doi.org/10.1175/1520-0450(1979)018%3C0861:TAOSPI%3E2.0.CO;2) Thornthwaite, C. W. (1948). An approach toward a rational classification of climate. *Geographical Review* **38**(1), 55–94. [doi:10.2307/210739](https://doi.org/10.2307/210739) Vicente-Serrano, S. M., Begueria, S. and Lopez-Moreno, J. I. (2010). A Multiscalar Drought Index Sensitive to Global Warming. *Journal of Climate* **23**(7), 1696–1718. [doi:10.1175/2009JCLI2909.1](https://doi.org/10.1175/2009JCLI2909.1) von Storch, H. and Zwiers, F. W. (1999). *Statistical Analysis in Climate Research*. Cambridge University Press, Cambridge. [doi:10.1017/CBO9780511612336](https://doi.org/10.1017/CBO9780511612336) Yue, S. and Wang, C. Y. (2002). Applicability of prewhitening to eliminate the influence of serial correlation on the Mann-Kendall test. *Water Resources Research* **38**(6), 4-1–4-7. [doi:10.1029/2001WR000861](https://doi.org/10.1029/2001WR000861) --- ```{r session} sessionInfo() ```