--- title: "CoSMoS R | Complete Stochastic Modelling Solution" author: "Simon Michael Papalexiou, Francesco Serinaldi, Filip Strnad, Yannis Markonis, Kevin Shook" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: true number_sections: yes vignette: > %\VignetteIndexEntry{CoSMoS R | Complete Stochastic Modelling Solution} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} header-includes: - \usepackage{amsmath} editor_options: chunk_output_type: inline --- *** ```{r setup, include=FALSE} knitr::opts_chunk$set(eval = TRUE, echo = TRUE, fig.width = 7, warning = FALSE, message = FALSE) library(CoSMoS) library(plot3D) ``` `CoSMoS` was conceived back in 2009 (see note in section 5) and was officially released on CRAN in April 2019. Since then `CoSMoS` has become one of the leading and most widely downloaded R packages for stochastic simulation of non-Gaussian time series. Designed with the end user in mind, **`CoSMoS` makes univariate, multivariate, or random field simulations easy**. You can reliably generate time series or fields of rainfall, streamflow, wind speed, relative humidity, or any other environmental variable in seconds. Just choose the probability distribution and correlation properties of the time series you want to generate, and it will do the rest.
~_Unified Theory of Stochastic Modelling_ | [Papalexiou (2018)](https://doi.org/10.1016/j.advwatres.2018.02.013) _"Hydroclimatic processes come in all “shapes and sizes”. They are characterized by different spatiotemporal correlation structures and probability distributions that can be continuous, mixed-type, discrete or even binary. Simulating such processes by reproducing precisely their marginal distribution and linear correlation structure, including features like intermittency, can greatly improve hydrological analysis and design. ... Here, a single framework is proposed that unifies, extends, and improves a general-purpose modelling strategy, based on the assumption that any process can emerge by transforming a specific “parent”" Gaussian process."_ ~_Precise Temporal Disaggregation_ | [Papalexiou et al. (2018)](https://doi.org/10.1029/2018WR022726) _"Hydroclimatic variables such as precipitation and temperature are often measured or simulated by climate models at coarser spatiotemporal scales than those needed for operational purposes. … Here we introduce a novel disaggregation method, named Disaggregation Preserving Marginals and Correlations (DiPMaC), that is able to disaggregate a coarse-scale time series to any finer scale, while reproducing the probability distribution and the linear correlation structure of the fine-scale process. DiPMaC is also generalized for arbitrary nonstationary scenarios to reproduce time varying marginals."_ ~_Random Fields Simplified_ | [Papalexiou and Serinaldi (2020)](https://doi.org/10.1029/2019WR026331) _"Nature manifests itself in space and time. The spatiotemporal complexity of processes such as precipitation, temperature, and wind, does not allow purely deterministic modeling. Spatiotemporal random fields have a long history in modeling such processes, and yet a single unified framework offering the flexibility to simulate processes that may differ profoundly does not exist. Here we introduce a blueprint to efficiently simulate spatiotemporal random fields that preserve any marginal distribution, any valid spatiotemporal correlation structure, and intermittency."_ ~_Advancing Space-Time Simulation_ | [Papalexiou, Serinaldi and Porcu (2021)](https://doi.org/10.1029/2020WR029466) _"...we advance random field simulation by introducing the concepts of general velocity fields and general anisotropy transformations. To illustrate the potential of CoSMoS, we simulate random fields with complex patterns and motion mimicking rainfall storms moving across an area, spiraling fields resembling weather cyclones, fields converging to (or diverging from) a point, and colliding air masses."_ *** # Step by step guide to timeseries simulation {#section_1} NOTE: For R code chunks running longer than ~10s, we report the CPU time for a Windows 10 Pro x64 laptop with Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz, 4-core, 32GB RAM. In most cases we can accurately simulate a process by reproducing its marginal distribution and its autocorrelation structure. The first is typically represented by a parametric probability distribution; the second either by a parametric correlation structure, or by specific correlation coefficient values. `CoSMoS` is the first software that enables the generation of time series while preserving any marginal distribution, correlation structure, and intermittency. Intermittency can be defined by the probability of zero values, and is essential in the simulation of precipitation at fine scales (e.g., daily, hourly) but also for any other intermittent process, such as flows of ephemeral streams, wind speed, etc. Accurate generation of time series requires knowledge of the statistical properties of the process under study. Processes such as precipitation, streamflow, temperature, wind, or evaporation may have very different properties; that is, they can be described by different probability distributions, correlation structures, and may or may not be intermittent. There are four main steps required to generate a time series: 1. Choosing a probability distribution 2. Choosing a correlation structure, 3. Adding intermittency, and 4. Validating the results. The four steps are explained below. ## Choosing a Probability Distribution The choice of a probability distribution to describe the variable you wish to simulate is crucial. The probability distribution expresses how frequently specific magnitudes of the variable occur in the data set. Air **temperature** is typically described by bell-shaped distributions like the [Normal](https://en.wikipedia.org/wiki/Normal_distribution). **Relative humidity** needs distributions defined in the interval (0,1) such as the [Beta](https://en.wikipedia.org/wiki/Beta_distribution) or the [Kumaraswamy](https://en.wikipedia.org/wiki/Kumaraswamy_distribution). **Streamflow** typically has heavy tails and power-type distributions might be needed to reproduce the behavior of the extremes. **Wind speed** typically can be modelled by positively skewed distributions such as the [Weibull](https://en.wikipedia.org/wiki/Weibull_distribution). **Precipitation** at fine scales has to be modelled by mixed-type (or else zero inflated) distributions, that is, having a probability value to describe the occurrence of zeros, and a continuous distribution, such the [Generalized Gamma](https://en.wikipedia.org/wiki/Generalized_gamma_distribution) or the [Burr type XII](https://en.wikipedia.org/wiki/Burr_distribution), to describe the frequency of nonzero values. `CoSMoS` supports **all the standard distribution** functions of `R` and from other packages (tens of distributions) but also comes with some built-in distributions such as Generalized Gamma, the [Pareto type II](https://en.wikipedia.org/wiki/Pareto_distribution), and the Burr type III and XII distributions. More details on the `CoSMoS` distributions and their parameters can be found in [Section 4](#section_4). Distributions may have differing numbers and types of parameters (e.g. location, scale, and shape parameters). For example, the Generalized Gamma distribution has one scale and two shape parameters. `CoSMoS` has built-in functions to fit distributions to data and assess their goodness-of-fit based on graphs; but you can also use any other package for distribution fitting. ## Choosing a Correlation Structure The correlation structure expresses how much the random variables depend upon each other. Processes at fine temporal scales are typically more correlated than those at coarse scales. CoSMoS offers two options to introduce correlations: (1) by defining specific lag autocorrelations as a list starting from lag 0 up to a desired limit, or (2) by using a pre-defined parametric autocorrelation structure. For example, you can generate a time series with lag-1 autocorrelation $\rho_1 = 0.8$ and Generalized Gamma marginal distribution by: ```{r, warning=FALSE, message=FALSE} ## (i) specifying the sample size no <- 1000 ## (ii) defining the type of marginal distribution and its parameters marginaldist <- "ggamma" param <- list(scale = 1, shape1 = .8, shape2 = .8) ## (iii) defining the desired autocorrelation acf.my <- c(1, 0.8) ## (iv) simulating ggamma_sim <- generateTS(n = no, margdist = marginaldist, margarg = param, acsvalue = acf.my) ## and (v) visually checking the generated time series quickTSPlot(ggamma_sim[[1]]) ``` You can specify also as many autocorrelation values as you wish. In this example, autocorrelation values are specified up to lag 4: ```{r, warning=FALSE, message=FALSE} acf <- c(1, 0.6, 0.5, 0.4, 0.3) #up to lag-4 ggamma_sim <- generateTS(n = no, margdist = marginaldist, margarg = param, acsvalue = acf) quickTSPlot(ggamma_sim[[1]]) ``` A better approach is to use an autocorrelation structure expressed by a parametric function. `CoSMoS` has built in autocorrelation structures than can be used by the `acs()` function. You can use the following ACS structures: * Fractional Gaussian Noise; this is the well-known one-parameter fGn ACS. * Weibull; a flexible two parameter of exponential form. * Pareto II: a two parameter power-type ACS. * Burr XII: a three parameter power-type ACS. For most practical applications a two-parameter ACS is sufficient; we suggest either the Weibull or the Pareto II. The `acs()` function produces autocorrelation values based on any of the four structures and any desired lag. Thus, instead of setting each autocorrelation coefficient explicitly (which might be problematic from a technical viewpoint), we can generate series preserving any one of the four ACSs. For example, you can easily define an ACS and visualize its values up to any lag. Here, all four ACSs are defined and visualized up to lag 10. You can see how changing the ACS function changes how the correlation coefficients decay to zero. ```{r, warning=FALSE, message=FALSE} ## specify lag lags <- 0:10 ## get the ACS f <- acs(id = "fgn", t = lags, H = .75) b <- acs(id = "burrXII", t = lags, scale = 1, shape1 = .6, shape2 = .4) w <- acs(id = "weibull", t = lags, scale = 2, shape = 0.8) p <- acs(id = "paretoII", t = lags, scale = 3, shape = 0.3) ## visualize the ACS dta <- data.table(lags, f, b, w, p) m.dta <- melt(data = dta, id.vars = "lags") ggplot(data = m.dta, mapping = aes(x = lags, y = value, group = variable, colour = variable)) + geom_point(size = 2.5) + geom_line(lwd = 1) + scale_color_manual(values = c("steelblue4", "red4", "green4", "darkorange"), labels = c("FGN", "Burr XII", "Weibull", "Pareto II"), name = "") + labs(x = bquote(lag ~ tau), y = "ACS") + scale_x_continuous(breaks = lags) + theme_light() ``` We can re-generate the previously generated series which used explicity defined correlations up to lag 4, now with a two parameter Pareto II ACF up to lag 30, which improves the modelling parsimony. More details about the parametric autocorrelation structures can be found in section 3.2 in [Papalexiou (2018)](https://doi.org/10.1016/j.advwatres.2018.02.013). ```{r, warning=FALSE, message=FALSE} acf <- acs(id = "paretoII", t = 0:30, scale = 1, shape = .75) ggamma_sim <- generateTS(n = no, margdist = marginaldist, margarg = param, acsvalue = acf) dta <- data.frame(time = 1:no, value = ggamma_sim[[1]]) quickTSPlot(dta$value) ``` Apart from the four autocorrelation functions provided by `acs()`, we can also create our own. Note that it is important to ensure that the function is positive definite, otherwise the autoregressive model cannot be fitted. This example shows the generation of a time series with the same Generalized Gamma marginal distribution as above, but with a user-defined exponential ACS function up to lag 30. Note that although the time series density is very similar to that in the previous plot, the texture of the time series is very different, due to the changed ACS function. ```{r, warning=FALSE, message=FALSE} my_acf <- exp(seq(0, -2, -0.1)) ggamma_sim <- generateTS(n = no, margdist = marginaldist, margarg = param, acsvalue = my_acf) quickTSPlot(ggamma_sim[[1]]) ``` ## Adding Intermittency (if required) `CoSMoS` easily simulates intermittent processes such as precipitation. The only extra step needed is to define the probability of zero events. For example, say you wish to generate 5 mutually independent timeseries having the previously defined Generalized Gamma distribution with the Pareto II ACS and probability zero p0 = 0.9. The generated data thus will have 90% of zero values (i.e. dry days). ```{r, fig.height = 7, warning=FALSE, message=FALSE} prob_zero <- .9 ## the argument `TSn = 5` enables the simulation of 5 timeseries. ggamma_sim <- generateTS(n = no, margdist = marginaldist, margarg = param, acsvalue = acf, p0 = prob_zero, TSn = 5) plot(x = ggamma_sim, main = "") + theme_light() ``` ## Validating the Results You can readily check some basic statistics of the generated time series using the `checkTS()` function, which return the first three moments, probability zero and the first three autocorrelation coefficients. ```{r, warning=FALSE, message=FALSE} checkTS(ggamma_sim) ``` *** # Multivariate and Random Fields Simulation {#section_2} The above methods can readily be extended to multidimensional cases, thus enabling the simulation of spatiotemporally correlated random vectors (i.e. correlated timeseries at multiple locations) and random fields (i.e. gridded data), as discussed in detail by [Papalexiou and Serinaldi (2020)](https://doi.org/10.1029/2019WR026331). This requires the definition of suitable spatiotemporal correlation structures (see [Porcu et al. (2020)](https://doi.org/10.1002/wics.1512) for a thorough review of this topic). ## Spatiotemporal Correlation Structures CoSMoS allows the definition of spatiotemporal correlation functions using the function `stcs()`, which the spatiotemporal counterpart of the purely temporal `acs()`. Two classes of spatiotemporal correlation functions are provided: * Clayton [(Papalexiou and Serinaldi 2020)](https://doi.org/10.1029/2019WR026331) * Gneiting [(Gneiting 2002)](https://doi.org/10.1198/016214502760047113) The `stcs()` function can produce values of the linear spatiotemporal correlation for any desired time lag and spatial distance using these two correlation function classes, which comprise a variety of structures covering most cases of practical interest. This example shows the Clayton-Weibull spatiotemporal correlation structure: ```{r, fig.height = 7, warning=FALSE, message=FALSE} ## specify grid of spatial and temporal lags d <- 51 st <- expand.grid(0:(d - 1), 0:(d - 1)) ## get the STCS wc <- stcfclayton(t = st[, 1], s = st[, 2], scfid = "weibull", tcfid = "weibull", copulaarg = 2, scfarg = list(scale = 20, shape = 0.7), tcfarg = list(scale = 5.1,shape = 0.8)) ## visualize the STCS wc.m <- matrix(data = wc, nrow = d) j <- tail(which(wc.m[1, ] > 0.15), 1) i <- tail(which(wc.m[, 1] > 0.15), 1) wc.m <- wc.m[1:i, 1:j] persp3D(z = wc.m, x = 1: nrow(wc.m), y = 1:ncol(wc.m), expand = 1, main = "", scale = TRUE, facets = TRUE, xlab="Time lag", ylab = "Distance", zlab = "STCF", colkey = list(side = 4, length = 0.5), phi = 20, theta = 120, resfac = 5, col= gg2.col(100)) ``` ## Multivariate Time Series Simulation Similar to the one-dimensional case (i.e. using the function `generateTS()`), `CoSMoS` provides the functions `generateMTS()` and `generateMTSFast()` to simulate multiple timeseries with specified (identical) marginals and spatiotemporal correlation structures (STCs). Gaussian Vector AutoRegressive (VAR) models are used to reproduce the parent-Gaussian STFC (see [Papalexiou (2018)](https://doi.org/10.1016/j.advwatres.2018.02.013) and [Papalexiou and Serinaldi (2020)](https://doi.org/10.1029/2019WR026331)). The VAR parameters see e.g. [(Biller and Nelson 2003)](https://doi.org/10.1145/937332.937333) corresponding to the desired spatiotemporal correlation function of the target process, are produced by the function `fitVAR()`. In this example, a set of five random locations is defined that could represent precipitation in five different places. A VAR is then fitted using * a 4^th^ order process (the spatiotemporal correlations will be preserved up to temporal lag 4), * the Burr XII marginal distribution, * a probability zero `p0 = 0.8`, * a Clayton-Weibull spatiotemporal correlation structure (see [Papalexiou and Serinaldi (2020)](https://doi.org/10.1029/2019WR026331)), using Weibull functions for both the temporal and spatial correlations. From the five locations, and the VAR, a set of five timeseries is generated. When the series are plotted, the spatio-temporal correlations among the series can be seen. ```{r, fig.height = 5, warning=FALSE, message=FALSE} ## set a sequence of hypothetical coordinates d <- 5 coord <- cbind(runif(d)*30, runif(d)*30) ## compute VAR model parameters fit <- fitVAR(spacepoints = coord, p = 4, margdist = "burrXII", margarg = list(scale = 3, shape1 = .9, shape2 = .2), p0 = 0.8, stcsid = "clayton", stcsarg = list(scfid = "weibull", tcfid = "weibull", copulaarg = 2, scfarg = list(scale = 25, shape = 0.7), tcfarg = list(scale = 3.1, shape = 0.8) ) ) ## generate correlated timeseries sim <- generateMTS(n = 500, STmodel = fit) ## visualize simulated timeseries dta <- melt(data = data.table(time = 1:nrow(sim), sim[,1:d]), id.vars = "time") ggplot(data = dta, mapping = aes(x = time, y = value)) + geom_line() + facet_grid(facets = variable ~ ., scales = "free_y") + theme_light() ``` The function `generateMTSFast()` generates multiple time series with approximately separable STCS [(Serinaldi and Kilsby 2018)](https://doi.org/10.1029/2018WR023055), using an algorithm which is computationally less expensive than that in `generateMTS()`. This allows the simulation of a larger number of cross-correlated and serially dependent timeseries, at the cost of using separable STCSs and accepting some lack of accuracy in the exact reproduction of some terms of the required STCS [(Serinaldi and Kilsby 2018)](https://doi.org/10.1029/2018WR023055). This example uses `generateMTSFast()` with similar parameters to the previous example using `generateMTS()`. ```{r, fig.height = 5, warning=FALSE, message=FALSE} ## set a sequence of hypothetical coordinates d <- 5 coord <- cbind(runif(d)*30, runif(d)*30) ## fit and generate correlated timeseries sim <- generateMTSFast(n = 500, spacepoints = coord, p0 = 0.7, margdist ="burrXII", margarg = list(scale = 3, shape1 = .9, shape2 = .2), stcsarg = list(scfid = "weibull", tcfid = "weibull", scfarg = list(scale = 25, shape = 0.7), tcfarg = list(scale = 3.1, shape = 0.8)) ) ## visualize simulated timeseries dta <- melt(data = data.table(time = 1:nrow(sim), sim[,1:d]), id.vars = "time") ggplot(data = dta, mapping = aes(x = time, y = value)) + geom_line() + facet_grid(facets = variable ~ ., scales = "free_y") + theme_light() ``` ## Random Fields Simulation ### Isotropic Random Fields `CoSMoS` simulates spatially and temporally correlated isotropic random fields with the functions `generateRF()` and `generateRFFast()`, which are analogs of `generateMTS()` and `generateMTSFast()`, with the same syntax. The only difference is the specification of the spatial points, which is an integer denoting the side length \eqn{m} of a square grid \eqn{(mxm)}. As was mentioned in the previous section, the algorithm in `generateRFFast()` is computationally less expensive than that of `generateRF()`, enabling the simulation of random fields over a greater number of grid points (see [Papalexiou (2018)](https://doi.org/10.1016/j.advwatres.2018.02.013), [Papalexiou and Serinaldi (2020)](https://doi.org/10.1029/2019WR026331), and [Serinaldi and Kilsby (2018)](https://doi.org/10.1029/2018WR023055) for more details). The example below shows the use of both `generateRF` and `generateRFFast`. The generation of random fields using `generateRF` took approximately 16 times as long as using `generateRFFast`. ```{r, warning=FALSE, message=FALSE} ## compute VAR model parameters ## CPU time: ~15s fit <- fitVAR(spacepoints = 20, p = 3, margdist ="burrXII", margarg = list(scale = 3, shape1 = .9, shape2 = .2), p0 = 0.8, stcsid = "clayton", stcsarg = list(scfid = "weibull", tcfid = "weibull", copulaarg = 2, scfarg = list(scale = 20, shape = 0.7), tcfarg = list(scale = 1.1, shape = 0.8) ) ) ## generate isotropic random fields with nonseparable correlation structure sim1 <- generateRF(n = 1000, STmodel = fit) ## fast simulation of isotropic random fields with separable correlation structure sim2 <- generateRFFast(n = 1000, spacepoints = 20, p0 = 0.7, margdist ="paretoII", margarg = list(scale = 1, shape = .3), stcsarg = list(scfid = "weibull", tcfid = "weibull", scfarg = list(scale = 20, shape = 0.7), tcfarg = list(scale = 1.1, shape = 0.8))) ``` ### Validating the Simulated Random Fields The properties of the generated random fields can be checked with `checkRF()`. If the parameter `method = "stat"` (the default), the function returns the first three moments, and the probability zero at 20 grid points. If the parameter `method = "statplot"`, `checkRF()` returns a multi-panel plot comparing the theoretical and empirical spatial correlation functions at 0, 1, and 2 time lags, the contour plot of the target STCS, and the empirical and target CDFs. If the parameter `method = "fields"`, the function returns level plots of the selected number of simulated random fields (`nfields`) to visualize the temporal persistence of spatial patterns. If the parameter `method = "movie"`, the fields are written to an animated GIF called "movieRF.gif" in the current working directory. This example returns the stats, stat plots and fields for the first simulation, which used `generateRF`. ```{r, fig.height = 6.5, warning=FALSE, message=FALSE} ## check random fields ## CPU time: ~20s checkRF(RF = sim1, nfields = 9*9, method = "stat") checkRF(RF = sim1, nfields = 9*9, method = "statplot") checkRF(RF = sim1, nfields = 9*9, method = "field") ``` ### Anisotropic Random Fields The functions `generateRF()` and `generateRFFast()` allows for the simulation of spatially and temporally correlated anisotropic random fields by specifying the arguments `anisotropyid` and `anisotropyarg` when calculating the model parameters via `fitVAR()`. Three types of anisotropic effects are supported, namely: affine (rotation and stretching), and swirl-like, and "wavy" deformation (see the documentation of the function `anisotropyT`, [Papalexiou, Serinaldi and Porcu (2021)](https://doi.org/10.1029/2020WR029466), and references therein for more details). The example below reports the simulation of swirl-like random fields mimicking for instance the shape of atmospheric cyclones. Firstly, one can visualize the the transformed of the coordinate system ```{r, fig.width = 4, fig.height = 4, warning=FALSE, message=FALSE} ## specify a grid of coordinates m = 30 aux <- seq(0, m - 1, length = m) coord <- expand.grid(aux, aux) ## transform the coordinate system at <- anisotropyTswirl(spacepoints = coord, x0 = floor(m / 2), y0 = floor(m / 2), b = 10, alpha = 1.8 * pi) ## visualize transformed coordinate system aux = data.frame(lon = at[ ,1], lat = at[ ,2], id1 = rep(1:m, each = m), id2 = rep(1:m, m)) ggplot(aux, aes(x = lon, y = lat)) + geom_path(aes(group = id1)) + geom_path(aes(group = id2)) + geom_point(col = 2) + theme_light() ``` Then, one can generate a set of swirl-like random fields with the desired spatiotemporal correlation structure and visualize them ```{r, fig.height = 7, warning=FALSE, message=FALSE} ## compute VAR model parameters fit <- fitVAR(spacepoints = m, p = 1, margdist = 'burrXII', margarg = list(scale = 3, shape1 = .9, shape2 = .2), p0 = 0.1, stcsid = "clayton", stcsarg = list(scfid = "weibull", tcfid = "weibull", copulaarg = 2, scfarg = list(scale = 2, shape = 0.7), tcfarg = list(scale = 20.1, shape = 0.8)), anisotropyid = "swirl", anisotropyarg = list(x0 = floor(m / 2), y0 = floor(m / 2), b = 10, alpha = 1.8 * pi) ) ## generate set.seed(9) sim3 <- generateRF(n=25, STmodel = fit) ## check checkRF(RF = sim3, nfields = 5*5, method = "field") ``` ### Lagrangian Random Fields `CoSMoS` implements advances in space-time simulation presented in [Papalexiou, Serinaldi and Porcu (2021)](https://doi.org/10.1029/2020WR029466) that enable the simulation of Lagrangian random fields, i.e. random fields characterized by mass transport (advection). The motion is described by velocity vectors with specified orthogonal components _u_ and _v_ at each grid point. Motion vectors can be specified by setting the arguments `advectionid` and `advectionarg` of `fitVAR` function. Several types of advection fields are supported, but users can create their own following the structure of the existing functions (see the documentation of the function `advectionF` for more details). The example below refers to the generation of random fields with affine anisotropy and uniform advection, mimicking rainfall storms moving north-easterly. ```{r, fig.height = 7.0, warning=FALSE, message=FALSE} ## compute VAR model parameters fit <- fitVAR(spacepoints = 30, p = 1, margdist = 'burrXII', margarg = list(scale = 3, shape1 = .9, shape2 = .2), p0 = 0.8, stcsid = "clayton", stcsarg = list(scfid = "weibull", tcfid = "weibull", copulaarg = 2, scfarg = list(scale = 20, shape = 0.7), tcfarg = list(scale = 30.1, shape = 0.8)), anisotropyid = "affine", anisotropyarg = list(phi1 = 2., phi2 = 0.5, phi12 = 0, theta = -pi/3), advectionid = "uniform", advectionarg = list(u = 2.5, v = 1.5) ) ## generate set.seed(10) # 1 # 5 sim3 <- generateRF(n=81, STmodel = fit) ## check checkRF(RF = sim3, nfields = 9*9, method = "field") ``` *** # Stochastic Simulation of Real-World Processes {#section_3} In practice, it is often desired to simulate a natural process based on the properties of an observed time series. This section shows how this can easily be done using `CoSMoS` functions. ## Hourly Precipitation Precipitation can be easily simulated in `CoSMoS` from observed data. This is an example of a time series of observed hourly precipitation (units are hundredths of an inch). ```{r, warning=FALSE, message=FALSE} data("precip") quickTSPlot(precip$value) ``` To generate a synthetic time series with similar characteristics to the observed precipitation, you have to determine (1) the marginal distribution, (2) the autocorrelation structure, and (3) the probability zero for each individual month. This can be done by trying to fit various distributions and autocorrelation structures with `analyzeTS()` and then checking the goodness-of-fit with `reportTS()`, which can return the plots of the distribution (`method = "dist"`), ACS (`method = "acs"`) or basic statistics (`method = "stat"`). ```{r, fig.height = 9, warning=FALSE, message=FALSE} ## CPU time: ~75s precip_ggamma <- analyzeTS(TS = precip, season = "month", dist = "ggamma", acsID = "weibull", lag.max = 12) reportTS(aTS = precip_ggamma, method = "dist") + theme_light() reportTS(aTS = precip_ggamma, method = "acs") + theme_light() reportTS(aTS = precip_ggamma, method = "stat") ``` In this example, the Generalized Gamma distribution and the Weibull autocorrelation structure describe well the observed time series. Other combinations might not give such good results. For example, the Pareto II distribution and the Fractional Gaussian Noise ACS do not give good monthly fits: ```{r, fig.height = 9, warning=FALSE, message=FALSE} precip_pareto <- analyzeTS(TS = precip, season = "month", dist = "paretoII", acsID = "fgn", lag.max = 12) reportTS(aTS = precip_pareto, method = "dist")+ theme_light() reportTS(aTS = precip_pareto, method = "acs") + theme_light() ``` Once you choose the marginal distribution and the autocorrelation structure you can produce as many and as long synthetic series you wish. The generated series will reproduce the distribution, correlation structure and probability zero (intermittency), using the `simulateTS()` function. This example uses the Generalized Gamma distribution and the Weibull autocorrelation structure, which were fitted above. ```{r, warning=FALSE, message=FALSE} sim_precip <- simulateTS(aTS = precip_ggamma, from = as.POSIXct(x = "1978-12-01 00:00:00"), to = as.POSIXct(x = "2008-12-01 00:00:00")) dta <- precip dta[, id := "observed"] sim_precip[, id := "simulated"] dta <- rbind(dta, sim_precip) ggplot(data = dta) + geom_line(mapping = aes(x = date, y = value)) + facet_wrap(facets = ~id, ncol = 1) + theme_light() ``` ## Daily streamflow We can also apply the same methods to daily streamflows. For this example we have used the streamflow records of the Nassawango Creek near Snow Hill (Maryland, US; USGS Id 01485500) from the US Geological Survey (USGS) repository [(https://waterdata.usgs.gov/nwis)](https://waterdata.usgs.gov/nwis). In this case, the fitted distribution was log-normal, and the ACS was Pareto II. ```{r, fig.height = 9, warning=FALSE, message=FALSE} ## CPU time: ~240s data("disch") str <- analyzeTS(TS = disch, dist = "lnorm", norm = "N2", acsID = "paretoII", lag.max = 20, constrain = TRUE, season = "month") reportTS(aTS = str) + theme_light() reportTS(aTS = str, method = "stat") ``` Given the fitted values, streamflows can be simulated as ```{r, fig.height = 3.5, warning=FALSE, message=FALSE} sim_str <- simulateTS(aTS = str) dta <- disch dta[, id := "observed"] sim_str[, id := "simulated"] dta <- rbind(dta, sim_str) ggplot(data = dta) + geom_line(mapping = aes(x = date, y = value)) + facet_wrap(facets = ~id, ncol = 1) + theme_light() ``` # Recommended and Supported Distributions in `CoSMoS` {#section_4} ## Recommended Distributions for Hydroclimatic Variables As general guideline we suggest the following distributions for different variables: * Precipitation: Generalized Gamma (`ggamma`), Burr type XII (`burrXII`) * Streamflow: Burr type III and XII (`burrIII`, `burrXII`), Generalized Gamma (`ggamma`), Lognormal (`lnorm`) * Relative humidity: Beta (beta), Kumaraswamy (`kumar` in [ `extraDistr`](https://cran.R-project.org/package=extraDistr) package) * Wind speed: Weibull (`weibull`), Generalized Gamma (`ggamma`) * Temperature: Normal (`norm`) * Maxima of any variable: GEV (`gev`) ## Supported Distribution in `CoSMoS` CoSMoS supports most distributions available in `R` with defined quantile, distribution, and probability density functions. Additionally, we provide four flexible distributions that we highly recommend. **Burr type III** distribution argument/name - `burrIII` parameters - `list(scale, shape1, shape2)` $$f_{\text{BrIII}}(x)=\frac{\left(\frac{x}{\beta }\right)^{-\frac{1}{\gamma _2}-1} \left(\gamma_1^{-1} \left(\frac{x}{\beta }\right)^{-\frac{1}{\gamma _2}}+1\right) ^{-\gamma_1 \gamma_2-1}}{\beta } \\ F_{\text{BrIII}}(x)= \left(\gamma_1^{-1} \left(\frac{x}{\beta }\right)^{-\frac{1}{\gamma _2}}+1\right) ^{-\gamma _1 \gamma _2} \\ Q_{\text{BrIII}}(u)=\beta \left(\gamma _1 \left(u^{-\frac{1}{\gamma _1 \gamma _2}}-1\right)\right)^{-\gamma _2} \\ m_{\text{BrIII}}(q)=\frac{\beta ^q \gamma _1^{\gamma _2 (-q)} \Gamma \left(\left(q+\gamma _1\right) \gamma _2\right) \Gamma \left(1-q \gamma _2\right)}{\Gamma \left(\gamma _1 \gamma _2\right)}$$ **Burr type XII** distribution argument/name - `burrXII` parameters - `list(scale, shape1, shape2)` $$f_{\text{BrXII}}(x)=\frac{\left(\frac{x}{\beta }\right)^{\gamma _1-1} \left(\gamma _2 \left(\frac{x}{\beta }\right)^{\gamma _1}+1\right)^{-\frac{1}{\gamma _1 \gamma _2}-1}}{\beta } \\ F_{\text{BrXII}}(x)=1-\left(\gamma _2 \left(\frac{x}{\beta }\right)^{\gamma _1}+1\right)^{-\frac{1}{\gamma _1 \gamma _2}} \\ Q_{\text{BrXII}}(u)=\beta \left(-\frac{1-(1-u)^{-\gamma _1 \gamma _2}}{\gamma _2}\right)^{\frac{1}{\gamma _1}} \\ m_{\text{BrXII}}(q)=\frac{\beta ^q \gamma _2^{-\frac{q}{\gamma _1}-1} B\left(\frac{q+\gamma _1}{\gamma _1},\frac{1-q \gamma _2}{\gamma _1 \gamma _2}\right)}{\gamma _1}$$ **Generalized Gamma** distribution argument/name - `ggamma` parameters - `list(scale, shape1, shape2)` $$f_{\mathcal{G}\mathcal{G}}(x)=\frac{\gamma _2 e^{-\left(\frac{x}{\beta }\right)^{\gamma _2}} \left(\frac{x}{\beta }\right)^{\gamma _1-1}}{\beta \Gamma \left(\frac{\gamma _1}{\gamma _2}\right)} \\ F_{\mathcal{G}\mathcal{G}}(x)=Q\left(\frac{\gamma _1}{\gamma _2},0,\left(\frac{x}{\beta }\right)^{\gamma _2}\right) \\ Q_{\mathcal{G}\mathcal{G}}(u)=\beta Q^{-1}\left(\frac{\gamma _1}{\gamma _2},0,u\right)^{\frac{1}{\gamma _2}} \\ m_{\mathcal{G}\mathcal{G}}(q)=\frac{\beta ^q \Gamma \left(\frac{q}{\gamma _2}+\frac{\gamma _1}{\gamma _2}\right)}{\Gamma \left(\frac{\gamma _1}{\gamma _2}\right)}$$ **Pareto II** (also known as Lomax) distribution argument/name - `paretoII` parameters - `list(scale, shape)` $$f_{\text{PII}}(x)=\frac{\left(\frac{\gamma x}{\beta }+1\right)^{-\frac{1}{\gamma }-1}}{\beta } \\ F_{\text{PII}}(x)=1-\left(\frac{\gamma x}{\beta }+1\right)^{-1/\gamma } \\ Q_{\text{PII}}(u)=\frac{\beta \left((1-u)^{-\gamma }-1\right)}{\gamma } \\ m_{\text{PII}}(q)=\frac{\Gamma (q+1) \left(\frac{\beta }{\gamma }\right)^q \Gamma \left(\frac{1}{\gamma }-q\right)}{\Gamma \left(\frac{1}{\gamma }\right)}$$ # Historical Note {#section_5} The genesis of CoSMoS dates from 2009 when Simon Michael Papalexiou developed the core of its methods while at National Technical University of Athens (NTUA), Greece, and applied them initially for multivariate rainfall modeling for the needs of an internal project at NTUA. Simon soon invited a few "colleagues" to build a start-up company, named Oktana, to develop stochastic simulation software. The details were described in a legal document [(Papalexiou, 2010)](https://www.researchgate.net/publication/333323778_Stochastic_modelling_demystified) as a first step to create proprietary software, although Oktana was never created. In 2015, Simon with Yannis Markonis reconsidered the idea of a start-up company participating in the Climate-KIC competition. In 2017 Simon moved to California and Yannis to Prague, and they started discussing the release of CoSMoS as open-source software. The methods were published in arXiv on July 21, 2017 [(Papalexiou, 2017)](https://arxiv.org/abs/1707.06842) and few months later in Advances in Water Resources [(Papalexiou, 2018)](https://doi.org/10.1016/j.advwatres.2018.02.013), followed by a publication on disaggregation (DiPMaC) in WRR [(Papalexiou et al., 2018)](https://doi.org/10.1029/2018WR022726). In the meanwhile, Filip Strnad joined the team in 2018 and coded the first version of CoSMoS in `R`, as the original code was developed by Simon in Mathematica. Francesco Serinaldi joined the team in 2019 initially to co-author a paper on random fields with Simon which was published in WRR in January 2020 [(Papalexiou and Serinaldi, 2020)](https://doi.org/10.1029/2019WR026331). Francesco developed the R code of CoSMoS version 2, collaborating with Filip and Kevin, and extending its functionalities to include the methods of the previously mentioned paper, that is, enabling the simulation of random fields and multivariate time series. The last addition to our team is Kevin Shook who has become the package maintainer and offers advice on `R`. The latest version of CoSMoS implements major advances in space time modeling introduced in [Papalexiou, Serinaldi and Porcu (2021)](https://doi.org/10.1029/2020WR029466) that allow realistic simulations of hydro-environmental fluxes, including rainfall storms, fields resembling weather cyclones, fields converging to (or diverging from) a point, colliding air masses, and more. We are serene to see this ten-year Odyssey finally back on track to Ithaca. Leaving the burdens of the past behind, empowered by a great team and standing for dignity and academic integrity, we will keep striving to offer free, innovative, and reliable software for stochastic modelling. As a closing remark, we thank our imitators, after all, as Oscar Wilde said: “Imitation is the sincerest form of flattery that mediocrity can pay to greatness”. # References {#section_6} 1. Papalexiou, S.M. (2018). _Unified theory for stochastic modelling of hydroclimatic processes: Preserving marginal distributions, correlation structures, and intermittency_. Advances in Water Resources, 115, 234-252, [link](https://doi.org/10.1016/j.advwatres.2018.02.013) 2. Papalexiou, S.M., Markonis, Y., Lombardo, F., AghaKouchak, A., Foufoula-Georgiou, E. (2018). _Precise Temporal Disaggregation Preserving Marginals and Correlations (DiPMaC) for Stationary and Nonstationary Processes_. Water Resources Research, 54(10), 7435-7458, [link](https://doi.org/10.1029/2018WR022726) 3. Papalexiou, S.M., Serinaldi, F. (2020). _Random Fields Simplified: Preserving Marginal Distributions, Correlations, and Intermittency, With Applications From Rainfall to Humidity_. Water Resources Research, 56(2), e2019WR026331, [link](https://doi.org/10.1029/2019WR026331) 4. Papalexiou, S.M., Serinaldi, F., Porcu, E. (2021). _Advancing Space-Time Simulation of Random Fields: From Storms to Cyclones and Beyond_. Water Resources Research, 57, e2020WR029466, [link](https://doi.org/10.1029/2020WR029466) 5. Serinaldi, F., Kilsby, C.G. (2018). _Unsurprising Surprises: The Frequency of Record-breaking and Overthreshold Hydrological Extremes Under Spatial and Temporal Dependence_. Water Resources Research, 54(9), 6460-6487, [link](https://doi.org/10.1029/2018WR023055) * Papalexiou, S.M. (2010). _Stochastic modelling demystified_. 10.13140/RG.2.2.34889.60008.[link](https://www.researchgate.net/publication/333323778_Stochastic_modelling_demystified) * Papalexiou, S.M. (2017). _A unified theory for exact stochastic modelling of univariate and multivariate processes with continuous, mixed type, or discrete marginal distributions and any correlation structure_. ArXiv:1707.06842 [Math, Stat]. [link](https://arxiv.org/abs/1707.06842)