---
title: "An example with simulated data"
output: rmarkdown::html_vignette
vignette: >
  # %\VignetteIndexEntry{An example with simulated data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(sspm)
library(mgcv)
library(dplyr)
```

The following example shows the typical `sspm` workflow. We will use simulated data.

```{r data}
sfa_boundaries
borealis_simulated
predator_simulated
catch_simulated
```

1. The first step of the `sspm` workflow is to create a `sspm_boundary` from an `sf` object, providing the `boundary` that delineates the boundary regions. The object can then be plotted with `spm_plot` (as can most `sspm` objects). 

```{r boundaries, fig.width=7, fig.height=5, fig.fullwidth=TRUE, fig.align='center'}
bounds <- spm_as_boundary(boundaries = sfa_boundaries, 
                          boundary = "sfa")

plot(bounds)
```

2. The second step consists in wrapping a `data.frame`, `tibble` or `sf` object into a `sspm_data` object, with a few other pieces of relevant information, such as the name, dataset type (biomass, predictor or catch, depending on the type of information contained), time column and coordinates column (i not `sf`) and unique row identifier. Here we wrap the borealis dataset that contains the biomass information.

```{r}
biomass_dataset <- 
  spm_as_dataset(borealis_simulated, name = "borealis",
                 density = "weight_per_km2",
                 time = "year_f",
                 coords = c('lon_dec','lat_dec'), 
                 uniqueID = "uniqueID")

biomass_dataset
```

3. We do the same with the predator data, which are of the predictor type.

```{r}
predator_dataset <- 
  spm_as_dataset(predator_simulated, name = "all_predators", 
                 density = "weight_per_km2",
                 time = "year_f",
                 coords = c("lon_dec", "lat_dec"),
                 uniqueID = "uniqueID")

predator_dataset
```

4. The `sspm` workflow relies on the discretization of the boundary objects, the default method being voronoi tesselation.

```{r}
bounds_voronoi <- bounds %>% 
  spm_discretize(method = "tesselate_voronoi",
                 with = biomass_dataset, 
                 nb_samples = 30)

bounds_voronoi
```

The other available method is `triangulate_delaunay` for delaunay triangulation. Here the `a` argument is used to set the size of the mesh (see `RTriangle::triangulate` for more details).

```{r, eval = FALSE}
## Not run
bounds_delaunay <- bounds %>%
  spm_discretize(method = "triangulate_delaunay", a = 1, q = 30)
bounds_delaunay
```

5. Plotting the object shows the polygons that have been created.

```{r, fig.width=7, fig.height=5, fig.fullwidth=TRUE, fig.align='center'}
plot(bounds_voronoi)
```

```{r, eval = FALSE}
## Not run
plot(bounds_delaunay)
```

6. The results of the discretization can also be explored with `spm_patches()` and `spm_points()`.

```{r}
spm_patches(bounds_voronoi)
spm_points(bounds_voronoi)
```

7. The next step in this workflow is to smooth the variables to be used in the final `sspm` model, by using spatial-temporal smoothers, by passing each dataset through `spm_smooth`. Here we first smooth `weight_per_km2` as well as `temp_at_bottom`. Note that the boundary column `sfa` can be used in the formula as the data will be first joined to the provided boundaries.

```{r}
biomass_smooth <- biomass_dataset %>%  
  spm_smooth(weight_per_km2 ~ sfa + smooth_time(by = sfa) + 
               smooth_space() + 
               smooth_space_time(),
             boundaries = bounds_voronoi, 
             family=tw)%>% 
  spm_smooth(temp_at_bottom ~ smooth_time(by=sfa, xt = NULL) +
               smooth_space() +
               smooth_space_time(xt = NULL),
             family=gaussian)

biomass_smooth
```

8. The smoothed results for any smoothed variables (listed in "smoothed vars" above) can be easily plotted. Here the 4 panels represent 4 different patches in the same SFA. The panels show a very similar pattern due to the nature of the simulated data we are using. The data points are actually different, but the differences are not noticeable in this example case.

```{r, fig.width=7, fig.height=5, fig.fullwidth=TRUE, fig.align='center'}
plot(biomass_smooth, var = "weight_per_km2", log = FALSE, interval = T)
```
You can also make a spatial plot

```{r, fig.width=7, fig.height=5, fig.fullwidth=TRUE, fig.align='center'}
plot(biomass_smooth, var = "weight_per_km2", use_sf = TRUE)
```

9. We also smooth the `weight_per_km2` variable in the predator data.

```{r}
predator_smooth <- predator_dataset %>%  
  spm_smooth(weight_per_km2 ~ smooth_time() + smooth_space(),
             boundaries = bounds_voronoi,
             drop.unused.levels = F, family=tw, method= "fREML")

predator_smooth
```

10. Before we assemble the full model with our newly smoothed data, we need to deal with the catch data. We first load the dataset.

```{r}
catch_dataset <- 
  spm_as_dataset(catch_simulated, name = "catch_data", 
                 biomass = "catch",
                 time = "year_f", 
                 uniqueID = "uniqueID", 
                 coords = c("lon_dec", "lat_dec"))

catch_dataset
```

11. We then need to aggregate this data. This illustrate using the `spm_aggregate` functions. Here we use `spm_aggregate_catch`:

```{r}
biomass_smooth_w_catch <- 
  spm_aggregate_catch(biomass = biomass_smooth, 
                      catch = catch_dataset, 
                      biomass_variable = "weight_per_km2",
                      catch_variable = "catch",
                      fill = mean)

biomass_smooth_w_catch
```

12. Once data has been smoothed, we can assemble a `sspm` model object, using one dataset of type biomass, one dataset of type predictor and (optionnaly) a dataset of type catch.

```{r}
sspm_model <- sspm(biomass = biomass_smooth_w_catch, 
                   predictors = predator_smooth)

sspm_model
```

13. Before fitting the model, we must split data into test/train with `spm_split`.

```{r}
sspm_model <- sspm_model %>% 
  spm_split(year_f %in% c(1990:2017))

sspm_model
```

14. To fit the model, we might be interested in including lagged values. This is done with `spm_lag`.

```{r}
sspm_model <- sspm_model %>% 
  spm_lag(vars = c("weight_per_km2_borealis", 
                   "weight_per_km2_all_predators"), 
          n = 1)

sspm_model
```

15. We can now fit the final spm model with `spm`. 

```{r}
sspm_model_fit <- sspm_model %>% 
  spm(log_productivity ~ sfa +
        weight_per_km2_all_predators_lag_1 +
        smooth_space(by = weight_per_km2_borealis_lag_1) +
        smooth_space(), 
      family = mgcv::scat)

sspm_model_fit
```

It is possible to access the GAM fit object in order to look at it in more details and, for example, evaluate the goodness of fit.

```{r}
gam_fit <- spm_get_fit(sspm_model_fit)
summary(gam_fit)
```

You can also use the summary method.

```{r}
summary(sspm_model_fit, biomass = "weight_per_km2_borealis")
```

16. Plotting the object produces a actual vs predicted plot (with TEST/TRAIN data highlighted.

```{r, fig.width=7, fig.height=5, fig.fullwidth=TRUE, fig.align='center'}
plot(sspm_model_fit, train_test = TRUE, scales = "free")
```

17. We can also extract the predictions.
```{r}
preds <- predict(sspm_model_fit)
head(preds)
```

We can also get the predictions for biomass by passing the biomass variable name.
```{r}
biomass_preds <- predict(sspm_model_fit, biomass = "weight_per_km2_borealis")
head(biomass_preds)
```

We can also predict the biomass one step ahead.

```{r}
biomass_one_step <- predict(sspm_model_fit, biomass = "weight_per_km2_borealis", 
                            next_ts = TRUE)
head(biomass_one_step)
```

18. We can produce an array of plots, as timeseries or as spatial plots

```{r, fig.width=7, fig.height=5, fig.fullwidth=TRUE, fig.align='center'}
plot(sspm_model_fit, log = T, scales = 'free')
plot(sspm_model_fit, log = T, use_sf = TRUE)
```

```{r, fig.width=7, fig.height=5, fig.fullwidth=TRUE, fig.align='center'}
plot(sspm_model_fit, biomass = "weight_per_km2_borealis",  scales = "free")
plot(sspm_model_fit, biomass = "weight_per_km2_borealis", use_sf = TRUE)
```

```{r, fig.width=7, fig.height=5, fig.fullwidth=TRUE, fig.align='center'}
plot(sspm_model_fit, biomass = "weight_per_km2_borealis",
     next_ts = TRUE, aggregate = TRUE, scales = "free", 
     smoothed_biomass = TRUE, interval = T)
```