---
title: "The Inclusion of Claim Covariates in the Generation of SynthETIC Claims"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{The Inclusion of Claim Covariates in the Generation of SynthETIC Claims}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This vignette aims to illustrate how the inclusion of covariates can influence the severity of the claims generated using the `SynthETIC` package. The distributional assumptions shown in this vignette are consistent with the default assumptions of the `SynthETIC` package (an Auto Liability portfolio). The inclusion of covariates aims to be a minor adjustment step to modelled claim sizes after Step 2: *Claim size* discussed in the [`SynthETIC-demo` vignette](https://CRAN.R-project.org/package=SynthETIC).

In particular, with this demo we will construct:

------------------------------------------------------------------------------------------
Description                     R Object
------------------------------- ----------------------------------------------------------
Covariate Inputs                `covariate_obj` = various factors, their levels and relativities for covariate frequency and claim severity

Covariate Outputs               `covariates_data_obj` = dataset of assigned covariates for each claim

*S_adj*, claim size             `claim_size_w_cov[[i]]` = claim size for all claims that occurred in period *i* after adjustment for covariates
------------------------------------------------------------------------------------------

Reference
---

To cite this package in publications, please use:

```{r, eval=FALSE}
citation("SynthETIC")
```

`SynthETIC` Set Up
---

We set up package-wise global parameters demonstrated in the `SynthETIC-demo` vignette (which can be accessed via `vignette("SynthETIC-demo", package = "SynthETIC")` or [online documentation](https://CRAN.R-project.org/package=SynthETIC)) and perform modelling Steps 1 and 2 to generate the claim frequency and claim sizes under the default assumptions. Note that changing these assumptions for Steps 1 and 2 do not affect how covariates are implemented.

```{r}
library(SynthETIC)
set.seed(20200131)

set_parameters(ref_claim = 200000, time_unit = 1/4)
ref_claim <- return_parameters()[1]
time_unit <- return_parameters()[2]

years <- 10
I <- years / time_unit
E <- c(rep(12000, I)) # effective annual exposure rates
lambda <- c(rep(0.03, I))

# Modelling Steps 1-2
n_vector <- claim_frequency(I = I, E = E, freq = lambda)
occurrence_times <- claim_occurrence(frequency_vector = n_vector)
claim_sizes <- claim_size(frequency_vector = n_vector)
```

Applying Covariates
---

To apply simulated covariates to `SynthETIC` claim sizes, a `covariates` is used in conjunction with the `claim_size_adj()` function to both simulate covariate combinations and apply adjusted claim sizes. The example `covariates` object below includes relativities for

1. the legal representation of the claim
2. the injury severity score
3. the age of the claimant.

```{r}
test_covariates_obj <- SynthETIC::test_covariates_obj
print(test_covariates_obj)
```

The `claim_size_adj()` function simulates the covariate levels for each claim and then adjusts the claim sizes according to the relativities defined above. The covariate levels for each claim can be accessed in the `covariates_data$data` attribute of the function output.

```{r}
claim_size_covariates <- claim_size_adj(test_covariates_obj, claim_sizes)
covariates_data_obj <- claim_size_covariates$covariates_data
head(data.frame(covariates_data_obj$data))
```

The adjusted claim sizes are stored in the `claim_size_adj` attribute.

```{r}
claim_size_w_cov <- claim_size_covariates$claim_size_adj
claim_size_w_cov[[1]]
```

Modelling Steps 3-5
---

Just as in Steps 1-2, Steps 3 onwards also do not require any specific adjustment in relation to implementing covariates. Guidance on implementing these modelling steps can be found in the `SynthETIC-demo` vignette. We can see from the example below that the inclusion of covariates primarily has an impact on claim sizes and thus any following modelling steps that are also impacted from the adjusted claim sizes. Note that the number of claims (`n_vector`) and the time at which they occur (`occurrence_times`) are unaffected by covariates.

```{r}
generate_claims_dataset <- function(claim_size_list) {
    
    # SynthETIC Steps 3-5
    notidel <- claim_notification(n_vector, claim_size_list)
    setldel <- claim_closure(n_vector, claim_size_list)
    no_payments <- claim_payment_no(n_vector, claim_size_list)
    
    claim_dataset <- generate_claim_dataset(
      frequency_vector = n_vector,
      occurrence_list = occurrence_times,
      claim_size_list = claim_size_list,
      notification_list = notidel,
      settlement_list = setldel,
      no_payments_list = no_payments
    )
    
    claim_dataset
}

claim_dataset <- generate_claims_dataset(claim_size_list = claim_sizes)
claim_dataset_w_cov <- generate_claims_dataset(claim_size_list = claim_size_w_cov)

head(claim_dataset)
head(claim_dataset_w_cov)
```

Appendix 1: Using Different Sets of Covariates
---

This section shows the impact of using a set of covariates different than the default values within the `SynthETIC` package.

The included framework allows a user to easily construct any set of covariates required for simulation and/or analysis. This gives the user flexibility in choosing both the number of factors in the set of covariates and the number of levels within each factor. 

The below example compares

- the default data set (without any covariate adjustments)
- an example set of three covariates included with `SynthETIC`
- a custom set of two covariates related to vehicle type and vehicle use-case (an example more relevant if studying motor claims).


```{r}
factors_tmp <- list(
    "Vehicle Type" = c("Passenger", "Light Commerical", "Medium Goods", "Heavy Goods"),
    "Business Use" = c("Y", "N")
)

relativity_freq_tmp <- relativity_template(factors_tmp)
relativity_sev_tmp <- relativity_template(factors_tmp)

# Default Values
relativity_freq_tmp$relativity <- c(
    5, 1.5, 0.35, 0.25,
    1, 4,
    1, 0.6,
    0.35, 0.01,
    0.25, 0,
    2.5, 5
)

relativity_sev_tmp$relativity <- c(
    0.25, 0.75, 1, 3,
    1, 1,
    1, 1,
    1, 1,
    1, 1,
    1.3, 1
)

test_covariates_obj_veh <- covariates(factors_tmp)
test_covariates_obj_veh <- set.covariates_relativity(
    covariates = test_covariates_obj_veh, 
    relativity = relativity_freq_tmp, 
    freq_sev = "freq"
)
test_covariates_obj_veh <- set.covariates_relativity(
    covariates = test_covariates_obj_veh, 
    relativity = relativity_sev_tmp, 
    freq_sev = "sev"
)

claim_size_covariates_veh <- claim_size_adj(test_covariates_obj_veh, claim_sizes)

# Comparison of the same claim size except with adjustments due to covariates
data.frame(
    Claim_Size = head(round(claim_sizes[[1]]))
    ,Claim_Size_Original_Covariates = head(round(claim_size_covariates$claim_size_adj[[1]]))
    ,Claim_Size_New_Covariates = head(round(claim_size_covariates_veh$claim_size_adj[[1]]))
)

# Covariate Levels
head(claim_size_covariates$covariates_data$data)
head(claim_size_covariates_veh$covariates_data$data)

```


Appendix 2: Applying Known Covariate Values
---

To apply specific covariate values for each claim occurrence, we can use the parameter `covariates_id` when constructing the `covariates_data` object. 
This would map the each claim to a corresponding known covariate value from a dataset and apply the relevant severity relativities. Note that in this case, the frequency relativities would not be used, as no simulation of covariate values are performed.

In the example below, we have a known dataset of covariates, which can be mapped to each of the claim sizes. In the covariates dataset, we know:

- The first 25 rows are *Passenger* vehicles not for *Business Use*
- The next 25 rows are *Light Commercial* vehicles not for *Business Use*
- The next 25 rows are *Passenger* vehicles for *Business Use*
- The last 25 rows are *Light Commercial* vehicles for *Business Use*

As a result, we can use the indices for each of these rows to map each set of covariates to its associated claim. In this case, the first 50 claims are related to the last 50 rows in the covariates dataset in reverse order, and claims 51--100 are related to the first 50 rows in the covariates dataset.

```{r}
claim_sizes_known <- list(c(
    rexp(n = 100, rate = 1.5)
))

known_covariates_dataset <- data.frame(
    "Vehicle Type" = rep(rep(c("Passenger", "Light Commerical"), each = 25), times = 2),
    "Business Use" = c(rep("N", times = 50), rep("Y", times = 50))
)
colnames(known_covariates_dataset) <- c("Vehicle Type", "Business Use")

covariates_data_veh <- covariates_data(
    test_covariates_obj_veh, 
    data = known_covariates_dataset, 
    covariates_id = list(c(100:51, 1:50))
)

claim_sizes_adj_tmp <- claim_size_adj.fit(
    covariates_data = covariates_data_veh,
    claim_size = claim_sizes_known
)

head(claim_sizes_adj_tmp[[1]])

```