---
title: "Mixing model in Short Creek"
author: "George G. Vega Yon, Ph.D."
date: today
vignette: >
  %\VignetteIndexEntry{Mixing model in Short Creek}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
---

::: {.callout-warning title="Warning"}
The data and model used in this vignette are for demonstration purposes only and do not reflect real-world conditions accurately. As in most agent-based models, the simulations performed here are simplified representations of the reality and provide rich information for scenario modeling, but not necessarily for forecasting or parameter estimation.
:::

During the 2025 US Measles outbreak, the number of cases in the state of Utah started to climb quickly in the Short Creek region (on the Utah-Arizona border). This vignette shows how we can use the mixing model of the `measles` R package to do scenario modeling of the situation using both age and school data.

## Set up

We start by loading the `measles` R package. Since the package depends on [`epiworldR`](https://cran.r-project.org/package=epiworldR){target="_blank"}, `epiworldR` is loaded automatically too:

```{r}
#| label: load
library(measles)
```

For the data, we need two datasets: information about the population distribution by age and vaccination status, and the contact matrix. The `measles` R package comes with a pre-processed dataset that was created using the [`multigroup.vaccine` R package](https://github.com/EpiForeSITE/multigroup-vaccine). This data uses real vaccination rates from publicly available school records, and population age structure and composition from the latest US census. Vaccination rates for the non-school-aged population were imputed based on assumptions and do not reflect the actual vaccination information for those age groups. We can load the needed data using the `data()` function:

```{r}
#| label: data
data(short_creek, package = "measles")
data(short_creek_matrix, package = "measles")
```

Let's take a look at the datasets:

```{r}
#| label: data-peek
short_creek
```

As we can see, the vaccination rates in this area are significantly low. The vaccination rate needed to avert an outbreak is close to the 0.93 percent of the population. Let's now look at the first few entries of the contact matrix

```{r}
#| label: data-peak-matrix
# Looking at the first 5 columns
short_creek_matrix[, 1:5] |>
  round(2)

# Looking at it as a heatmap
image(short_creek_matrix)
```

One important thing to note is that the data in `short_creek` must coincide with that in the contact matrix, particularly, there should be one entry in the `data.frame` for each row/column of the contact matrix.

## Creating the model

To create the model, we do the following:

```{r}
#| label: create-model
N <- sum(short_creek$agepops)

measles_model <- ModelMeaslesMixing(
  n                            = N,
  prevalence                   = 1 / N,
  contact_rate                 = 15 / 0.9 / 4,
  transmission_rate            = 0.9,
  vax_efficacy                 = 0.97,
  contact_matrix               = short_creek_matrix,
  hospitalization_rate         = 0.1,
  hospitalization_period       = 7,
  days_undetected              = 2,
  quarantine_period            = 21,
  quarantine_willingness       = 0.9,
  isolation_willingness        = 0.9,
  isolation_period             = 4,
  prop_vaccinated              = 0.95,
  contact_tracing_success_rate = 0.8,
  contact_tracing_days_prior   = 4
)
```

Now, since this is a mixing model, we are required to specify the entities. To do so, we can either use the functions `entity()` and `add_entity()`, or use the wrapper `add_entities_from_dataframe()` as follows:


```{r}
#| label: distribute-virus
# Adding the entities to the model
add_entities_from_dataframe(
  model = measles_model,
  entities = short_creek,
  col_name = "age_labels",
  col_number = "agepops",
  as_proportion = FALSE
)
```

We can also specify the vaccination rates. There are multiple ways in which we can do that. The default behavior of the model is, at each run, randomly sample `prop_vaccinated` percent of the population and assign the vaccine. Thus, the number of vaccinated individuals will be constant across simulations. Nonetheless, since the schools and age groups have different vaccination rates, it is better to use the `distribute_tool_to_entities()` function. Like the `add_entities_from_dataframe()` function, the `distribute_tool_to_entities()` function simplifies the process:

```{r}
#| label: distribute-tool
# Creating the distribution function
dist_fun <- distribute_tool_to_entities(
  prevalence = short_creek$vacc_rate,
  as_proportion = TRUE
)

# Setting the distribution function
measles_model |>
  get_tool(0) |>
  set_distribution_tool(dist_fun)
```

Now, we can specify what to record from the simulation.

```{r}
#| label: running-model
measles_model |>
  run_multiple(
    ndays = 100,
    nsims = 100,
    seed  = 8812,
    saver = make_saver("outbreak_size", "hospitalizations"),
    nthreads = 2L
  )
```

After calling the function `run_multiple()`, the C++ function will write the information to disk. Before we read in the data, we can take a look at the summary of the model, which will give us an overview of the last run, including how much time it spend per simulation, and what is the transsition matrix for the current run.

```{r}
#| label: summarize
summary(measles_model)
```

To retrieve the results, we use the `run_multiple_get_results()` function:

```{r}
#| label: run_multiple_get_results
# Extracting the results
ans <- measles_model |>
  run_multiple_get_results(
    freader = data.table::fread
  )

# Taking a look at the structure
str(ans)
```

The function call will get us the results as a list of data.frames (data.table objects in this case). We will use the `data.table` package to manipulate the information.

```{r}
#| label: data.table
# Converting into data.table format for convenience
library(data.table)
outbreak_size <- ans$outbreak_size |> as.data.table()
hospitalizations <- ans$hospitalizations |> as.data.table()
```

## Outbreak size

Finally, since we are only interested about the final outbreak size (in this case), can will collapse the data to get the total number of cases at the final simulation day. Subsequently, we can plot the results using the `hist` function:

```{r}
#| label: timeline-outbreak-size
# Aggregating to get the final
with(
  outbreak_size,
  {
    plot(
      x = date,
      y = outbreak_size,
      col = adjustcolor("blue", alpha.f = .2),
      pch = 19,
      ylab = "Cases",
      xlab = "Day",
      main = "Measles outbreak size in Short Creek",
      sub = "Mixing model with age and school data (100 simulations)"
    )
})
```

We can also investigate the distribution of the final counts

```{r}
#| label: hist-outbreak-size
# Aggregating to get the final
with(
  outbreak_size[date == max(date)],
  {
    hist(
      outbreak_size,
      main = "Measles outbreak size in Short Creek",
      sub = "Mixing model with age and school data (100 simulations)",
      breaks = 20
    )
})
```

## Hospitalizations

In the case of the hospitalizations, we can draw a similar figure. The hospitalizations data contains the following information:

1. Cases per virus (just measles in this case).
2. Cases per tool (or the lack of). Information about "no tool" is recorded with a `tool_id == -1`.
3. The counts (how many records in the data).
4. Weights.

The weights attribute is to ensure that we take into consideration counting individuals more than once. For instance, if a model has more than one tool (not just vaccination), the individual who has two tools would be included twice in `count`. Thus, if we wanted to count the raw number of hospitalization cases, we would add across `weight`, but the `count` variable yields how many individuals were hospitalized under that combination of tool and virus id.


```{r}
#| label: hospitalizations
hosp_tot <- hospitalizations[, .(total = sum(count)), by = .(sim_num, tool_id)]
hosp_tot[, status := fifelse(tool_id == -1, "Unvax", "Vax")]
hosp_tot[, tool_id := NULL]
```

We can create a couple of boxplot to show how many cases we see per vaccination status:

```{r}
#| label: summary-hosp
hosp_tot <- merge(
  hosp_tot[status == "Unvax", .(sim_num, unvax = total)],
  hosp_tot[status == "Vax", .(sim_num, vax = total)],
  all = TRUE
)

# Filling zeros
hosp_tot[, unvax := fcoalesce(unvax, 0L)]
hosp_tot[, vax := fcoalesce(vax, 0L)]

hosp_tot[, .(vax, unvax)] |>
  as.matrix() |>
  boxplot(
    ylab = "Count",
    xlab = "Status",
    main = "Hospitalization per vaccination status",
    sub  = "Mixing model with age and school data (100 simulations)"
  )
```

## Final comments

Ultimately, based on the current results, we would be expecting all unvaccinated individuals to become infected, with a small number of vaccinated individuals also getting infected. This is consistent with the high R0 of measles and the low vaccination rates in the area. Nonetheless, these results should be taken with caution, as this simulation is a simplification of reality that does not consider, for example, individuals' changing their behavior as a function of what is observed; it would be natural to expect some level of precaution, *e.g.*, reducing contact rates, would be observed in such a situation.