---
title: "Imperfect serological test"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Imperfect serological test}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, message=FALSE}
library(serosv)
library(ggplot2)
```

## Imperfect test

The `correct_prevalence()` function estimates the true seroprevalence when the serological test used is imperfect, that is, when its sensitivity or specificity is below 1.

Arguments:

-   `data` the input data frame, which must have either:

    -   `age`, `pos`, `tot` columns (for aggregated data)

    -   **OR** `age`, `status` columns (for linelisting data)

-   `bayesian` whether to adjust seroprevalence using the Bayesian approach (`TRUE`) or the frequentist approach (`FALSE`). If set to `TRUE`, the true seroprevalence is estimated using MCMC (default value `TRUE`)

-   `init_se` sensitivity of the serological test (default value `0.95`)

-   `init_sp` specificity of the serological test (default value `0.8`)

-   `study_size_se` (applicable when `bayesian=TRUE`) sample size for sensitivity validation study (default value `1000`)

-   `study_size_sp` (applicable when `bayesian=TRUE`) sample size for specificity validation study (default value `1000`)

-   `chains` (applicable when `bayesian=TRUE`) number of Markov chains (default value `1`)

-   `warmup` (applicable when `bayesian=TRUE`) number of warm-up iterations (default value `1000`)

-   `iter` (applicable when `bayesian=TRUE`) number of iterations (default value `2000`)
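
The two accepted layouts for `data` can be illustrated with a small base R sketch. The toy values, the object names `linelist` and `aggregated`, and the 0/1 coding of `status` are assumptions made for illustration only; check the package documentation for the coding that `correct_prevalence()` actually expects.

```{r}
# Hypothetical linelisting data: one row per tested individual.
# Assumption: status is coded 1 = seropositive, 0 = seronegative.
linelist <- data.frame(
  age    = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  status = c(0, 1, 0, 1, 1, 0, 1, 1, 1)
)

# Collapse to the aggregated layout: one row per age,
# with the number of positives (pos) and the number tested (tot).
pos <- tapply(linelist$status, linelist$age, sum)
tot <- tapply(linelist$status, linelist$age, length)

aggregated <- data.frame(
  age = as.numeric(names(pos)),
  pos = as.vector(pos),
  tot = as.vector(tot)
)

aggregated
```

Both objects describe the same survey; the aggregated form simply stores counts per age instead of one row per person.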

The function returns a list of 2 items:

-   `info`

    -   if `bayesian = TRUE`, contains the estimated values for sensitivity (se), specificity (sp) and corrected seroprevalence

    -   otherwise, contains the formula used to compute the corrected seroprevalence

-   `corrected_sero` a data.frame with `age`, `sero` (corrected seroprevalence) and `pos`, `tot` (adjusted based on the corrected prevalence)
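
For orientation, the usual frequentist adjustment for an imperfect test is the Rogan-Gladen estimator. The derivation below is a sketch in assumed notation ($p$ for apparent prevalence, $\pi$ for true prevalence, $Se$ and $Sp$ for sensitivity and specificity); it is not confirmed here to be the exact formula stored in `info`.

$$
p = Se\,\pi + (1 - Sp)(1 - \pi)
\quad\Longrightarrow\quad
\hat{\pi} = \frac{p + Sp - 1}{Se + Sp - 1}
$$

For example, with $Se = 0.9$ and $Sp = 0.8$, an apparent prevalence of $0.55$ corresponds to $\hat{\pi} = (0.55 + 0.8 - 1)/(0.9 + 0.8 - 1) = 0.5$.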

```{r}
# ---- estimate real prevalence using Bayesian approach ----
data <- rubella_uk_1986_1987
output <- correct_prevalence(
  data,
  warmup = 1000, iter = 4000,
  init_se = 0.9, init_sp = 0.8,
  study_size_se = 1000, study_size_sp = 3000
)

# check fitted value
output$info[1:2, ]

# ---- estimate real prevalence using frequentist approach ----
freq_output <- correct_prevalence(
  data,
  bayesian = FALSE,
  init_se = 0.9, init_sp = 0.8
)

# check info
freq_output$info
```
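
As a rough cross-check on a frequentist correction, the Rogan-Gladen estimator can be computed by hand. The helper `rogan_gladen()` below is hypothetical and not part of `serosv`, the apparent prevalences are made-up values, and clamping the result to the interval from 0 to 1 is an assumption rather than documented package behaviour.

```{r}
# Hypothetical helper (not a serosv function): Rogan-Gladen correction.
# apparent: observed proportion positive; se, sp: test sensitivity, specificity.
rogan_gladen <- function(apparent, se, sp) {
  stopifnot(se + sp > 1)  # the estimator is only meaningful when se + sp > 1
  corrected <- (apparent + sp - 1) / (se + sp - 1)
  # Clamp to [0, 1]: low apparent prevalences can otherwise go negative.
  pmin(pmax(corrected, 0), 1)
}

# Made-up apparent prevalences, for illustration only.
apparent <- c(0.15, 0.30, 0.55, 0.90)
rogan_gladen(apparent, se = 0.9, sp = 0.8)
```

An apparent prevalence below `1 - sp` yields a negative raw estimate, which is why the sketch truncates at 0; an apparent prevalence above `se` is truncated at 1 for the same reason.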

```{r}
# compare original prevalence and corrected prevalence
ggplot()+
  geom_point(aes(x = data$age, y = data$pos/data$tot, color="apparent prevalence")) + 
  geom_point(aes(x = output$corrected_se$age, y = output$corrected_se$sero, color="estimated prevalence (bayesian)" )) +
  geom_point(aes(x = freq_output$corrected_se$age, y = freq_output$corrected_se$sero, color="estimated prevalence (frequentist)" )) +
  scale_color_manual(
    values = c(
      "apparent prevalence" = "red", 
      "estimated prevalence (bayesian)" = "blueviolet",
      "estimated prevalence (frequentist)" = "royalblue")
  )+ 
  labs(x = "Age", y = "Prevalence")
```

### Fitting corrected data

**Data after seroprevalence correction**

Bayesian approach

```{r}
suppressWarnings(
  corrected_data <- farrington_model(
    output$corrected_se,
    start = list(alpha = 0.07, beta = 0.1, gamma = 0.03)
  )
)

plot(corrected_data)
```

Frequentist approach

```{r}
suppressWarnings(
  corrected_data <- farrington_model(
    freq_output$corrected_se,
    start = list(alpha = 0.07, beta = 0.1, gamma = 0.03)
  )
)

plot(corrected_data)
```

**Original data**

```{r}
suppressWarnings(
  original_data <- farrington_model(
    data,
    start = list(alpha = 0.07, beta = 0.1, gamma = 0.03)
  )
)

plot(original_data)
```