---
title: "Introduction"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: Referencias.bib
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(deltapif)
```

## Introduction


The `deltapif` R package calculates **Potential Impact Fractions (PIF) and Population Attributable Fractions (PAF) for aggregated data**. It uses the delta method to derive confidence intervals, providing a robust approach for quantifying the burden of disease attributable to risk factors and the potential impact of interventions.


### Core Concepts: PAF and PIF

The **Population Attributable Fraction (PAF)** answers the question: _"What fraction of disease cases in a population would be prevented if we completely eliminated a risk factor?"_ It represents the maximum possible reduction achievable and is often interpreted as the burden of disease attributed to the exposure.

The **Potential Impact Fraction (PIF)** is a more general measure. It answers: _"What fraction of disease cases would be prevented if we changed exposure from its current distribution to a specific counterfactual scenario?"_ 

PAF is a specific type of PIF where the counterfactual scenario is the **theoretical minimum risk exposure level (TMREL)**. Note that the TMREL is not always zero. For example, the TMREL for sodium intake is a specific healthy range (e.g., ~1.6 g/day), as both too much and too little sodium are harmful. For other exposures, such as smoking, the TMREL can indeed be zero exposure.

>**Note**: The statistical methods underlying the package assume that **the relative risk and exposure prevalence estimates are independent** (i.e., derived from different studies or populations).

### Key assumption: Independent (summary) data sources

The `deltapif` package is designed for a specific, common scenario in public health:

+ The estimate of the log-relative risk (`beta`) comes from one source (e.g., a published meta-analysis).

+ The estimate of the exposure prevalence (`p`) comes from a separate, independent source (e.g., a national survey).

The delta method implementation here relies on this independence. If you have individual-level data for exposure, the [`pifpaf` package](https://github.com/INSP-RH/pifpaf) is more appropriate, as it leverages the individual-level variability. If individual-level exposure and outcome data are available from the same source, the [`graphPAF` package](https://CRAN.R-project.org/package=graphPAF) is ideal.
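
To see why independence matters, consider a generic first-order delta-method approximation for a fraction written as a function $g(p, \beta)$ of a scalar prevalence and a scalar log-relative risk. The notation $g$, $\hat{p}$ and $\hat{\beta}$ is introduced here only for illustration, and the package may apply the approximation on a transformed scale. With independent estimates the covariance term vanishes, leaving:

$$
\widehat{\textrm{Var}}\big[g(\hat{p}, \hat{\beta})\big] \approx \left(\frac{\partial g}{\partial p}\right)^2 \widehat{\textrm{Var}}[\hat{p}] + \left(\frac{\partial g}{\partial \beta}\right)^2 \widehat{\textrm{Var}}[\hat{\beta}]
$$

where both derivatives are evaluated at $(\hat{p}, \hat{\beta})$. If the two estimates were correlated, an extra term $2 \, \frac{\partial g}{\partial p} \frac{\partial g}{\partial \beta} \, \widehat{\textrm{Cov}}[\hat{p}, \hat{\beta}]$ would be needed, and it cannot be recovered from separately published summaries.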


## Usage

### Population Attributable Fraction (PAF)

[@lee](https://doi.org/10.1001/jamanetworkopen.2022.19672) estimated the fraction of dementia cases attributable to smoking in the US. They reported:

  + A relative risk of 1.59 (95% CI: 1.15, 2.20)

  + A smoking prevalence of 8.5%

The point estimate of the PAF can be calculated using [Levin's formula](https://doi.org/10.1016/j.gloepi.2021.100062):

```{r}
library(deltapif)

paf(p = 0.085, beta = log(1.59), quiet = TRUE)
```
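
As a cross-check, Levin's formula for a single binary exposure can be written out in base R. This is a minimal sketch that does not use the package; the names `p_smoke`, `rr_smoke` and `paf_by_hand` are introduced here for illustration only.

```{r}
# Levin's formula for one binary exposure:
# PAF = p * (RR - 1) / (1 + p * (RR - 1))
p_smoke  <- 0.085   # smoking prevalence
rr_smoke <- 1.59    # relative risk of dementia among smokers

paf_by_hand <- p_smoke * (rr_smoke - 1) / (1 + p_smoke * (rr_smoke - 1))
paf_by_hand
```

The value can be compared with the point estimate returned by `paf()` above.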

#### Incorporating Uncertainty

To calculate confidence intervals, we need the variance of the log-relative risk. The variance can be derived from the confidence interval following the [Cochrane Handbook](https://www.cochrane.org/authors/handbooks-and-manuals/handbook/current/chapter-06#section-6-3-2):

```{r}
var_log_rr <- ((log(2.20) - log(1.15)) / (2 * 1.96))^2
var_log_rr
```

We then provide the log-relative risk (`log(1.59)`) and its variance to `paf()`. The `rr_link` is `exp`, which converts the coefficient to a relative risk by exponentiating it; the call below does not set `rr_link` explicitly and relies on the function's default. Since the prevalence variance was not reported, we assume `var_p = 0`.

```{r}
paf_dementia <- paf(
  p         = 0.085, 
  beta      = log(1.59), 
  var_beta  = var_log_rr, 
  var_p     = 0
)
paf_dementia
```

The results match those reported by Lee et al.: **PAF = 4.9% (95% CI: 1.3–9.3)**.

### Potential Impact Fraction (PIF)

[@lee](https://doi.org/10.1001/jamanetworkopen.2022.19672) also considered a scenario reducing smoking prevalence by 15% (from 8.5% to 7.225%). The PIF for this intervention is:

```{r}
lee_pif <- pif(
  p        = 0.085, 
  p_cft    = 0.085 * (1 - 0.15), # 15% reduction
  beta     = log(1.59), 
  var_beta = var_log_rr, 
  var_p    = 0
)
lee_pif
```
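
The point estimate can also be sketched by hand for a binary exposure, where the PIF is the relative change in the population-average relative risk between the current and the counterfactual prevalence. This base R sketch does not use the package, and all variable names are illustrative.

```{r}
# PIF for a binary exposure:
# (mean RR under current exposure - mean RR under counterfactual) / mean RR under current exposure
p_now <- 0.085               # current smoking prevalence
p_cf  <- 0.085 * (1 - 0.15)  # counterfactual prevalence after a 15% reduction
rr_sm <- 1.59                # relative risk among smokers

mean_rr_now <- 1 + p_now * (rr_sm - 1)
mean_rr_cf  <- 1 + p_cf  * (rr_sm - 1)

pif_by_hand <- (mean_rr_now - mean_rr_cf) / mean_rr_now
pif_by_hand
```

Setting `p_cf` to zero recovers Levin's formula, which illustrates that the PAF is a special case of the PIF.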

This result is consistent with the reported estimate: **PIF = 0.7% (95% CI: 0.2–1.4)**.

### Attributable and averted cases

Attributable and averted cases can be calculated with the `attributable_cases` and `averted_cases` functions. For
example, [@dhana2023prevalence](https://pmc.ncbi.nlm.nih.gov/articles/PMC10593099/#SD2) estimate the number of people with Alzheimer's Disease in New York, USA, at 426.5 (95% CI: 400.2, 452.7) thousand. This implies a variance of `((452.7 - 400.2) / (2 * qnorm(0.975)))^2`, approximately 179.4.

The number of cases (in thousands) that would be averted if we reduced smoking by 15%, assuming the prevalence of smoking is identical to the rest of the US, is given by:

```{r}
var_cases <- ((452.7 - 400.2) / (2 * qnorm(0.975)))^2
averted_cases(426.5, lee_pif, variance = var_cases)
```

Attributable cases can likewise be estimated using the previous `paf` as:

```{r}
attributable_cases(426.5, paf_dementia, variance = ((452.7 - 400.2) / (2 * qnorm(0.975)))^2)
```
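
As a rough sanity check, the point estimate of averted cases is expected to be the case count multiplied by the fraction. The base R sketch below assumes exactly that product (it does not reproduce the package's interval) and recomputes the PIF by hand; all names are illustrative.

```{r}
# Averted cases (in thousands), assumed here to be cases * PIF
cases_ny <- 426.5                                   # Alzheimer's cases in New York (thousands)
pif_15   <- (0.085 - 0.085 * (1 - 0.15)) * (1.59 - 1) /
  (1 + 0.085 * (1.59 - 1))                          # PIF for a 15% smoking reduction

averted_by_hand <- cases_ny * pif_15
averted_by_hand
```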

### Combining fractions from subpopulations

Multiple fractions can be combined into totals and ensembles. For example, the fractions among men and women can be combined into an overall fraction by specifying the distribution of the subgroups in the population:

```{r}
paf_men   <- paf(p = 0.41, beta = 0.31, var_p = 0.001,
                 var_beta = 0.14,
                 label = "Men")
paf_women <- paf(p = 0.37, beta = 0.35, var_p = 0.001, 
                 var_beta = 0.16,
                 label = "Women")
```

Assuming the distribution is 51% women and 49% men:

```{r}
paf_total(paf_men, paf_women, weights = c(0.49, 0.51))
```
This is equivalent to calculating:

$$
\textrm{PAF}_{\text{All}} = 0.49 \cdot \text{PAF}_{\text{Men}} + 0.51 \cdot \text{PAF}_{\text{Women}} 
$$
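
The weighted sum can be checked by hand. The sketch below assumes each subgroup fraction follows Levin's formula with relative risk `exp(beta)`; the helper `levin_paf()` is defined here for illustration and is not part of the package.

```{r}
# Hypothetical helper: Levin's formula with RR = exp(beta)
levin_paf <- function(p, beta) {
  excess <- p * (exp(beta) - 1)
  excess / (1 + excess)
}

paf_men_hand   <- levin_paf(p = 0.41, beta = 0.31)
paf_women_hand <- levin_paf(p = 0.37, beta = 0.35)

# Population-weighted total: 49% men, 51% women
paf_all_hand <- 0.49 * paf_men_hand + 0.51 * paf_women_hand
paf_all_hand
```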

### Combining fractions from multiple risks

Fractions from disjoint risks can be combined into an ensemble. For example, the fraction attributable to lead exposure and the fraction attributable to asbestos exposure:

```{r}
paf_lead  <- paf(p = 0.41, beta = 0.31, var_p = 0.001,
                 var_beta = 0.014,
                 label = "Lead")
paf_absts <- paf(p = 0.61, beta = 0.15, var_p = 0.001, 
                 var_beta = 0.001,
                 label = "Asbestos")
```

A fraction of **environmental** exposure considering both can be calculated by multiplying the complements of the weighted fractions and subtracting the product from one, given commonality-correction weights (say `c(0.1, 0.2)`):

```{r}
paf_ensemble(paf_lead, paf_absts, weights = c(0.1, 0.2))
```

where this quantity estimates:

$$
\textrm{PAF}_{\text{Ensemble}} = 1 - (1 - 0.1 \cdot \textrm{PAF}_{\text{Lead}}) \cdot (1 - 0.2 \cdot \textrm{PAF}_{\text{Asbestos}})
$$
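
The ensemble formula can be evaluated directly in base R. This sketch assumes each individual fraction follows Levin's formula with relative risk `exp(beta)`; the helper and variable names are illustrative and not part of the package.

```{r}
# Hypothetical helper: Levin's formula with RR = exp(beta)
levin_paf <- function(p, beta) {
  excess <- p * (exp(beta) - 1)
  excess / (1 + excess)
}

paf_risks <- c(Lead     = levin_paf(p = 0.41, beta = 0.31),
               Asbestos = levin_paf(p = 0.61, beta = 0.15))
w_ens     <- c(0.1, 0.2)  # weights used in the call above

# One minus the product of the weighted complements
paf_ens_hand <- 1 - prod(1 - w_ens * paf_risks)
paf_ens_hand
```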

### Adjusting fractions for commonality

Adjusting for commonality is usually performed when different risks can be concurrent. In the previous example, exposure to lead and to asbestos can happen at the same time. [@mukadam2019population](https://doi.org/10.1016/S2214-109X(19)30074-9) propose individually weighted (adjusted) fractions based on commonality weights. These weights represent the proportion of the variance shared among risk factors. To calculate the adjusted fractions one needs to estimate:

$$
\textrm{PIF}_k^{\text{Adjusted}} = \dfrac{\textrm{PIF}_k}{\sum_j \textrm{PIF}_j} \cdot \textrm{PIF}_{\text{Overall}}
$$
where 

$$
\textrm{PIF}_{\text{Overall}} = 1 - \prod\limits_k (1 - w_k \textrm{PIF}_k)
$$
with

$$
w_k = 1 - \text{commonality}_k
$$


The adjusted fractions can be calculated with the `weighted_adjusted_paf` function as:

```{r}
weighted_adjusted_paf(paf_lead, paf_absts, weights = c(0.2, 0.3))
```

which returns a named list of the adjusted fractions. 
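
The adjustment formulas can also be traced step by step in base R. This sketch rests on two assumptions: each individual fraction follows Levin's formula with relative risk `exp(beta)`, and the values `c(0.2, 0.3)` are used directly as the weights $w_k$. The helper and variable names are illustrative and not part of the package.

```{r}
# Hypothetical helper: Levin's formula with RR = exp(beta)
levin_paf <- function(p, beta) {
  excess <- p * (exp(beta) - 1)
  excess / (1 + excess)
}

paf_k <- c(Lead     = levin_paf(p = 0.41, beta = 0.31),
           Asbestos = levin_paf(p = 0.61, beta = 0.15))
w_k   <- c(0.2, 0.3)  # assumed to be the weights w_k themselves

# Overall fraction, then each risk's share of it
paf_overall  <- 1 - prod(1 - w_k * paf_k)
paf_adjusted <- paf_k / sum(paf_k) * paf_overall
paf_adjusted
```

By construction the adjusted fractions add up to the overall fraction, so `sum(paf_adjusted)` equals `paf_overall`.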

## Additional information

Read the [examples](https://rodrigozepeda.github.io/deltapif/articles/Examples.html) vignette for more examples.


## References