---
title: "Examples"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Examples}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: Referencias.bib  
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(deltapif)
```

# Examples

## Example 1: Estimating the PAF for the proportion of dementia attributable to smoking

The article by [@lee](https://doi.org/10.1001/jamanetworkopen.2022.19672) estimates the PAF of dementia associated with 12 risk factors (`logrr`) in US adults. The package includes this data as well as the proportion of exposed individuals in the respective race column (Hispanic, Asian, Black and White):

```{r}
data(dementiarisk)
```


```{r, echo = FALSE}
dementiarisk[,c("risk_factor","logrr","sdlog", "total", "hispanic", "asian", "black", "white")]
```

In this example, we show how to calculate the population attributable fraction for **Smoking** among the 4 race groups and then how to calculate the **Total Population Attributable Fraction** of the overall population. 


Notice that **Smoking** has a log relative risk of:

+ `beta = 0.4637340` 

with variance of	

+ `var_beta = 0.0273858` (`= 0.165486566^2`). 

Among hispanic individuals 6.9% (`p = 0.069`) smoke. Hence their PAF is:

```{r}
paf_hispanic <- paf(p = 0.069, beta = 0.4637340, var_beta = 0.0273858, var_p = 0,
                    rr_link = exp, label = "Hispanic")
paf_hispanic
```

The other fractions can be computed in the same way noting that 4.9%	of non-hispanic asians, 11.7% of non-hispanic blacks and	8.4% of non-hispanic whites smoke:

```{r}
paf_asian <- paf(p = 0.049, beta = 0.4637340, var_beta = 0.0273858, 
                    var_p = 0, rr_link = exp, label = "Asian")
paf_black <- paf(p = 0.117, beta = 0.4637340, var_beta = 0.0273858, 
                    var_p = 0, rr_link = exp, label = "Black")
paf_white <- paf(p = 0.084, beta = 0.4637340, var_beta = 0.0273858, 
                    var_p = 0, rr_link = exp, label = "White")
```


## Example 2: Combining subpopulations into a total

We can calculate the **total population attributable fraction** by combining the fractions of the subpopulations weighted by the proportion of the population. According to Wikipedia the distribution in 2020 of the US population by race was as follows:

```{r}
weights <- c("white" = 0.5784, "hispanic" = 0.1873, 
             "black" = 0.1205, "asian" = 0.0592)

#Normalized weights to sum to 1
weights <- weights / sum(weights)
```

The `paf_total` function computes the aggregated `paf` of the whole population:

```{r}
paf_population <- paf_total(paf_white, paf_hispanic, paf_black, 
                            paf_asian, weights = weights, var_weights = 0,
                            label = "All")
paf_population
```

where we set `var_weights = 0` as no covariance matrix was known for the race distribution data (it does exist its just not on Wikipedia). 

The `paf` total implements the following formula:

$$
\textrm{PAF}_{\text{Total}} = w_1 \cdot \text{PAF}_{\text{White}} + 
  w_2 \cdot \text{PAF}_{\text{Hispanic}} +
  w_3 \cdot \text{PAF}_{\text{Black}} +
  w_4 \cdot \text{PAF}_{\text{Asian}}
$$

where the weights stand for the proportion of the population in each subgroup. 

The `coef`, `confint`, `summary`, and `as.data.frame` functions have been extended to 
allow the user to extract information from a `paf` (or `pif`):

```{r}
as.data.frame(paf_population)
```

Attributable cases can be calculated for example using the estimates from 
[@dhana2023prevalence](https://pmc.ncbi.nlm.nih.gov/articles/PMC10593099/#SD2). They estimate the number of people with Alzheimer's Disease in New York, USA 426.5 (400.2, 452.7) thousand. This implies a variance of `((452.7 - 400.2) / 2*qnorm(0.975))^2 = 2647.005`. 

```{r}
attributable_cases(426.5, paf_population, variance = 2647)
```

<!---
### Why are we using the log relative risk?

A common question is why in this case we are using the log relative risk (`logrr`) with `exp` link and not the relative risks directly. The reason depends on what standard deviation can be properly
recovered. Recall that the classic (Wald-type) confidence intervals should have the same
distance from the point estimate (in this case the `RR`) to the bounds. We can see that
this does not happen here as for example in the case of **smoking** the RR is
1.59 [1.15-2.20] and:

```{r}
2.20 - 1.59
```

while 

```{r}
1.59 - 1.15
```

The distances are not the same!

However in the log-scale they are much more similar:

```{r}
log(2.20) - log(1.59)
```

```{r}
log(1.59) - log(1.15)
```


which suggests that the uncertainty was estimated in the log-scale and hence we
need to use the log relative risk as beta and exponentiate it (`link = exp`). 
Furthermore we can go to the original paper, check the methodology and conclude
that the confidence interval (and hence the variance!) was estimated using the log-relative risk 
and then exponentiated. Hence  the uncertainty should be propagated for the 
log risk and then exponentiated. 

Mathematically the explanation of why this happens is because usually relative risks,
odds ratios and hazard ratios are transformed to the log-scale to estimate
the confidence intervals. You can see in Wikipedia, for example, 
the classic formulas for the [relative risks CIs](https://en.wikipedia.org/wiki/Relative_risk)
and for the [odds ratios](https://en.wikipedia.org/wiki/Odds_ratio#Statistical_inference) all involve a log-transform. 
--->

## Example 3: Estimating the PIF for the proportion of dementia reduced by a 15% reduction in smoking. 

[@lee](https://doi.org/10.1001/jamanetworkopen.2022.19672) also estimates the proportion of dementia cases that would be reduced by a 15% decrease in smoking prevalence. This 15% reduction corresponds to multiplying the current smoking prevalence by `1 - 0.15 = 0.85`  specified with the prevalence under the counterfactual (`p_cft`) variable: 

```{r}
pif_hispanic <- pif(p = 0.069, p_cft = 0.069*0.85, beta = 0.4637340, var_beta = 0.0273858, 
                    var_p = 0, rr_link = exp, label = "Hispanic")
pif_asian    <- pif(p = 0.049, p_cft = 0.049*0.85, beta = 0.4637340, var_beta = 0.0273858, 
                    var_p = 0, rr_link = exp, label = "Asian")
pif_black    <- pif(p = 0.117, p_cft = 0.117*0.85, beta = 0.4637340, var_beta = 0.0273858, 
                    var_p = 0, rr_link = exp, label = "Black")
pif_white    <- pif(p = 0.084, p_cft = 0.084*0.85, beta = 0.4637340, var_beta = 0.0273858, 
                    var_p = 0, rr_link = exp, label = "White")
```

The total fraction of the overall population can be aggregated with `pif_total`:

```{r}
pif_population <- pif_total(pif_white, pif_hispanic, pif_black, 
                            pif_asian, weights = weights, var_weights = 0,
                            label = "All")
pif_population
```

We can use the `as.data.frame` function to create a `data.frame` with all of the results:

```{r}
df <- as.data.frame(pif_population, pif_white, pif_black, pif_hispanic, pif_asian)
df
```


Averted cases can be calculated with the same number as before with the `averted_cases` function:

```{r}
averted_cases(426.5, pif_population, variance = 2647)
```

## Example 4: A categorical population attributable fraction

[@pandeya](https://doi.org/10.1111/1753-6405.12456)
estimate the population attributable fraction of different cancers given
distinct levels of alcohol consumption for the Australian population. 

The `alcohol` dataframe included in the package contains an estimate
of the intake of alcohol by sex as well as the proportion of
individuals in each of the intake-categories:

```{r}
data(alcohol)
```

```{r, echo = FALSE}
alcohol[1:12,c("sex", "alcohol_g", "median_intake", "age_18_plus")]
```

The relative risk of alcohol varies by consumption level (`alcohol_g`). For example
in the case of rectal cancer the estimated relative risk is 1.10 (1.07–1.12) per 10g/day. We will need to compute the relative risk for each of the consumption level groups using their median intake as follows:

```{r}
#Get alcohol intake for men divided by 10 cause RR is per 10g/day
alcohol_men_intake <- c(0, 0.8, 1.6, 3.2, 7.1, 12.6, 17.4, 24.5, 34.7, 
                        44.2, 54.4, 85.2) / 10 

#Divided by 10 cause risk is per 10 g/day
relative_risk <- exp(log(1.10)*alcohol_men_intake)

#Divide by 100 to get the prevalence in decimals
prevalence    <- c(0.472, 0.014, 0.022, 0.081, 0.081, 0.069, 0.046, 
                   0.073, 0.042, 0.032, 0.02, 0.048)

#Calculate the paf
paf(prevalence, beta = relative_risk, var_p = 0, var_b = 0)
```

If we want to add uncertainty we can approximate the covariance of the relative risk as follows:

```{r}
#Calculate covariance of the betas given by variance*intake_i*intake_j
beta_sd <- (1.12 - 1.07) / (2*qnorm(0.975))
beta_var <- matrix(NA, ncol = length(prevalence), nrow = length(prevalence))
for (k in 1:length(prevalence)){
  for (j in 1:length(prevalence)){
    beta_var[k,j] <- beta_sd^2*(alcohol_men_intake[j])*(alcohol_men_intake[k])
  }
}

#Calculate the paf
paf_males <- paf(prevalence, beta = relative_risk, var_p = 0, var_b = beta_var, 
               label = "Males")
paf_males
```

We can repeat the same process for females:

```{r}
#Get alcohol intake for women divided by 10 cause RR is per 10g/day
alcohol_female_intake <- c(0, 0.8, 1.6, 3.9, 7.1, 12.6, 17.4, 
                           24.5, 34.7, 44.2, 54.4, 78.1) / 10

#Divided by 10 cause risk is per 10 g/day
relative_risk <- exp(log(1.10)*alcohol_female_intake)

#Divide by 100 to get the prevalence in decimals
prevalence    <- c(0.603, 0.02, 0.054, 0.091, 0.081, 0.061, 0.026, 
                   0.034, 0.015, 0.007, 0.003, 0.005)

#Calculate covariance of the betas given by variance*intake_i*intake_j
beta_sd <- (1.12 - 1.07) / (2*qnorm(0.975))
beta_var <- matrix(NA, ncol = length(prevalence), nrow = length(prevalence))
for (k in 1:length(prevalence)){
  for (j in 1:length(prevalence)){
    beta_var[k,j] <- beta_sd^2*(alcohol_female_intake[j])*(alcohol_female_intake[k])
  }
}

#Calculate the paf
paf_females <- paf(prevalence, beta = relative_risk, var_p = 0, var_b = beta_var, 
                   label = "Females")

paf_females
```

The total fraction can be computed with `paf_total` assuming both males and females are 50% each of the population:

```{r}
paf_total(paf_males, paf_females, weights = c(0.5, 0.5))
```

## Example 5: Combining fractions of different risks into an ensemble

@lee combines multiple population attributable fractions from different risk exposures can be combined into a single fraction. For example, we can create an attributable fraction for **behavioral** risk factors by joining the fractions for: smoking, physical inactivity, and excesive alcohol consumption into one fraction that represents the reduction in cases if all of these exposures were taken to the level of minimum risk. 

For that purpose we calculate each of the fractions separately and then use the `paf_ensemble` function to join them. We first obtain the log-relative risk, its standard deviation, and the total proportion of individuals exposed (here we highlight the part of the table):

```{r}
data(dementiarisk)
```

```{r, echo = FALSE}
dementiarisk[which(dementiarisk$risk_factor %in% 
                     c("Excessive alcohol","Smoking","Physical inactivity")),
             c("risk_factor", "logrr", "sdlog", "total")]
```

We then calculate the population attributable fraction for each:

```{r}
paf_alcohol  <- paf(p = 0.036, beta = 0.1655144, var_beta = 0.05402095^2, 
                    var_p = 0, label = "Alcohol")
paf_smoking  <- paf(p = 0.085, beta = 0.4637340, var_beta = 0.16548657^2, 
                    var_p = 0, label = "Smoking")
paf_physical <- paf(p = 0.628, beta = 0.3293037, var_beta = 0.09296182^2, 
                    var_p = 0, label = "Physical Activity")
```

The `paf_ensemble` function allows to calculate the ensemble `paf` by estimating the product:

$$
\textrm{PAF}_{\text{Ensemble}} = 1 - (1 - w_1 \cdot\textrm{PAF}_{\text{Alcohol}})\cdot (1 - w_2 \cdot\textrm{PAF}_{\text{Smoking}})\cdot (1 - w_3\cdot \textrm{PAF}_{\text{Physical}})
$$

where the weights are oftentimes calculated via a communality correction or are set to 1 for disjointed risks. In the case of this paper the communality estimates were:

+ **Physical inactivity:** 31.1

+ **Smoking:** 5.9

+ **Excessive alcohol:** 3.8

hence we can use them as the `weights` argument:

```{r}
paf_ensemble(paf_alcohol, paf_physical, paf_smoking, 
             weights = c(0.038, 0.311, 0.059),
             label = "Behavioural factors")
```

Finally, variance among the weights can be implemented with the `var_weights` argument providing a variance-covariance matrix:


```{r}
#Variance covariance matrix
weight_variance <- matrix(
  c( 1.00, 0.21, -0.02, 
     0.21, 1.00,  0.09,
    -0.02, 0.09,  1.00), byrow = TRUE, ncol = 3
)
colnames(weight_variance) <- c("Alcohol", "Physical inactivity", "Smoking")
rownames(weight_variance) <- c("Alcohol", "Physical inactivity", "Smoking")

paf_ensemble(paf_alcohol, paf_physical, paf_smoking, 
             weights     = c(0.038, 0.311, 0.059),
             var_weights = weight_variance,
             label       = "Behavioural factors")
```

## Example 6: Adjusted fractions

Adjusting for communality is a way to discount the contribution of correlated risk factors. @lee estimates the adjusted fractions by computing the ensemble fraction as above (lee calls it Overall):

$$
\textrm{PAF}_{\text{Ensemble}} = 1 - (1 - w_1 \cdot\textrm{PAF}_{\text{Alcohol}})\cdot (1 - w_2 \cdot\textrm{PAF}_{\text{Smoking}})\cdot (1 - w_3\cdot \textrm{PAF}_{\text{Physical}})
$$

And then normalizing the proportion of each fraction as part of the ensemble. For example with alcohol:

$$
\textrm{PAF}_{\text{Alcohol}}^{\text{Adj}} = \dfrac{\textrm{PAF}_{\text{Alcohol}}}{\textrm{PAF}_{\text{Alcohol}} + \textrm{PAF}_{\text{Smoking}} + \textrm{PAF}_{\text{Physical}}} \cdot \textrm{PAF}_{\text{Ensemble}}
$$
The same process is repeated for adjusting smoking and physical by just changing the numerator of the fraction. This can be computed in the package as:

```{r}
#Calculate each of the fractions
paf_alcohol  <- paf(p = 0.036, beta = 0.1655144, var_beta = 0.05402095^2, 
                    var_p = 0, label = "Alcohol")
paf_smoking  <- paf(p = 0.085, beta = 0.4637340, var_beta = 0.16548657^2, 
                    var_p = 0, label = "Smoking")
paf_physical <- paf(p = 0.628, beta = 0.3293037, var_beta = 0.09296182^2, 
                    var_p = 0, label = "Physical Activity")

#Get the communality weights
cweights <- c(0.038, 0.311, 0.059)

#Calculate the adjusted versions
weighted_adjusted_paf(paf_alcohol, paf_smoking, paf_physical, weights = cweights)
```


## Example 7: Ensuring non-negative fractions

@lee2024comparison show an example with a relative risk of `rr = 7.0` (log relative risk `1.94591`)
and a log-variance of `0.35.` that results in a negative confidence interval when applying directly the delta method:

```{r}
paf(p = 0.05, beta = 1.94, var_beta = 0.591^2, var_p = 0)
```

The `pif` and `paf` functions contain the `logit` link option for guaranteeing positivity in the fraction's intervals: 

```{r}
paf(p = 0.05, beta = 1.94, var_beta = 0.591^2, var_p = 0, link = "logit")
```

We remark that this option is not turned on by default as the **positivity** of the fraction depends on:

1. The relative risk being strictly greater than 1. 

2. Epidemiological rationale on why the fraction should be positive. 

## Example 8: Computing averted and attributable cases

Attributable and averted cases can be calculated with the `attributable_cases` (resp. `averted_cases`) function. For
example [@dhana2023prevalence](https://pmc.ncbi.nlm.nih.gov/articles/PMC10593099/#SD2) estimates the number of people with Alzheimer's Disease in New York, USA 426.5 (400.2, 452.7) thousand. This implies a variance of `((452.7 - 400.2) / 2*qnorm(0.975))^2 = 2647.005`. 

To estimate cases we need to:

1. Estimate the corresponding potential impact fraction (averted) or population attributable fraction (attributable)
2. Use the `averted_cases` (resp. `attributable_cases` function)

We start by calcualting the same fraction as before from [@lee](https://doi.org/10.1001/jamanetworkopen.2022.19672). This considers the impact on dementia of reducing smoking prevalence by 15% (from 8.5% to 7.225%). The PIF for this intervention is:

```{r}
pif_smoking <- pif(
  p        = 0.085, 
  p_cft    = 0.085 * (1 - 0.15), # 15% reduction
  beta     = log(1.59), 
  var_beta = ((log(2.20) - log(1.15)) / (2 * 1.96))^2, 
  var_p    = 0
)
pif_smoking
```

```{r}
averted_cases(426.5, pif_smoking, variance = 2647.005)
```

## Example 9: Computing averted cases (strictly positive)

In some cases it might be of interest to guarantee the positivity of the averted cases. Following the previous example, 
a higher (hypothetical) variance might result in a confidence interval that includes negative values:

```{r}
averted_cases(426.5, pif_smoking, variance = 100000)
```
When there is an epidemiological rationale on the interval being strictly positive (for example, the relative risk is known to be strictly greater than 1) we can change the link function to `logit` to guarantee the interval is always positive:

```{r}
averted_cases(426.5, pif_smoking, variance = 100000, link = "log")
```

## Example 10: Using Hazard Ratios (HR) and Odds Ratios (OR) instead of Relative Risks (RR)

Sometimes Hazard Rates or Odds Ratios are reported in the literature. As the potential impact fraction and the population
attributable fraction require relative risks, one needs to convert the former. One way is via the `EValues` package 
which uses the method described in [@mathurevalue](https://doi.org/10.1080/01621459.2018.1529598).

In the following example we used the Hazard Ratio of shingles vaccination on dementia by Wu et al:

```{r, echo = FALSE}
upper_rr <- 0.7965867
lower_rr <- 0.7579948
rr_value <- 0.7735267
```

```{r, eval = FALSE}
library(EValue)

upper_rr <- HR(0.72, rare = FALSE) |> toRR() |> as.numeric()
lower_rr <- HR(0.67, rare = FALSE) |> toRR() |> as.numeric()
rr_value <- HR(0.69, rare = FALSE) |> toRR() |> as.numeric()
```

which results in the risk of `r rr_value` with interval `r c(round(lower_rr, 2), round(upper_rr, 2))`. From here one can calculate the potential impact fraction of doubling vaccination coverage from the baseline of `p =  0.438` to `p_cft = 0.876`:

```{r}
pif(
  p     = 0.438,
  beta  = log(rr_value),
  p_cft = 0.876,
  var_beta = ((log(upper_rr) - log(lower_rr)) / (2 * 1.96))^2,
  quiet = TRUE
)
```

## References