--- title: "Examples" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Examples} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} bibliography: Referencias.bib --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} library(deltapif) ``` # Examples ## Example 1: Estimating the PAF for the proportion of dementia attributable to smoking The article by [@lee](https://doi.org/10.1001/jamanetworkopen.2022.19672) estimates the PAF of dementia associated with 12 risk factors (`logrr`) in US adults. The package includes this data as well as the proportion of exposed individuals in the respective race column (Hispanic, Asian, Black and White): ```{r} data(dementiarisk) ``` ```{r, echo = FALSE} dementiarisk[,c("risk_factor","logrr","sdlog", "total", "hispanic", "asian", "black", "white")] ``` In this example, we show how to calculate the population attributable fraction for **Smoking** among the 4 race groups and then how to calculate the **Total Population Attributable Fraction** of the overall population. Notice that **Smoking** has a log relative risk of: + `beta = 0.4637340` with variance of + `var_beta = 0.0273858` (`= 0.165486566^2`). Among hispanic individuals 6.9% (`p = 0.069`) smoke. Hence their PAF is: ```{r} paf_hispanic <- paf(p = 0.069, beta = 0.4637340, var_beta = 0.0273858, var_p = 0, rr_link = exp, label = "Hispanic") paf_hispanic ``` The other fractions can be computed in the same way noting that 4.9% of non-hispanic asians, 11.7% of non-hispanic blacks and 8.4% of non-hispanic whites smoke: ```{r} paf_asian <- paf(p = 0.049, beta = 0.4637340, var_beta = 0.0273858, var_p = 0, rr_link = exp, label = "Asian") paf_black <- paf(p = 0.117, beta = 0.4637340, var_beta = 0.0273858, var_p = 0, rr_link = exp, label = "Black") paf_white <- paf(p = 0.084, beta = 0.4637340, var_beta = 0.0273858, var_p = 0, rr_link = exp, label = "White") ``` ## Example 2: Combining subpopulations into a total We can calculate the **total population attributable fraction** by combining the fractions of the subpopulations weighted by the proportion of the population. According to Wikipedia the distribution in 2020 of the US population by race was as follows: ```{r} weights <- c("white" = 0.5784, "hispanic" = 0.1873, "black" = 0.1205, "asian" = 0.0592) #Normalized weights to sum to 1 weights <- weights / sum(weights) ``` The `paf_total` function computes the aggregated `paf` of the whole population: ```{r} paf_population <- paf_total(paf_white, paf_hispanic, paf_black, paf_asian, weights = weights, var_weights = 0, label = "All") paf_population ``` where we set `var_weights = 0` as no covariance matrix was known for the race distribution data (it does exist its just not on Wikipedia). The `paf` total implements the following formula: $$ \textrm{PAF}_{\text{Total}} = w_1 \cdot \text{PAF}_{\text{White}} + w_2 \cdot \text{PAF}_{\text{Hispanic}} + w_3 \cdot \text{PAF}_{\text{Black}} + w_4 \cdot \text{PAF}_{\text{Asian}} $$ where the weights stand for the proportion of the population in each subgroup. The `coef`, `confint`, `summary`, and `as.data.frame` functions have been extended to allow the user to extract information from a `paf` (or `pif`): ```{r} as.data.frame(paf_population) ``` Attributable cases can be calculated for example using the estimates from [@dhana2023prevalence](https://pmc.ncbi.nlm.nih.gov/articles/PMC10593099/#SD2). They estimate the number of people with Alzheimer's Disease in New York, USA 426.5 (400.2, 452.7) thousand. This implies a variance of `((452.7 - 400.2) / 2*qnorm(0.975))^2 = 2647.005`. ```{r} attributable_cases(426.5, paf_population, variance = 2647) ``` ## Example 3: Estimating the PIF for the proportion of dementia reduced by a 15% reduction in smoking. [@lee](https://doi.org/10.1001/jamanetworkopen.2022.19672) also estimates the proportion of dementia cases that would be reduced by a 15% decrease in smoking prevalence. This 15% reduction corresponds to multiplying the current smoking prevalence by `1 - 0.15 = 0.85` specified with the prevalence under the counterfactual (`p_cft`) variable: ```{r} pif_hispanic <- pif(p = 0.069, p_cft = 0.069*0.85, beta = 0.4637340, var_beta = 0.0273858, var_p = 0, rr_link = exp, label = "Hispanic") pif_asian <- pif(p = 0.049, p_cft = 0.049*0.85, beta = 0.4637340, var_beta = 0.0273858, var_p = 0, rr_link = exp, label = "Asian") pif_black <- pif(p = 0.117, p_cft = 0.117*0.85, beta = 0.4637340, var_beta = 0.0273858, var_p = 0, rr_link = exp, label = "Black") pif_white <- pif(p = 0.084, p_cft = 0.084*0.85, beta = 0.4637340, var_beta = 0.0273858, var_p = 0, rr_link = exp, label = "White") ``` The total fraction of the overall population can be aggregated with `pif_total`: ```{r} pif_population <- pif_total(pif_white, pif_hispanic, pif_black, pif_asian, weights = weights, var_weights = 0, label = "All") pif_population ``` We can use the `as.data.frame` function to create a `data.frame` with all of the results: ```{r} df <- as.data.frame(pif_population, pif_white, pif_black, pif_hispanic, pif_asian) df ``` Averted cases can be calculated with the same number as before with the `averted_cases` function: ```{r} averted_cases(426.5, pif_population, variance = 2647) ``` ## Example 4: A categorical population attributable fraction [@pandeya](https://doi.org/10.1111/1753-6405.12456) estimate the population attributable fraction of different cancers given distinct levels of alcohol consumption for the Australian population. The `alcohol` dataframe included in the package contains an estimate of the intake of alcohol by sex as well as the proportion of individuals in each of the intake-categories: ```{r} data(alcohol) ``` ```{r, echo = FALSE} alcohol[1:12,c("sex", "alcohol_g", "median_intake", "age_18_plus")] ``` The relative risk of alcohol varies by consumption level (`alcohol_g`). For example in the case of rectal cancer the estimated relative risk is 1.10 (1.07–1.12) per 10g/day. We will need to compute the relative risk for each of the consumption level groups using their median intake as follows: ```{r} #Get alcohol intake for men divided by 10 cause RR is per 10g/day alcohol_men_intake <- c(0, 0.8, 1.6, 3.2, 7.1, 12.6, 17.4, 24.5, 34.7, 44.2, 54.4, 85.2) / 10 #Divided by 10 cause risk is per 10 g/day relative_risk <- exp(log(1.10)*alcohol_men_intake) #Divide by 100 to get the prevalence in decimals prevalence <- c(0.472, 0.014, 0.022, 0.081, 0.081, 0.069, 0.046, 0.073, 0.042, 0.032, 0.02, 0.048) #Calculate the paf paf(prevalence, beta = relative_risk, var_p = 0, var_b = 0) ``` If we want to add uncertainty we can approximate the covariance of the relative risk as follows: ```{r} #Calculate covariance of the betas given by variance*intake_i*intake_j beta_sd <- (1.12 - 1.07) / (2*qnorm(0.975)) beta_var <- matrix(NA, ncol = length(prevalence), nrow = length(prevalence)) for (k in 1:length(prevalence)){ for (j in 1:length(prevalence)){ beta_var[k,j] <- beta_sd^2*(alcohol_men_intake[j])*(alcohol_men_intake[k]) } } #Calculate the paf paf_males <- paf(prevalence, beta = relative_risk, var_p = 0, var_b = beta_var, label = "Males") paf_males ``` We can repeat the same process for females: ```{r} #Get alcohol intake for women divided by 10 cause RR is per 10g/day alcohol_female_intake <- c(0, 0.8, 1.6, 3.9, 7.1, 12.6, 17.4, 24.5, 34.7, 44.2, 54.4, 78.1) / 10 #Divided by 10 cause risk is per 10 g/day relative_risk <- exp(log(1.10)*alcohol_female_intake) #Divide by 100 to get the prevalence in decimals prevalence <- c(0.603, 0.02, 0.054, 0.091, 0.081, 0.061, 0.026, 0.034, 0.015, 0.007, 0.003, 0.005) #Calculate covariance of the betas given by variance*intake_i*intake_j beta_sd <- (1.12 - 1.07) / (2*qnorm(0.975)) beta_var <- matrix(NA, ncol = length(prevalence), nrow = length(prevalence)) for (k in 1:length(prevalence)){ for (j in 1:length(prevalence)){ beta_var[k,j] <- beta_sd^2*(alcohol_female_intake[j])*(alcohol_female_intake[k]) } } #Calculate the paf paf_females <- paf(prevalence, beta = relative_risk, var_p = 0, var_b = beta_var, label = "Females") paf_females ``` The total fraction can be computed with `paf_total` assuming both males and females are 50% each of the population: ```{r} paf_total(paf_males, paf_females, weights = c(0.5, 0.5)) ``` ## Example 5: Combining fractions of different risks into an ensemble @lee combines multiple population attributable fractions from different risk exposures can be combined into a single fraction. For example, we can create an attributable fraction for **behavioral** risk factors by joining the fractions for: smoking, physical inactivity, and excesive alcohol consumption into one fraction that represents the reduction in cases if all of these exposures were taken to the level of minimum risk. For that purpose we calculate each of the fractions separately and then use the `paf_ensemble` function to join them. We first obtain the log-relative risk, its standard deviation, and the total proportion of individuals exposed (here we highlight the part of the table): ```{r} data(dementiarisk) ``` ```{r, echo = FALSE} dementiarisk[which(dementiarisk$risk_factor %in% c("Excessive alcohol","Smoking","Physical inactivity")), c("risk_factor", "logrr", "sdlog", "total")] ``` We then calculate the population attributable fraction for each: ```{r} paf_alcohol <- paf(p = 0.036, beta = 0.1655144, var_beta = 0.05402095^2, var_p = 0, label = "Alcohol") paf_smoking <- paf(p = 0.085, beta = 0.4637340, var_beta = 0.16548657^2, var_p = 0, label = "Smoking") paf_physical <- paf(p = 0.628, beta = 0.3293037, var_beta = 0.09296182^2, var_p = 0, label = "Physical Activity") ``` The `paf_ensemble` function allows to calculate the ensemble `paf` by estimating the product: $$ \textrm{PAF}_{\text{Ensemble}} = 1 - (1 - w_1 \cdot\textrm{PAF}_{\text{Alcohol}})\cdot (1 - w_2 \cdot\textrm{PAF}_{\text{Smoking}})\cdot (1 - w_3\cdot \textrm{PAF}_{\text{Physical}}) $$ where the weights are oftentimes calculated via a communality correction or are set to 1 for disjointed risks. In the case of this paper the communality estimates were: + **Physical inactivity:** 31.1 + **Smoking:** 5.9 + **Excessive alcohol:** 3.8 hence we can use them as the `weights` argument: ```{r} paf_ensemble(paf_alcohol, paf_physical, paf_smoking, weights = c(0.038, 0.311, 0.059), label = "Behavioural factors") ``` Finally, variance among the weights can be implemented with the `var_weights` argument providing a variance-covariance matrix: ```{r} #Variance covariance matrix weight_variance <- matrix( c( 1.00, 0.21, -0.02, 0.21, 1.00, 0.09, -0.02, 0.09, 1.00), byrow = TRUE, ncol = 3 ) colnames(weight_variance) <- c("Alcohol", "Physical inactivity", "Smoking") rownames(weight_variance) <- c("Alcohol", "Physical inactivity", "Smoking") paf_ensemble(paf_alcohol, paf_physical, paf_smoking, weights = c(0.038, 0.311, 0.059), var_weights = weight_variance, label = "Behavioural factors") ``` ## Example 6: Adjusted fractions Adjusting for communality is a way to discount the contribution of correlated risk factors. @lee estimates the adjusted fractions by computing the ensemble fraction as above (lee calls it Overall): $$ \textrm{PAF}_{\text{Ensemble}} = 1 - (1 - w_1 \cdot\textrm{PAF}_{\text{Alcohol}})\cdot (1 - w_2 \cdot\textrm{PAF}_{\text{Smoking}})\cdot (1 - w_3\cdot \textrm{PAF}_{\text{Physical}}) $$ And then normalizing the proportion of each fraction as part of the ensemble. For example with alcohol: $$ \textrm{PAF}_{\text{Alcohol}}^{\text{Adj}} = \dfrac{\textrm{PAF}_{\text{Alcohol}}}{\textrm{PAF}_{\text{Alcohol}} + \textrm{PAF}_{\text{Smoking}} + \textrm{PAF}_{\text{Physical}}} \cdot \textrm{PAF}_{\text{Ensemble}} $$ The same process is repeated for adjusting smoking and physical by just changing the numerator of the fraction. This can be computed in the package as: ```{r} #Calculate each of the fractions paf_alcohol <- paf(p = 0.036, beta = 0.1655144, var_beta = 0.05402095^2, var_p = 0, label = "Alcohol") paf_smoking <- paf(p = 0.085, beta = 0.4637340, var_beta = 0.16548657^2, var_p = 0, label = "Smoking") paf_physical <- paf(p = 0.628, beta = 0.3293037, var_beta = 0.09296182^2, var_p = 0, label = "Physical Activity") #Get the communality weights cweights <- c(0.038, 0.311, 0.059) #Calculate the adjusted versions weighted_adjusted_paf(paf_alcohol, paf_smoking, paf_physical, weights = cweights) ``` ## Example 7: Ensuring non-negative fractions @lee2024comparison show an example with a relative risk of `rr = 7.0` (log relative risk `1.94591`) and a log-variance of `0.35.` that results in a negative confidence interval when applying directly the delta method: ```{r} paf(p = 0.05, beta = 1.94, var_beta = 0.591^2, var_p = 0) ``` The `pif` and `paf` functions contain the `logit` link option for guaranteeing positivity in the fraction's intervals: ```{r} paf(p = 0.05, beta = 1.94, var_beta = 0.591^2, var_p = 0, link = "logit") ``` We remark that this option is not turned on by default as the **positivity** of the fraction depends on: 1. The relative risk being strictly greater than 1. 2. Epidemiological rationale on why the fraction should be positive. ## Example 8: Computing averted and attributable cases Attributable and averted cases can be calculated with the `attributable_cases` (resp. `averted_cases`) function. For example [@dhana2023prevalence](https://pmc.ncbi.nlm.nih.gov/articles/PMC10593099/#SD2) estimates the number of people with Alzheimer's Disease in New York, USA 426.5 (400.2, 452.7) thousand. This implies a variance of `((452.7 - 400.2) / 2*qnorm(0.975))^2 = 2647.005`. To estimate cases we need to: 1. Estimate the corresponding potential impact fraction (averted) or population attributable fraction (attributable) 2. Use the `averted_cases` (resp. `attributable_cases` function) We start by calcualting the same fraction as before from [@lee](https://doi.org/10.1001/jamanetworkopen.2022.19672). This considers the impact on dementia of reducing smoking prevalence by 15% (from 8.5% to 7.225%). The PIF for this intervention is: ```{r} pif_smoking <- pif( p = 0.085, p_cft = 0.085 * (1 - 0.15), # 15% reduction beta = log(1.59), var_beta = ((log(2.20) - log(1.15)) / (2 * 1.96))^2, var_p = 0 ) pif_smoking ``` ```{r} averted_cases(426.5, pif_smoking, variance = 2647.005) ``` ## Example 9: Computing averted cases (strictly positive) In some cases it might be of interest to guarantee the positivity of the averted cases. Following the previous example, a higher (hypothetical) variance might result in a confidence interval that includes negative values: ```{r} averted_cases(426.5, pif_smoking, variance = 100000) ``` When there is an epidemiological rationale on the interval being strictly positive (for example, the relative risk is known to be strictly greater than 1) we can change the link function to `logit` to guarantee the interval is always positive: ```{r} averted_cases(426.5, pif_smoking, variance = 100000, link = "log") ``` ## Example 10: Using Hazard Ratios (HR) and Odds Ratios (OR) instead of Relative Risks (RR) Sometimes Hazard Rates or Odds Ratios are reported in the literature. As the potential impact fraction and the population attributable fraction require relative risks, one needs to convert the former. One way is via the `EValues` package which uses the method described in [@mathurevalue](https://doi.org/10.1080/01621459.2018.1529598). In the following example we used the Hazard Ratio of shingles vaccination on dementia by Wu et al: ```{r, echo = FALSE} upper_rr <- 0.7965867 lower_rr <- 0.7579948 rr_value <- 0.7735267 ``` ```{r, eval = FALSE} library(EValue) upper_rr <- HR(0.72, rare = FALSE) |> toRR() |> as.numeric() lower_rr <- HR(0.67, rare = FALSE) |> toRR() |> as.numeric() rr_value <- HR(0.69, rare = FALSE) |> toRR() |> as.numeric() ``` which results in the risk of `r rr_value` with interval `r c(round(lower_rr, 2), round(upper_rr, 2))`. From here one can calculate the potential impact fraction of doubling vaccination coverage from the baseline of `p = 0.438` to `p_cft = 0.876`: ```{r} pif( p = 0.438, beta = log(rr_value), p_cft = 0.876, var_beta = ((log(upper_rr) - log(lower_rr)) / (2 * 1.96))^2, quiet = TRUE ) ``` ## References