During the life cycle of a medicinal product, knowledge of the safety profile continues to accumulate post market introduction. Adverse event reports collected through spontaneous reporting play an instrumental part in this development.
Adverse event report databases often cover a wide range of drugs and events, beyond what can be feasibly monitored through manual work. Disproportionality analysis is thus often used to highlight drug-event-combinations of potential interest for further qualitative manual case-by-case assessment.
The pvda
package provides three disproportionality
estimators: the proportional reporting rate (PRR), the reporting odds
ratio (ROR) and the information component (IC). It takes report level
data as input, and outputs estimates for each drug-event-combination,
possibly stratified by some variable of interest, e.g. age group and/or
sex.
The workhorse of this package is the function da
. The
da
-function takes patient level data in a data frame as
input (here drug_event_df
) and produces estimates and
summary counts as demonstrated below.
The drug_event_df
is a simulated data set with drugs
named Drug_A - Drug_Z and events Event_1 - Event_1000. Further details
about the simulation are documented in the data object (execute
?df_drug_event
).
The da-function will examine the passed data set for column names contained in the passed parameter list “df_colnames”. The default column names are “report_id”, “drug” and “event”. If we rename the report-id column in “drug_event_df” we need to specify the new name in the call to da, as shown below:
The ‘da_1’ object can then be passed to a summary and a print function.
(Technical note: As printing of colors in the console is not easily reproduced within this vignette , we provide screenshots when applicable. )
The input data needs to be in a report-level format. every line in the data frame corresponds to a drug-event pair from a specific report.
In the drug_event_df-example data, the first three rows of drug_event_df are from the same report, with report_id = 1. The first row reports a Drug_D and an adverse event named Event_5.The next three rows are from another report, where for instance drug B has been reported for two different events, event 15 and event 33. We can ignore the group-column for now.
## # A tibble: 6 × 4
## report_id drug event group
## <int> <chr> <chr> <dbl>
## 1 1 Drug_A Event_23 1
## 2 1 Drug_D Event_6 1
## 3 1 Drug_E Event_8 1
## 4 1 Drug_I Event_9 1
## 5 2 Drug_B Event_2 0
## 6 2 Drug_G Event_13 0
For covenience, the output object of da is a list of class “da” containing two objects, the results (“da_df”) and the input parameters (“input_params”). At some point you might want to extract the results or the input parameters list, here we demonstrate some alternatives:
## [1] "da_df" "input_params"
The da
function builds a comparator from the passed
data, e.g. for prr
by excluding the drug of interest from
the comparator, sometimes referred to as the background. This implies
that any subgroup analysis of e.g. specific age groups can be achieved
by creating a new data frame with the subgroup of interest, and then
passing it to da
.
If a more exhaustive grouping is required, one can provide a grouping
variable group_by
to da
. This executes a call
to da
for each grouping and return the results as a single
data frame. The grouping could for instance be age groups, sex,
countries, regions or a combination thereof, but needs to be passed as a
single variable (character or numeric).
The drug_event_df contains two groups, in the column named “group”.
We can pass “group” as the group_by
parameter in
df_colnames
as below.
## # A tibble: 6 × 4
## report_id drug event group
## <int> <chr> <chr> <dbl>
## 1 1 Drug_A Event_23 1
## 2 1 Drug_D Event_6 1
## 3 1 Drug_E Event_8 1
## 4 1 Drug_I Event_9 1
## 5 2 Drug_B Event_2 0
## 6 2 Drug_G Event_13 0
## # A tibble: 1,808 × 24
## drug event group obs exp_rrr ic2.5 ic ic97.5 exp_prr prr2.5 prr
## <chr> <chr> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Drug_Z Event_1 0 45 26.5 0.3 0.75 1.14 23.4 1.54 1.92
## 2 Drug_Z Event_1 1 43 29.1 0.09 0.56 0.96 26.5 1.28 1.63
## 3 Drug_Z Event_2 0 33 24.6 -0.12 0.42 0.87 23.2 1.07 1.42
## 4 Drug_Z Event_2 1 42 27.8 0.12 0.59 0.99 25.1 1.3 1.67
## 5 Drug_V Event_2 0 34 26.0 -0.14 0.38 0.82 24.6 1.04 1.38
## 6 Drug_V Event_2 1 46 31.0 0.12 0.56 0.95 27.8 1.3 1.66
## 7 Drug_V Event_1 0 40 28.0 0.03 0.51 0.92 25.8 1.21 1.55
## 8 Drug_V Event_1 1 42 32.4 -0.1 0.37 0.77 30.3 1.07 1.38
## 9 Drug_U Event_2 0 42 31.5 -0.06 0.41 0.81 29.1 1.11 1.44
## 10 Drug_U Event_2 1 37 26.8 -0.04 0.46 0.89 24.9 1.14 1.49
## # ℹ 1,798 more rows
## # ℹ 13 more variables: prr97.5 <dbl>, exp_ror <dbl>, ror2.5 <dbl>, ror <dbl>,
## # ror97.5 <dbl>, n_drug <int>, n_event <int>, n_tot <int>, n_event_prr <int>,
## # n_tot_prr <int>, b <int>, c <int>, d <int>
In the output we note the third column is “group”, i.e. the column
name we passed. We also note that the drug-event-pairs are kept
together, e.g. first and second row contains the same
drug-event-combination, “Drug_Z” and “Event_1”. This is to simplify
comparisons within a drug-event-pair across different groups, e.g. if
the same pattern is seen across both males and females. The ordering of
the drug-events in the output is made by averaging the selected
disproportionality estimate (parameter sort_by
), i.e. the
top two rows have the highest average “ic2.5”.
If the same drug-event pair occurs several times in one report, these contributes to counts only once. That is, an observed count of five means that there were five different reports containing the drug-event-pair (at least once). It cannot be due to a single report containing the same drug-event pair five times due to repeated events or data quality issues. This is demonstrated below, where we duplicate the first row of drug_event_df and still arrive at an observed count of 1.
first_row <- drug_event_df[1,]
first_row |>
dplyr::bind_rows(first_row) |>
da() |>
purrr::pluck("da_df") |>
dplyr::pull(obs)
## [1] 1
Detailed formulas for PRR, RRR and IC are documented within each function ( “?prr”, “?rrr”, “?ic”), but overall one can note that all expected counts are derived from the same contingency table of those who were and were not exposed to the drug, and did and did not experience the adverse event. The sample used for the contingency table is the data frame passed to the da function. See also the grouping parameter.
A common question regards the output column names, ic is preceded by “exp_prr”. This is as the information component uses the same expected count as the Relative Reporting Rate (RRR), which is perhaps more well known in the pharmacovigilance community.
# Note the fourth column name:
drug_event_df |>
da() |>
purrr::pluck("da_df") |>
dplyr::select(drug:ic97.5) |>
print(n=1)
## # A tibble: 1,306 × 7
## drug event obs exp_rrr ic2.5 ic ic97.5
## <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl>
## 1 Drug_Z Event_1 88 55.6 0.34 0.66 0.94
## # ℹ 1,305 more rows
The function da
has several input parameters. One is the
rule_of_N
, by default set to 3, which is sometimes referred
to as “rule of three”. This sets ROR and PRR-values to NA if the
observed count is less than the specified N. For completeness, note that
the default shrinkage in the IC acts as a built in ‘rule of 3’,
(i.e. the shrinkage of +0.5 prevents the lower bound to exceed 0 at the
default significance level of 95%).
The function da
has a parameter
number_of_digits
controls the rounding of non-count values
in the output, including all expected counts, uncertainty bounds and
point estimates.
Yes, you can calculate point estimates with confidence or credibility intervals for specific drug-event-combination if you already have the number of observed reports, and other required counts such as the database total count, number of reports with the exposure and number of reports with the event. This can be helpful for instance to quality check results, e.g. from papers or other software.
## # A tibble: 1 × 3
## prr2.5 prr prr97.5
## <dbl> <dbl> <dbl>
## 1 0.266 0.5 0.940
## # A tibble: 1 × 3
## ror2.5 ror ror97.5
## <dbl> <dbl> <dbl>
## 1 11.6 25 54.1
## # A tibble: 1 × 3
## ic2.5 ic ic97.5
## <dbl> <dbl> <dbl>
## 1 -0.0973 0.933 1.69
For further details on the specific input parameters, consult the documentation of each function.
The data.table-package (through “dtplyr”) is used for fast execution.
A test of the execution speed was made using the vaers
R
package, available on gitlab. The
vaers
package contains data from VAERS from years 1990 -
2018, resulting in 4 146 778 rows to be processed by da
in
pvda
. Execution time on a regular laptop was less than 10
seconds.