--- title: "Creating ADRS with GCIG Criteria" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Creating ADRS with GCIG Criteria} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(admiraldev) ``` This article describes creating an `ADRS` ADaM dataset in ovarian cancer studies based on Gynecological Cancer Intergroup (GCIG) criteria. Note that only the GCIG specific steps are covered in this vignette. To get a detailed guidance on all the steps, refer the [Creating ADRS (Including Non-standard Endpoints)](adrs.html). # Introduction Carcinoma antigen-125 (CA-125) is the most commonly used biomarker in ovarian cancer. The Gynecological Cancer Intergroup (GCIG) proposed the criteria for CA-125 response and progression and specified the situations in which CA-125 criteria could be used. These guidelines are becoming more and more popular in clinical trials for ovarian cancer and are often used as one of the secondary endpoints. ```{r echo=FALSE} list_cases <- tibble::tribble( ~"Use case", ~"Use Recommended by GCIG", ~"Not Standard and Needs Further Validation", ~"Not Recommended by GCIG", "First-line trials", "CA-125 progression", "", "CA-125 response", "Maintenance or consolidation trials", "", "CA-125 response and progression ", "", "Relapse trials", "CA-125 response and progression", "", "" ) library(magrittr) list_cases %>% gt::gt() %>% gt::cols_label_with(fn = ~ gt::md(paste0("**", .x, "**"))) %>% gt::tab_header( title = "GCIG recommendations for CA-125 criteria for response and progression in various clinical situations" ) ``` However, the CA-125 criteria can be a bit tricky to use in programming, especially when used alongside the RECIST 1.1. We aim to share our current knowledge and experience in implementing GCIG criteria for ovarian clinical trials. Additionally, we have made certain assumptions regarding how data is collected on CRFs to perform response analysis according to the GCIG criteria. We hope this vignette provides valuable guidance on ADRS programming and highlights key considerations for data collection in relation to these criteria. For more information about GCIG criteria user may visit [GCIG guidelines on response criteria in ovarian cancer](https://pubmed.ncbi.nlm.nih.gov/21270624/) ## CA-125 response categories In further considerations, ULRR stands for Upper Limit of Reference Range. The CA-125 response categories for ovarian cancer are: 1. **CA-125 Complete Response**: baseline CA-125 >= 2 * ULRR, later reduced by at least 50% to normal confirmed at least 4 weeks later. 2. **CA-125 Partial Response**: baseline CA-125 >= 2 * ULRR, later reduced by at least 50% but not to normal confirmed at least 4 weeks later. 3. **Stable Disease**: CA-125 level does not meet the criteria for either partial response or progression disease. 4. **Progression**: This is defined as CA-125 >= 2 * ULRR or CA-125 >= 2 * nadir on 2 occasions at least 1 week apart. - category A: **CA-125 elevated at baseline returning to normal after treatment**: CA-125 >= 2 * ULRR on 2 occasions at least 1 week apart. - category B: **CA-125 elevated at baseline not returning to normal after treatment**: CA-125 >= 2 * nadir value (lowest level reached by CA-125) on 2 occasions at least 1 week apart. - category C: **CA-125 within reference range at baseline**: CA-125 >= 2 * ULRR on 2 occasions at least 1 week apart. 5. **Not Evaluable**: This is when the patient's response cannot be evaluated due to various reasons such as receiving mouse antibodies or having medical/surgical interference with their peritoneum or pleura during the previous 28 days. ## Ways the data is collected in the Ovarian Cancer study CRF ### Required: - Serum CA-125 results as per protocol and mapped in `LB` SDTM domain. - RECIST 1.1 tumor size and response as per protocol and mapped in `RS`/`TR`/`TU` SDTM domain. **Note**: collection of CA-125 and RECIST 1.1 tumor assessment may not be always same visit. CA-125 can be collected more frequently than tumor assessment. ### May be Collected: - CA-125 progression - CA-125 response - Combined response of CA-125 and RECIST 1.1 based on GCIG criteria ### Assumptions: For this vignette we made assumptions that following information is collected on the CRF: 1. CA-125 confirmed responses are collected on the CRF and mapped to `SDTM RS` domain. Please note that we are not programmatically confirming the CA-125 response. 2. The date of CA-125 response is assigned to the date of the first measurement when the CA-125 level is reduced by 50%. The date of CA-125 progression is assigned to the date of the first measurement that meets progression criteria. This is assumed to be collected on the CRF CA-125 response page. 3. CA-125 responses are mapped to `CR`, `PR`, `SD`, `PD`, `NE` as shown below. ```{r echo=FALSE} # styler: off list_resp <- tibble::tribble( ~"CA-125 response per GCIG", ~"CA-125 response mapped", "Response within Normal Range", "CR", "Response but not within Normal Range", "PR", "Non-Response/Non-PD", "SD", "PD", "PD", "NE", "NE" ) knitr::kable(list_resp) # styler: on ``` 4. Combined CA-125 and RECIST 1.1 response per visit where both RECIST 1.1 and CA-125 is evaluated is also collected on the CRF. The investigator should already consider CA-125 confirmed response and each visit RECIST 1.1 response based on GCIG document. 5. In `SUPPRS` there are records with below `QNAM` and `QLABEL` values: ```{r echo=FALSE} # styler: off list_supp <- tibble::tribble( ~"QNAM", ~"QLABEL", ~"QVAL", ~"Purpose", ~"Use case", "`CA125EFL`", "CA-125 response evaluable", "Y/N", "Indicates population evaluable for CA-125 response (baseline CA-125 >= 2 * ULRR and no mouse antibodies)", "`CA125EFL` variable", "`CAELEPRE`", "Elevated pre-treatment CA-125", "Y/N", "Indicates CA-125 level at baseline (Y - elevated, N - not elevated)", "Derivation of PD category (`MCRIT1`/`MCRIT1ML`/`MCRIT1MN`)", "`MOUSEANT`", "Received mouse antibodies", "Y", "Indicates if a prohibited therapy was received", "Derivation of `ANL02FL`", "`CA50RED`", ">=50% reduction from baseline", "Y", "Indicates response, but does not distinguish between CR and PR", "Not used in further derivations", "`CANORM2X`", "CA125 normal, lab increased >=2x ULRR", "Y", "Indicates PD category A or C", "Derivation of PD category (`MCRIT1`/`MCRIT1ML`/`MCRIT1MN`)", "`CNOTNORM`", "CA125 not norm, lab increased >=2x nadir", "Y", "Indicates PD category B", "Derivation of PD category (`MCRIT1`/`MCRIT1ML`/`MCRIT1MN`)" ) knitr::kable(list_supp) # styler: on ``` The above `SUPPRS` records refer only to records with `RS.RSCAT = "CA125"`. The exception is `QNAM = "MOUSEANT"`, which appears for both `RS.RSCAT = "CA125"` and `RS.RSCAT = "RECIST 1.1 - CA125"` (for the same visits). If the data are collected by other ways and similar information from `SUPPRS` dataset are not available, they should be derived in advance from `LB` CA-125 measurements records. Information on whether the patient has received mouse antibodies should also be taken into account. # Required Packages The examples of this vignette require the following packages. ```{r, warning=FALSE, message=FALSE} library(admiral) library(admiralonco) library(pharmaversesdtm) library(pharmaverseadam) library(metatools) library(dplyr) library(tibble) ``` # Programming Workflow - [Read in Data](#readdata) - [Pre-processing of Input Records](#input) - [Analysis Flag Derivation](#anlfl) - [CA-125 Progression](#pdca) - [CA-125 Progression Category](#pdcacat) - [CA-125 Best Confirmed Overall Response](#cborca) - [Combined Best Unconfirmed Overall Response](#combubor) - [Combined Best Confirmed Overall Response](#combcbor) ## Read in Data {#readdata} To start, all data frames needed for the creation of `ADRS` should be read into the environment. This will be a company specific process. Some of the data frames needed are `ADSL` and `RS`. For example purpose, the SDTM and ADaM datasets (based on CDISC Pilot test data)---which are included in `{pharmaversesdtm}` and `{pharmaverseadam}`---are used. ```{r message=FALSE} adsl <- pharmaverseadam::adsl # GCIG sdtm data rs <- pharmaversesdtm::rs_onco_ca125 supprs <- pharmaversesdtm::supprs_onco_ca125 rs <- combine_supp(rs, supprs) rs <- convert_blanks_to_na(rs) ``` ```{r, eval=TRUE, echo=FALSE} dataset_vignette( rs, display_vars = exprs(USUBJID, RSTESTCD, RSCAT, RSSTRESC, VISIT, CA125EFL, CAELEPRE, CA50RED, CANORM2X, CNOTNORM, MOUSEANT) ) ``` At this step, it may be useful to join `ADSL` to your `RS` domain. Only the `ADSL` variables used for derivations are selected at this step. ```{r echo=FALSE} # select subjects from adsl such that there is one subject without RS data rs_subjects <- unique(rs$USUBJID) adsl_subjects <- unique(adsl$USUBJID) adsl <- filter( adsl, USUBJID %in% union(rs_subjects, setdiff(adsl_subjects, rs_subjects)[1]) ) ``` ```{r eval=TRUE} adsl_vars <- exprs(RANDDT, TRTSDT) adrs <- derive_vars_merged( rs, dataset_add = adsl, new_vars = adsl_vars, by_vars = get_admiral_option("subject_keys") ) ``` ## Pre-processing of Input Records {#input} ### Select Overall Response Records per Response Criteria Category The next step is to assign parameter level values such as `PARAMCD`, `PARAM`,`PARAMN`, `PARCAT1`, etc. For this, a lookup can be created based on the SDTM `RSCAT`, `RSTESTCD` and `RSEVAL` values to join to the source data. ```{r, echo=TRUE, message=FALSE} param_lookup <- tribble( ~RSCAT, ~RSTESTCD, ~RSEVAL, ~PARAMCD, ~PARAM, ~PARAMN, ~PARCAT1, ~PARCAT1N, ~PARCAT2, ~PARCAT2N, # CA-125 "CA125", "OVRLRESP", "INVESTIGATOR", "OVRCA125", "CA-125 Overall Response by Investigator", 1, "CA-125", 1, "Investigator", 1, # RECIST 1.1 "RECIST 1.1", "OVRLRESP", "INVESTIGATOR", "OVRR11", "RECIST 1.1 Overall Response by Investigator", 2, "RECIST 1.1", 2, "Investigator", 1, # Combined "RECIST 1.1 - CA125", "OVRLRESP", "INVESTIGATOR", "OVRR11CA", "Combined Overall Response by Investigator", 3, "Combined", 3, "Investigator", 1 ) ``` This lookup may now be joined to the source data and this is how the parameters will look like: ```{r, eval=TRUE, include=TRUE, message=FALSE} adrs <- derive_vars_merged_lookup( adrs, dataset_add = param_lookup, by_vars = exprs(RSCAT, RSTESTCD, RSEVAL) ) ``` ```{r, echo=FALSE} dataset_vignette( adrs %>% arrange(!!!get_admiral_option("subject_keys"), PARAMN, RSSEQ), display_vars = exprs(USUBJID, VISIT, RSCAT, RSTESTCD, RSEVAL, PARAMCD, PARAM, PARCAT1, PARCAT2) ) ``` ### Partial Date Imputation and Deriving `ADT`, `ADTF`, `AVISIT` etc. If your data collection allows for partial dates, you could apply a company-specific imputation rule at this stage when deriving `ADT`. For this example, here we impute missing day to last possible date. ```{r} adrs <- adrs %>% derive_vars_dt( dtc = RSDTC, new_vars_prefix = "A", highest_imputation = "D", date_imputation = "last" ) %>% derive_vars_dy( reference_date = TRTSDT, source_vars = exprs(ADT) ) %>% derive_vars_dtm( dtc = RSDTC, new_vars_prefix = "A", highest_imputation = "D", date_imputation = "last", flag_imputation = "time" ) %>% mutate(AVISIT = VISIT) ``` ```{r, echo=FALSE} dataset_vignette( adrs, display_vars = exprs(USUBJID, PARAMCD, VISIT, AVISIT, RSDTC, ADT, ADTF, ADY) ) ``` ### Derive `AVALC` and `AVAL` Since the set of CA-125 response categories is a subset of the RECIST 1.1 response categories and the set of combined response categories overlaps with the set of RECIST 1.1 response categories, we can use the `admiralonco::aval_resp()` function to assign `AVAL` (ordered from best to worst response). ```{r} adrs <- adrs %>% mutate( AVALC = RSSTRESC, AVAL = aval_resp(AVALC) ) ``` ```{r, echo=FALSE} dataset_vignette( adrs, display_vars = exprs(USUBJID, PARAMCD, AVISIT, ADT, AVAL, AVALC) ) ``` ## Analysis Flag Derivation {#anlfl} When deriving `ANzzFL` this is an opportunity to exclude any records that should not contribute to any downstream parameter derivations. ### Flag One Assessment at Each Analysis Date (`ANL01FL`) {#anl01fl} In the below example: - we consider only valid assessments and those occurring on or after randomization date, - subject identifier variables (usually `STUDYID` and `USUBJID`), `PARAMCD` and `ADT` are a unique key, - if there is more than one assessment at a date, the worst one is flagged (ensure that the appropriate `mode` is being set in the `admiral::derive_var_extreme_flag()`), - to get the correct ordering, we will define the `worst_resp()` function. ```{r} worst_resp <- function(arg) { case_when( arg == "NE" ~ 1, arg == "CR" ~ 2, arg == "PR" ~ 3, arg == "SD" ~ 4, arg == "NON-CR/NON-PD" ~ 5, arg == "PD" ~ 6, TRUE ~ 0 ) } adrs <- adrs %>% restrict_derivation( derivation = derive_var_extreme_flag, args = params( by_vars = c(get_admiral_option("subject_keys"), exprs(PARAMCD, ADT)), order = exprs(worst_resp(AVALC), RSSEQ), new_var = ANL01FL, mode = "last" ), filter = !is.na(AVAL) & ADT >= RANDDT ) ``` ### Flag Assessments up to First PD or After Receiving Mouse Antibodies (`ANL02FL`) {#anl02fl} To restrict response data up to and including first reported progressive disease `ANL02FL` flag could be created by using `{admiral}` function `admiral::derive_var_relative_flag()`. According to GCIG guidelines, assessments after patients received mouse antibodies or if there has been medical and/or surgical interference with their peritoneum or pleura during the previous 28 days **should not be considered**. **Note**: In our vignette, we will use the variable `MOUSEANT`, which indicates whether mouse antibodies have been received. The user can similarly include a variable that indicates whether there has been medical and/or surgical interference with the patient's peritoneum or pleura in the past 28 days. ```{r} adrs <- adrs %>% derive_var_relative_flag( by_vars = c(get_admiral_option("subject_keys"), exprs(PARAMCD)), order = exprs(ADT, RSSEQ), new_var = ANL02FL, condition = (AVALC == "PD" | MOUSEANT == "Y"), mode = "first", selection = "before", inclusive = TRUE ) ``` ### CA-125 Response Evaluable Flag {#eflca} Patients can only be evaluable for a CA-125 response if they have pre-treatment sample that is at least twice the upper limit of the reference range and within 2 weeks before starting the treatment. In our case, this information is collected only for CA-125 records while a flag is needed at the patient level. CA-125 Response Evaluable Flag can easily be derived using `derive_var_merged_exist_flag()` function. ```{r} adrs <- adrs %>% select(-CA125EFL) %>% derive_var_merged_exist_flag( dataset_add = adrs, by_vars = get_admiral_option("subject_keys"), new_var = CA125EFL, condition = (CA125EFL == "Y") ) ``` ```{r, echo=FALSE} dataset_vignette( adrs, display_vars = exprs(USUBJID, AVISIT, PARAMCD, AVALC, ADT, ANL01FL, ANL02FL, CA125EFL) ) ``` ### Select Source Assessments for Parameter Derivations For next parameter derivations we consider: - one assessment at each analysis date (`ANL01FL = "Y"`), - assessments up to and including first PD or mouse antibodies whatever comes first (`ANL02FL = "Y"`), - CA-125 Response Evaluable population (not for derivation of CA-125 progression), ```{r} # used for derivation of CA-125 PD ovr_pd <- filter(adrs, PARAMCD == "OVRCA125" & ANL01FL == "Y" & ANL02FL == "Y") # used for derivation of CA-125 response parameters ovr_ca125 <- filter(adrs, PARAMCD == "OVRCA125" & CA125EFL == "Y" & ANL01FL == "Y" & ANL02FL == "Y") # used for derivation of unconfirmed best overall response from RECIST 1.1 and confirmed CA-125 together ovr_ubor <- filter(adrs, PARAMCD == "OVRR11CA" & CA125EFL == "Y" & ANL01FL == "Y" & ANL02FL == "Y") # used for derivation of confirmed best overall response from RECIST 1.1 and confirmed CA-125 together ovr_r11 <- filter(adrs, PARAMCD == "OVRR11" & CA125EFL == "Y" & ANL01FL == "Y" & ANL02FL == "Y") ``` Now that we have the input records prepared above with any company-specific requirements, we can start to derive new parameter records. ## CA-125 Progression {#pdca} The function `admiral::derive_extreme_records()` can be used to find the date of first CA-125 PD. ```{r} adrs <- adrs %>% derive_extreme_records( dataset_ref = adsl, dataset_add = ovr_pd, by_vars = get_admiral_option("subject_keys"), filter_add = AVALC == "PD", order = exprs(ADT), mode = "first", keep_source_vars = exprs(everything()), set_values_to = exprs( PARAMCD = "PDCA125", PARAM = "CA-125 Disease Progression by Investigator", PARAMN = 4, PARCAT1 = "CA-125", PARCAT1N = 1, PARCAT2 = "Investigator", PARCAT2N = 1, ANL01FL = "Y", ANL02FL = "Y" ) ) ``` ```{r, echo=FALSE} dataset_vignette( adrs %>% arrange(!!!get_admiral_option("subject_keys"), PARAMN, ADT), display_vars = exprs(USUBJID, PARAMCD, AVISIT, ADT, AVALC), filter = PARAMCD == "PDCA125" ) ``` ## CA-125 Progression Category {#pdcacat} To obtain additional variables that store the progression category for participants who have progressed, we will create `MCRITy/MCRITyML/MCRITyMN` variables according to the ADaMIG guidelines. In our consideration we assume that the `SUPPRS` dataset contains `QNAM` values to uniquely classify a progression into one of the `A`, `B` or `C` categories as per `GCIG` criteria. Having previously transposed and merged `SUPPRS` dataset, we have a suitable structure for deriving `MCRIT` variables, since all the necessary variables we will use to check conditions are in one row. For this purpose `{admiral}` provides [`derive_vars_cat()`](https://pharmaverse.github.io/admiral/dev/reference/derive_vars_cat.html) function (see documentation for details). ```{r} definition_mcrit <- exprs( ~PARAMCD, ~condition, ~MCRIT1ML, ~MCRIT1MN, "PDCA125", CAELEPRE == "Y" & CANORM2X == "Y", "Patients with elevated CA-125 before treatment and normalization of CA-125 (A)", 1, "PDCA125", CAELEPRE == "Y" & CNOTNORM == "Y", "Patients with elevated CA-125 before treatment, which never normalizes (B)", 2, "PDCA125", CAELEPRE == "N" & CANORM2X == "Y", "Patients with CA-125 in the reference range before treatment (C)", 3 ) adrs <- adrs %>% mutate(MCRIT1 = if_else(PARAMCD == "PDCA125", "PD Category Group", NA_character_)) %>% derive_vars_cat( definition = definition_mcrit, by_vars = exprs(PARAMCD) ) ``` ```{r, echo=FALSE} dataset_vignette( adrs, display_vars = exprs(USUBJID, PARAMCD, AVISIT, ADT, AVALC, MCRIT1, MCRIT1ML, MCRIT1MN), filter = PARAMCD == "PDCA125" ) ``` Derivation of the progression category may be more complex if the data is collected in a different way and the user needs to check whether: - baseline CA-125 level was elevated, - CA-125 level normalizes over the course of the study, - CA-125 level > 2 * ULRR (if CA-125 level normalizes) or CA-125 level > 2 * nadir (if CA-125 level never normalizes). If criterion to find PD category draws from multiple rows (different parameters or multiple rows for a single parameter) this will require the creation of a new parameter. ## CA-125 Best Confirmed Overall Response {#cborca} The function `admiral::derive_extreme_event()` can be used to derive the CA-125 Best Confirmed Overall Response Parameter. Some events such as `bor_cr`, `bor_pr` have been defined in {admiralonco}. Missing events specific to GCIG criteria are defined below. **Note**: *For `SD`, it is not required as for RECIST 1.1 that the response occurs after a protocol-defined number of days.* ```{r} bor_sd_gcig <- event( description = "Define stable disease (SD) for best overall response (BOR)", dataset_name = "ovr", condition = AVALC == "SD", set_values_to = exprs(AVALC = "SD") ) bor_ne_gcig <- event( description = "Define not evaluable (NE) for best overall response (BOR)", dataset_name = "ovr", condition = AVALC == "NE", set_values_to = exprs(AVALC = "NE") ) adrs <- adrs %>% derive_extreme_event( by_vars = get_admiral_option("subject_keys"), tmp_event_nr_var = event_nr, order = exprs(event_nr, ADT), mode = "first", source_datasets = list( ovr = ovr_ca125, adsl = adsl ), events = list( bor_cr, bor_pr, bor_sd_gcig, bor_pd, bor_ne_gcig, no_data_missing ), set_values_to = exprs( PARAMCD = "CBORCA", PARAM = "CA-125 Best Confirmed Overall Response by Investigator", PARAMN = 5, PARCAT1 = "CA-125", PARCAT1N = 1, PARCAT2 = "Investigator", PARCAT2N = 1, AVAL = aval_resp(AVALC), ANL01FL = "Y", ANL02FL = "Y" ) ) ``` ```{r, echo=FALSE} dataset_vignette( adrs, display_vars = exprs(USUBJID, AVISIT, PARAMCD, AVALC, ADT), filter = PARAMCD == "CBORCA" ) ``` ## Combined Best Unconfirmed Overall Response {#combubor} For patients who are measurable by both RECIST 1.1 and CA-125 the concept of Combined Best Overall Response is used. In our assumptions, RECIST 1.1 response is unconfirmed, and the combined response from `RS` domain is based on that unconfirmed RECIST 1.1. In this part of the vignette, we will derive Combined Best Unconfirmed Overall Response based on combined response (`PARAMCD = "OVRR11CA"`) as collected on the CRF. ```{r} adrs <- adrs %>% derive_extreme_event( by_vars = get_admiral_option("subject_keys"), tmp_event_nr_var = event_nr, order = exprs(event_nr, ADT), mode = "first", source_datasets = list( ovr = ovr_ubor, adsl = adsl ), events = list( bor_cr, bor_pr, bor_sd_gcig, bor_pd, bor_ne_gcig, no_data_missing ), set_values_to = exprs( PARAMCD = "BORCA11", PARAM = "Combined Best Unconfirmed Overall Response by Investigator", PARAMN = 6, PARCAT1 = "Combined", PARCAT1N = 3, PARCAT2 = "Investigator", PARCAT2N = 1, AVAL = aval_resp(AVALC), ANL01FL = "Y", ANL02FL = "Y" ) ) ``` ```{r, echo=FALSE} dataset_vignette( adrs, display_vars = exprs(USUBJID, AVISIT, PARAMCD, AVALC, ADT), filter = PARAMCD == "BORCA11" ) ``` ## Combined Best Confirmed Overall Response {#combcbor} For studies where ORR is one of the primary endpoints, best RECIST 1.1 response for CR and PR needs to be confirmed and maintained for at least 28 days. Due to the complexity of the problem, we will not address it in this vignette. ## Other Endpoints {#other} For examples on the additional endpoints, please see [Creating ADRS (Including Non-standard Endpoints)](adrs.html).