Codelist diagnostics

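This vignette presents a set of functions to explore the use of codes in a codelist. We will cover the following key functions: summariseAchillesCodeUse(), summariseCodeUse(), summariseOrphanCodes(), summariseUnmappedCodes(), and summariseCohortCodeUse().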

Let’s start by loading the required packages, connecting to a mock database, and generating a codelist for example purposes. We’ll use getCandidateCodes() to find our codes.

library(DBI)
library(duckdb)
library(dplyr)
library(CDMConnector)
library(CodelistGenerator)
library(CohortConstructor)

# Connect to the database and create the cdm object
con <- dbConnect(duckdb(),
                 eunomiaDir("synpuf-1k", "5.3"))
cdm <- cdmFromCon(con = con,
                  cdmName = "Eunomia Synpuf",
                  cdmSchema = "main",
                  writeSchema = "main",
                  achillesSchema = "main")

# Create a codelist for depression
depression <- getCandidateCodes(cdm,
                                keywords = "depression")
depression <- list("depression" = depression$concept_id)

Run Diagnostics on a Codelist

Summarise Code Use Using ACHILLES Tables

summariseAchillesCodeUse() uses ACHILLES summary tables to count the number of records and persons associated with each concept in a codelist. Note that it requires the ACHILLES tables to be available in the CDM.

achilles_code_use <- summariseAchillesCodeUse(depression, 
                                              cdm, 
                                              countBy = c("record", "person"))

From this, we will obtain a summarised result object. We can easily visualise the results using tableAchillesCodeUse():

tableAchillesCodeUse(achilles_code_use,
                     type = "gt")

Notice that concepts with zero counts will not appear in the result table.
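
To make the two countBy options concrete, here is a minimal base R sketch of the difference between a record count and a person count. The data frame and its values are hypothetical and are not taken from the mock database; this only illustrates the idea and is not how the package computes its results.

```r
# Toy clinical records (hypothetical data, not from the CDM)
records <- data.frame(
  person_id  = c(1, 1, 2, 3, 3, 3),
  concept_id = c(101, 101, 101, 202, 202, 101)
)

# Record count: number of rows per concept
record_counts <- table(records$concept_id)

# Person count: number of distinct persons per concept
person_counts <- tapply(records$person_id, records$concept_id,
                        function(ids) length(unique(ids)))

code_counts <- data.frame(
  concept_id   = as.integer(names(record_counts)),
  record_count = as.integer(record_counts),
  person_count = as.integer(person_counts)
)
print(code_counts)
```

A person with several records of the same concept adds to the record count each time but to the person count only once, so the person count can never exceed the record count.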

Summarise Code Use Using Patient-Level Data

summariseCodeUse() performs a similar task to the function above but queries patient-level data directly, so it can be used even when ACHILLES tables are not available. Results can be stratified by concept (byConcept), by year (byYear), by sex (bySex), or by age group (ageGroup). We can also restrict the analysis to a time period (dateRange).

code_use <- summariseCodeUse(depression,
                             cdm,
                             countBy = c("record", "person"),
                             byYear  = FALSE,
                             bySex   = FALSE,
                             ageGroup = list("<=50" = c(0, 50), ">50" = c(51, Inf)),
                             dateRange = as.Date(c("2010-01-01", "2020-01-01")))

tableCodeUse(code_use, type = "gt")
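
The ageGroup argument takes a named list of lower and upper age bounds. The sketch below shows, in plain base R, how such a list could map ages to group labels. It assumes both bounds are inclusive; assign_group() is a hypothetical helper written for this illustration and is not part of CodelistGenerator.

```r
# Same structure as the ageGroup argument used above
age_groups <- list("<=50" = c(0, 50), ">50" = c(51, Inf))

# Hypothetical ages at the time of each record
ages <- c(12, 50, 51, 87)

# Assign an age to the first group whose bounds contain it
# (assumption: lower and upper bounds are both inclusive)
assign_group <- function(age, groups) {
  for (label in names(groups)) {
    bounds <- groups[[label]]
    if (age >= bounds[1] && age <= bounds[2]) {
      return(label)
    }
  }
  NA_character_
}

age_labels <- vapply(ages, assign_group, character(1), groups = age_groups)
print(data.frame(age = ages, age_group = age_labels))
```

Because the first group ends at 50 and the second starts at 51, every whole-number age falls into exactly one group.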

Identify Orphan Codes

Orphan codes are concepts that might be related to our codelist but that have not been included in it. Checking for them helps ensure that we have not missed any important concepts. Note that summariseOrphanCodes() uses ACHILLES tables.

summariseOrphanCodes() will look for descendants and ancestors (both via the concept_ancestor table), and for concepts related to the codes included in the codelist (via the concept_relationship table). Additionally, if the cdm contains PHOEBE recommendations (the concept_recommended table), they will also be used.

orphan <- summariseOrphanCodes(depression, cdm)
tableOrphanCodes(orphan, type = "gt")
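
The idea behind an orphan code search can be sketched with a toy hierarchy in base R. All concept IDs below are made up, and the sketch assumes that a candidate is only reported when it has recorded use in the database; it is an illustration of the concept, not the package's implementation.

```r
# Hypothetical codelist and a toy concept_ancestor-like table
codelist <- c(10, 11)
concept_ancestor <- data.frame(
  ancestor_concept_id   = c(1, 10, 10, 11),
  descendant_concept_id = c(10, 12, 11, 13)
)

# Descendants and ancestors of the codes in the codelist
descendants <- concept_ancestor$descendant_concept_id[
  concept_ancestor$ancestor_concept_id %in% codelist
]
ancestors <- concept_ancestor$ancestor_concept_id[
  concept_ancestor$descendant_concept_id %in% codelist
]

# Related concepts that are not already in the codelist
candidates <- setdiff(unique(c(descendants, ancestors)), codelist)

# Assumption: keep only candidates that are actually used in the data
used_concepts <- c(1, 10, 12)
orphans <- sort(candidates[candidates %in% used_concepts])
print(orphans)
```

Here concept 13 is related to the codelist but never appears in the data, so it is dropped, while concepts 1 and 12 would be worth reviewing.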

Identify Unmapped Codes

This function identifies codes that are conceptually linked to the codelist but that are not mapped.

unmapped <- summariseUnmappedCodes(depression, cdm)
tableUnmappedCodes(unmapped, type = "gt")

Run Diagnostics within a Cohort

You can also evaluate how the codelist is used within a specific cohort. First, we will define a cohort using the conceptCohort() function from the CohortConstructor package.

cdm[["depression"]] <- conceptCohort(cdm, 
                                     conceptSet = depression, 
                                     name = "depression")

Then, we can summarise the code use within this cohort:

cohort_code_use <- summariseCohortCodeUse(depression, 
                                          cdm,
                                          cohortTable = "depression",
                                          countBy = c("record", "person"))
tableCohortCodeUse(cohort_code_use)

Summarise Code Use at Cohort Entry

Use the timing argument to restrict diagnostics to codes used at the entry date of the cohort.

cohort_code_use <- summariseCohortCodeUse(depression, 
                                          cdm,
                                          cohortTable = "depression",
                                          countBy = c("record", "person"),
                                          timing = "entry")
tableCohortCodeUse(cohort_code_use)
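
As a rough illustration of what restricting to cohort entry means, the base R sketch below keeps only the records dated on each person's cohort start date. The tables and dates are hypothetical, and the logic is a simplified assumption about the behaviour rather than the package's actual query.

```r
# Hypothetical cohort table: one entry per person
cohort <- data.frame(
  subject_id        = c(1, 2),
  cohort_start_date = as.Date(c("2012-03-01", "2015-07-15"))
)

# Hypothetical records for the same people
records <- data.frame(
  person_id   = c(1, 1, 2, 2),
  concept_id  = c(101, 202, 101, 101),
  record_date = as.Date(c("2012-03-01", "2013-01-10",
                          "2015-07-15", "2016-02-02"))
)

# Join records to cohort entries, then keep records on the entry date
joined <- merge(records, cohort, by.x = "person_id", by.y = "subject_id")
at_entry <- joined[joined$record_date == joined$cohort_start_date, ]
print(at_entry)
```

Only the two records that coincide with a cohort start date remain, which shows which concepts actually led to cohort entry.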

Stratify Cohort-Level Diagnostics

You can also stratify cohort code use results by year (byYear), by sex (bySex), or by age group (ageGroup). The example below stratifies by sex only:

cohort_code_use <- summariseCohortCodeUse(depression, 
                                          cdm,
                                          cohortTable = "depression",
                                          countBy = c("record", "person"),
                                          byYear = FALSE,
                                          bySex = TRUE,
                                          ageGroup = NULL)
tableCohortCodeUse(cohort_code_use)