---
title: "Phenotype diagnostics"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{a01_PhenotypeDiagnostics}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(CDMConnector)
# Point Eunomia at a data folder and download the example dataset if it is missing
if (Sys.getenv("EUNOMIA_DATA_FOLDER") == "") Sys.setenv("EUNOMIA_DATA_FOLDER" = tempdir())
if (!dir.exists(Sys.getenv("EUNOMIA_DATA_FOLDER"))) dir.create(Sys.getenv("EUNOMIA_DATA_FOLDER"))
# Check for the same dataset that is downloaded (the default check is for "GiBleed")
if (!eunomiaIsAvailable(datasetName = "synpuf-1k", cdmVersion = "5.3")) {
  downloadEunomiaData(datasetName = "synpuf-1k", cdmVersion = "5.3")
}
```

## Introduction: Run PhenotypeDiagnostics

In this vignette, we show how to run `phenotypeDiagnostics()`. We are going to use the following packages and mock data:

```{r, message=FALSE, warning=FALSE, eval=FALSE}
library(CohortConstructor)
library(PhenotypeR)
library(dplyr)

# Connect to the Eunomia Synpuf example database
con <- DBI::dbConnect(duckdb::duckdb(),
                      CDMConnector::eunomiaDir("synpuf-1k", "5.3"))

# Create the cdm reference, including the Achilles tables
cdm <- CDMConnector::cdmFromCon(con = con,
                                cdmName = "Eunomia Synpuf",
                                cdmSchema = "main",
                                writeSchema = "main",
                                achillesSchema = "main")

cdm
```

Note that we have included the [Achilles tables](https://github.com/OHDSI/Achilles) in our cdm reference, which will be used to speed up some of the analyses.

## Create a cohort

First, we are going to use the package [CohortConstructor](https://ohdsi.github.io/CohortConstructor/) to generate three cohorts of *warfarin*, *acetaminophen* and *morphine* users.

```{r, message=FALSE, warning=FALSE, eval=FALSE}
# Create a codelist
codes <- list("warfarin" = c(1310149, 40163554),
              "acetaminophen" = c(1125315, 1127078, 1127433, 40229134,
                                  40231925, 40162522, 19133768),
              "morphine" = c(1110410, 35605858, 40169988))

# Instantiate cohorts with CohortConstructor
cdm$my_cohort <- conceptCohort(cdm = cdm,
                               conceptSet = codes,
                               exit = "event_end_date",
                               overlap = "merge",
                               name = "my_cohort")
```

## Run PhenotypeDiagnostics

Now we will proceed to run `phenotypeDiagnostics()`.
This function will run the following analyses:

- **Database diagnostics**: This includes information about the size of the data, the time period covered, the number of people in the data, and other metadata of the CDM object. See the [Database diagnostics vignette](https://ohdsi.github.io/PhenotypeR/articles/a03_DatabaseDiagnostics.html) for more details.
- **Codelist diagnostics**: This includes information on the concepts included in our cohorts' codelists. See the [Codelist diagnostics vignette](https://ohdsi.github.io/PhenotypeR/articles/a04_CodelistDiagnostics.html) for further details.
- **Cohort diagnostics**: This summarises the attrition of our cohorts, as well as the overlap between cohorts. See the [Cohort diagnostics vignette](https://ohdsi.github.io/PhenotypeR/articles/a05_CohortDiagnostics.html) for further details.
- **Matched diagnostics**: This matches our study cohorts to people of similar age and sex in the database and performs a large-scale characterisation of both. See the [Matched diagnostics vignette](https://ohdsi.github.io/PhenotypeR/articles/a06_MatchedDiagnostics.html) for further details.
- **Population diagnostics**: This calculates the frequency of our study cohorts in the database in terms of their incidence rates and prevalence. See the [Population diagnostics vignette](https://ohdsi.github.io/PhenotypeR/articles/a07_PopulationDiagnostics.html) for further details.


We can specify which analyses to run by setting each of the corresponding arguments to `TRUE` or `FALSE`:

```{r, eval=FALSE}
result <- phenotypeDiagnostics(
  cohort = cdm$my_cohort,
  databaseDiagnostics = TRUE,
  codelistDiagnostics = TRUE,
  cohortDiagnostics = TRUE,
  populationDiagnostics = TRUE,
  populationSample = 1e+06,
  populationDateRange = as.Date(c(NA, NA)),
  matchedDiagnostics = TRUE,
  matchedSample = 1000
)

result |> glimpse()
```

Notice that we have three additional arguments:

- `populationSample`: The number of people randomly sampled from the CDM for the **Population diagnostics** analysis. If `NULL`, everyone in the CDM is included. Sampling helps to reduce computation time.
- `populationDateRange`: The time period over which we want to perform the **Population diagnostics** analysis.
- `matchedSample`: Similar to `populationSample`, this argument takes a random sample of people for the **Matched diagnostics** analysis.

## Save the results

To save the results, we can use the [exportSummarisedResult](https://darwin-eu.github.io/omopgenerics/reference/exportSummarisedResult.html) function from the [omopgenerics](https://darwin-eu.github.io/omopgenerics/index.html) R package:

```{r, eval=FALSE}
# The output folder is set with `path`; counts below minCellCount are suppressed
omopgenerics::exportSummarisedResult(result, path = here::here(), minCellCount = 5)
```

## Visualisation of the results

Once we have our **Phenotype diagnostics** result, we can use `shinyDiagnostics()` to easily create a shiny app and visualise the results:

```{r, eval=FALSE}
result <- shinyDiagnostics(result, directory = tempdir(), minCellCount = 5, open = TRUE)
```

Notice that we have specified the minimum cell count (`minCellCount`) below which counts are suppressed in the shiny app, and also that we want the shiny app to be launched in a new R session (`open`).
You can see the shiny app generated for this example [here](https://dpa-pde-oxford.shinyapps.io/Readme_PhenotypeR/). See the [Shiny diagnostics vignette](https://ohdsi.github.io/PhenotypeR/articles/a02_ShinyDiagnostics.html) for a full explanation of the shiny app.
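
Results exported earlier can also be explored in a later session without rerunning the diagnostics. The chunk below is a minimal sketch rather than part of the workflow above: it assumes the CSV written in the export step is in `here::here()`, that no unrelated CSV files sit in that folder, and it uses `importSummarisedResult()` from omopgenerics to read the file back. The names `results_dir` and `saved_result` are illustrative.

```{r, eval=FALSE}
library(PhenotypeR)

# Folder assumed to hold the CSV written by exportSummarisedResult()
results_dir <- here::here()

# Read the summarised result CSV files found in that folder
saved_result <- omopgenerics::importSummarisedResult(path = results_dir)

# Create and launch the shiny app from the saved results,
# using the same arguments as in the call above
shinyDiagnostics(saved_result, directory = tempdir(), minCellCount = 5, open = TRUE)
```

Because suppression was already applied when the results were exported, counts below the `minCellCount` used at export time will not be recoverable from the saved file.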