---
title: "Phenotype diagnostics"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{a01_PhenotypeDiagnostics}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(CDMConnector)
if (Sys.getenv("EUNOMIA_DATA_FOLDER") == "") Sys.setenv("EUNOMIA_DATA_FOLDER" = tempdir())
if (!dir.exists(Sys.getenv("EUNOMIA_DATA_FOLDER"))) dir.create(Sys.getenv("EUNOMIA_DATA_FOLDER"))
if (!eunomiaIsAvailable()) downloadEunomiaData(datasetName = "synpuf-1k", cdmVersion = "5.3")
```

## Introduction: Run PhenotypeDiagnostics

In this vignette, we are going to present how to run `PhenotypeDiagnostics()`. We are going to use the following packages and mock data:

```{r, message=FALSE, warning=FALSE, eval=FALSE}
library(CohortConstructor)
library(PhenotypeR)
library(dplyr)

con <- DBI::dbConnect(duckdb::duckdb(), 
                      CDMConnector::eunomiaDir("synpuf-1k", "5.3"))
cdm <- CDMConnector::cdmFromCon(con = con, 
                                cdmName = "Eunomia Synpuf",
                                cdmSchema   = "main",
                                writeSchema = "main", 
                                achillesSchema = "main")
cdm
```

Note that we have included [achilles tables](https://github.com/OHDSI/Achilles) in our cdm reference, which will be used to speed up some of the analyses.

## Create a cohort

First, we are going to use the package [CohortConstructor](https://ohdsi.github.io/CohortConstructor/) to generate three cohorts of *warfarin*, *acetaminophen* and *morphine* users.

```{r, message=FALSE, warning=FALSE, eval=FALSE}
# Create a codelist
codes <- list("warfarin" = c(1310149, 40163554),
              "acetaminophen" = c(1125315, 1127078, 1127433, 40229134, 40231925, 40162522, 19133768),
              "morphine" = c(1110410, 35605858, 40169988))

# Instantiate cohorts with CohortConstructor
cdm$my_cohort <- conceptCohort(cdm = cdm,
                               conceptSet = codes, 
                               exit = "event_end_date",
                               overlap = "merge",
                               name = "my_cohort")
```
## Run PhenotypeDiagnostics

Now we will proceed to run `phenotypeDiagnotics()`. This function will run the following analyses:

-   **Database diagnostics**: This includes information about the size of the data, the time period covered, the number of people in the data, and other meta-data of the CDM object. See [Database diagnostics vignette](https://ohdsi.github.io/PhenotypeR/articles/a03_DatabaseDiagnostics.html) for more details. 
-   **Codelist diagnostics**: This includes information on the concepts included in our cohorts' codelist. See [Codelist diagnostics vignette](https://ohdsi.github.io/PhenotypeR/articles/a04_CodelistDiagnostics.html) for further details.
-   **Cohort diagnostics**: This summarises the attrition of our cohorts, as well as overlapping between cohorts. See [Cohort diagnostics vignette](https://ohdsi.github.io/PhenotypeR/articles/a05_CohortDiagnostics.html) for further details.
-   **Matched diagnostics**: This matched our study cohorts to people with similar age and sex in the database and performs a large-scale characterisation on both. See [Matched diagnostics vignette](https://ohdsi.github.io/PhenotypeR/articles/a06_MatchedDiagnostics.html) for further details.
-   **Population diagnostics**: Calculates the frequency of our study cohorts in the database in terms of their incidence rates and prevalence. See [Population diagnostics vignette](https://ohdsi.github.io/PhenotypeR/articles/a07_PopulationDiagnostics.html) for further details.

We can specify which analysis we want to perform by setting to TRUE or FALSE each one of the corresponding arguments:
```{r, eval=FALSE}
result <- phenotypeDiagnostics(
  cohort = cdm$my_cohort, 
  databaseDiagnostics = TRUE, 
  codelistDiagnostics = TRUE, 
  cohortDiagnostics   = TRUE, 
  populationDiagnostics = TRUE,
  populationSample = 1e+06,
  populationDateRange = as.Date(c(NA, NA)),
  matchedDiagnostics  = TRUE,
  matchedSample = 1000
  )
result |> glimpse()
```

Notice that we have three additional arguments:

-   `populationSample`: It allows to specify a number of people that randomly will be extracted from the CDM to perform the **Population diagnostics** analysis. If NULL, all the participants in the CDM will be included. It helps to reduce the computational time.
-   `populationDateRange`: We can use it to specify the time period when we want to perform our **Population diagnostics** analysis.
-   `matchedSample`: Similar to populationSample, this arguments subsets a random sample of people to perform the **Matching diagnostics**.

## Save the results
To save the results, we can use [exportSummarisedResult](https://darwin-eu.github.io/omopgenerics/reference/exportSummarisedResult.html) function from [omopgenerics](https://darwin-eu.github.io/omopgenerics/index.html) R Package:
```{r, eval=FALSE}
exportSummarisedResult(result, directory = here::here(), minCellCount = 5)
```

## Visualisation of the results
Once we get our **Phenotype diagnostics** result, we can use `shinyDiagnostics` to easily create a shiny app and visualise our results:

```{r, eval=FALSE}
result <- shinyDiagnostics(result,
                           directory = tempdir(),
                           minCellCount = 5, 
                           open = TRUE)
```
Notice that we have specified the minimum number of counts (`minCellCount`) for suppression to be shown in the shiny app, and also that we want the shiny to be launched in a new R session (`open`). You can see the shiny app generated for this example in [here](https://dpa-pde-oxford.shinyapps.io/Readme_PhenotypeR/).See [Shiny diagnostics vignette](https://ohdsi.github.io/PhenotypeR/articles/a02_ShinyDiagnostics.html) for a full explanation of the shiny app.