---
title: "Execution"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Execution}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Executing the study

Now that we have initiated the [database connection](https://healthinformaticsut.github.io/CohortContrast/articles/a00_introduction.html) and created the `targetTable` and the `controlTable`, we are ready to execute the study.

The chunk below shows what a saved study looks like after execution by loading the bundled `lc500` example results:

```{r}
if (requireNamespace("nanoparquet", quietly = TRUE)) {
  studyDir <- system.file("example", "st", package = "CohortContrast")
  study <- CohortContrast::loadCohortContrastStudy("lc500", pathToResults = studyDir)

  # Inspect the main exported components created by a completed run.
  names(study)
}
```

This is the same type of output object you can reload from your own saved study directory after running `CohortContrast()`.
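Beyond `names()`, a quick overview of the class and size of each top-level component can be useful. The sketch below uses base R only and assumes nothing about the actual structure of the study object: `summariseComponents()` is a hypothetical helper (not part of CohortContrast), and `toyStudy` is a made-up stand-in so the example runs without the package or example data.

```r
# Hypothetical helper (not part of CohortContrast): one row per top-level
# component of a named list, with its first class and, for data frames,
# the number of rows.
summariseComponents <- function(x) {
  data.frame(
    component = names(x),
    class = vapply(x, function(el) class(el)[1], character(1), USE.NAMES = FALSE),
    rows = vapply(
      x,
      function(el) if (is.data.frame(el)) nrow(el) else NA_integer_,
      integer(1),
      USE.NAMES = FALSE
    ),
    stringsAsFactors = FALSE
  )
}

# Toy stand-in for a loaded study; the component names are invented for
# illustration and do not reflect the real object.
toyStudy <- list(
  patients = data.frame(PERSON_ID = 1:3),
  settings = list(studyName = "toy")
)

overview <- summariseComponents(toyStudy)
overview
```

With a real study loaded as in the chunk above, the same helper could be called as `summariseComponents(study)`, provided the object is a named list.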
```{r, include = TRUE, eval=FALSE, echo=TRUE}
################################################################################
#
# Execute
#
################################################################################

data <- CohortContrast::CohortContrast(
  cdm,
  targetTable = targetTable,
  controlTable = controlTable,
  pathToResults = file.path(getwd(), "studies"),
  domainsIncluded = c(
    "Drug",
    "Condition",
    "Measurement",
    "Observation",
    "Procedure",
    "Visit",
    "Visit detail",
    "Death"
  ),
  prevalenceCutOff = 2.5,
  topK = FALSE, # Maximum number of features to export, FALSE disables the limit
  presenceFilter = 0.2, # 0-1, fraction of target subjects who must have the feature present
  complementaryMappingTable = NULL, # Optional manual concept mapping table
  getSourceData = FALSE, # If TRUE, summaries are generated with source data as well
  runChi2YTests = TRUE,
  runLogitTests = FALSE,
  createOutputFiles = TRUE,
  complName = "LungCancer_1Y"
)
```

## The parameters

There are multiple parameters we can tweak for different outcomes:

### Mandatory:

`cdm` Connection to the database

`targetTable` Table for the target cohort

`controlTable` Table for the control cohort

`pathToResults` Path to the results folder; this can be the project's working directory

`domainsIncluded` Character vector of CDM domains to include; choose from Drug, Condition, Measurement, Observation, Procedure, Visit, Visit detail, Death

`complName` Name of the output study directory

### Customization:

`runChi2YTests` Boolean for running CHI2Y tests (chi-squared tests for two proportions with Yates continuity correction)

`runLogitTests` Boolean for running logit tests on the prevalence; these build a model for predicting whether a patient is in the target or the control cohort

`getAllAbstractions` Boolean for creating abstraction levels for the imported data; this is useful when exploring the data in the GUI

`maximumAbstractionLevel` Maximum level of abstraction allowed when `getAllAbstractions` is TRUE; the concept_hierarchy table is used for the hierarchy

`getSourceData` Boolean for fetching source data, that is, the source-level data that is mapped to the OMOP CDM

`prevalenceCutOff` Numeric or FALSE. If set, removes all concepts that are not more than `prevalenceCutOff` times as prevalent in the target cohort as in the control cohort. For example, if set to 2, only concepts more than twice as prevalent in the target cohort are exported.

`topK` Numeric or FALSE. If set, keeps at most this number of features in the analysis and export.

`presenceFilter` Numeric or FALSE. If set, removes all features present in a smaller share of target cohort subjects than the given proportion (0-1).

`complementaryMappingTable` Data frame or NULL. Mapping table for concept merges. Columns: CONCEPT_ID, CONCEPT_NAME, NEW_CONCEPT_ID, NEW_CONCEPT_NAME, ABSTRACTION_LEVEL, TYPE

`numCores` Number of cores to allocate to parallel processing; by default, the number of available cores minus one

`createOutputFiles` Boolean for creating output files; the default value is TRUE

`runRemoveTemporalBias` Boolean for an optional temporal-bias reduction step after the main workflow

`runAutomaticHierarchyCombineConcepts` Boolean for optional hierarchy-based post-processing

`runAutomaticCorrelationCombineConcepts` Boolean for optional correlation-based post-processing

### Notes:

When using the GUI, `prevalenceCutOff` and `presenceFilter` can be adjusted with sliders. The effects of `runChi2YTests` and `runLogitTests` can be toggled as filters.

The function writes a study directory named after `complName` (in this case `LungCancer_1Y`) inside `pathToResults`. The study directory contains parquet files (for example `data_patients.parquet`) and a metadata file `metadata.json`.

## Reloading a saved study

```{r, include = TRUE, eval=FALSE, echo=TRUE}
reloaded <- CohortContrast::loadCohortContrastStudy(
  studyName = "LungCancer_1Y",
  pathToResults = file.path(getwd(), "studies")
)
```
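Before reloading, it can help to confirm that the study directory holds the expected parquet files and `metadata.json`. The following is a minimal base R sketch, not package functionality: `describeStudyDir()` is a hypothetical helper, and the demonstration uses a throwaway directory with empty placeholder files instead of a real study, so it only checks file names and never reads the contents.

```r
# Hypothetical helper (not part of CohortContrast): list the parquet files in a
# study directory and check whether metadata.json is present.
describeStudyDir <- function(pathToResults, studyName) {
  studyPath <- file.path(pathToResults, studyName)
  if (!dir.exists(studyPath)) {
    stop("Study directory not found: ", studyPath)
  }
  list(
    studyPath = studyPath,
    parquetFiles = sort(list.files(studyPath, pattern = "\\.parquet$")),
    hasMetadata = file.exists(file.path(studyPath, "metadata.json"))
  )
}

# Demonstration on a temporary directory with empty placeholder files;
# these are NOT valid parquet or JSON files, only names to list.
demoRoot <- file.path(tempdir(), "studies_demo")
demoStudy <- file.path(demoRoot, "LungCancer_1Y")
dir.create(demoStudy, recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demoStudy, c("data_patients.parquet", "metadata.json")))

info <- describeStudyDir(demoRoot, "LungCancer_1Y")
info$parquetFiles
info$hasMetadata
```

For a real run, the same call would point at the actual results folder, for example `describeStudyDir(file.path(getwd(), "studies"), "LungCancer_1Y")`.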