---
title: "Building a mock from data"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{mock_from_tables}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction

Most of the functions in **omock** build specific mock tables (e.g. `mockPerson()`, `mockObservationPeriod()`, ...). Combining them lets the user create mock CDM objects with some room for customisation. Sometimes, however, the user will want to create a mock CDM reference from their own bespoke tables. The [`mockCdmFromTables()`](https://ohdsi.github.io/omock/reference/mockCdmFromTables.html) function is designed to facilitate this. It is useful for creating a mock CDM from a cohort table or a `drug_exposure` table, or from incomplete data (e.g. tables with missing columns).

```{r}
library(omock)
library(dplyr, warn.conflicts = FALSE)
library(PatientProfiles)
```

# Create a mock CDM from a cohort table

For example, suppose you want to create a CDM reference based on the bespoke cohorts below. You can do it in a few lines of code using the `mockCdmFromTables()` function.

```{r, warning = FALSE}
# Define a list of user-defined cohort tables
cohortTables <- list(
  cohort1 = tibble(
    subject_id = 1:10L,
    cohort_definition_id = rep(1L, 10),
    cohort_start_date = as.Date("2020-01-01") + 1:10,
    cohort_end_date = as.Date("2020-01-01") + 11:20
  ),
  cohort2 = tibble(
    subject_id = 11:20L,
    cohort_definition_id = rep(2L, 10),
    cohort_start_date = as.Date("2020-02-01") + 1:10,
    cohort_end_date = as.Date("2020-02-01") + 11:20
  )
)

# Create a mock CDM object from the user-defined tables
cdm <- mockCdmFromTables(tables = cohortTables)

cdm
```

`mockCdmFromTables()` builds the `person`, `observation_period` and vocabulary tables so that all the cohort records are in observation:

```{r}
cdm$cohort1 |>
  addInObservation()
cdm$observation_period
```
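
To make explicit what "in observation" means here, the chunk below reproduces the check by hand on two small plain tibbles, using only **dplyr**. It is a minimal sketch and not how `addInObservation()` is implemented: `demoCohort`, `demoObservationPeriod` and `manualCheck` are hypothetical names, each person is assumed to have a single observation period, and a record is assumed to be in observation when both its start and end dates fall inside that period.

```{r}
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy data, independent of the cdm object created above
demoCohort <- tibble(
  subject_id = 1:3L,
  cohort_start_date = as.Date(c("2020-01-02", "2020-01-03", "2020-01-04")),
  cohort_end_date = as.Date(c("2020-01-12", "2020-01-13", "2020-01-14"))
)
demoObservationPeriod <- tibble(
  person_id = 1:3L,
  observation_period_start_date = as.Date(c("2019-01-01", "2019-01-01", "2020-01-10")),
  observation_period_end_date = as.Date(c("2021-01-01", "2020-01-05", "2021-01-01"))
)

# Assumption: "in observation" means start and end dates lie within the period
manualCheck <- demoCohort |>
  inner_join(demoObservationPeriod, by = c("subject_id" = "person_id")) |>
  mutate(
    in_observation = cohort_start_date >= observation_period_start_date &
      cohort_end_date <= observation_period_end_date
  ) |>
  select(subject_id, in_observation)

manualCheck
```

In this toy data only the first subject is fully covered: the second record ends after the observation period closes, and the third starts before it opens.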

# Create a mock CDM from a drug_exposure table

Now we will create a CDM around a `drug_exposure` table. This is useful for obtaining mock datasets for testing when only part of the information is specified. In this case we will partially define the `person` table so that all individuals are women:

```{r}
person <- tibble(person_id = 1:5L, gender_concept_id = 8532L, year_of_birth = 1992)
```

and we will also create the records of the `drug_exposure` table:

```{r}
drugExposure <- tibble(
  person_id = rep(1:5L, 2),
  drug_concept_id = 19073188L,
  drug_exposure_start_date = rep(as.Date(c("2000-01-01", "2000-06-01")), each = 5),
  drug_exposure_end_date = drug_exposure_start_date + c(10L, 20L, 100L, 140L, 30L, 50L, 30L, 20L, 45L, 35L)
)
```
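
When tables are written by hand like this, it can be worth checking that they are internally consistent before building the CDM. The chunk below is a minimal sketch using only **dplyr**. `checkDrugExposure()` is a hypothetical helper written for this illustration and is not part of **omock**; it is applied to small demo tibbles (`demoPerson`, `demoDrug`, also hypothetical) that deliberately contain one record ending before it starts and one record for a person missing from the person table.

```{r}
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper (not part of omock): count basic inconsistencies
checkDrugExposure <- function(drugExposure, person) {
  tibble(
    n_records = nrow(drugExposure),
    # records whose end date precedes their start date
    end_before_start = sum(
      drugExposure$drug_exposure_end_date < drugExposure$drug_exposure_start_date
    ),
    # records whose person_id is absent from the person table
    unknown_person = sum(!drugExposure$person_id %in% person$person_id)
  )
}

# Toy data with two deliberate problems
demoPerson <- tibble(person_id = 1:3L)
demoDrug <- tibble(
  person_id = c(1L, 2L, 4L),
  drug_exposure_start_date = as.Date(c("2000-01-01", "2000-01-10", "2000-02-01")),
  drug_exposure_end_date = as.Date(c("2000-01-11", "2000-01-05", "2000-03-01"))
)

checkResult <- checkDrugExposure(demoDrug, demoPerson)
checkResult
```

The same helper could be called as `checkDrugExposure(drugExposure, person)` on the tables defined above, where both counts would be expected to be zero.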

Then `mockCdmFromTables()` will populate the missing columns with interpolated data and add all the tables necessary to create a minimum viable CDM (it will contain at least `person`, `observation_period` and the vocabulary tables):

```{r, eval = FALSE}
cdm <- mockCdmFromTables(tables = list(person = person, drug_exposure = drugExposure))

cdm
```

As before, all the records of `drug_exposure` will be in observation:

```{r, eval = FALSE}
cdm$drug_exposure |>
  addInObservation() |>
  group_by(in_observation) |>
  tally()
```