---
title: "Introduction to SimtablR: Tables Made Simple"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{tb_tutorial}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Welcome to SimtablR!

**SimtablR** is a lightweight package that brings together three essential tools for epidemiologic analysis:

-   **`tb()`** – Create publication-ready descriptive tables
-   **`diag_test()`** – Evaluate diagnostic test performance
-   **`regtab()`** – Generate multi-outcome regression tables

Let's explore each function through practical examples using a realistic dataset.

------------------------------------------------------------------------

## Getting Started

First, load the package and example data:

```{r load-package}
library(SimtablR)
data(epitabl)
```

## Example Data

The `epitabl` dataset comes from a hypothetical cross-sectional study of 500 adults examining risk factors for a chronic disease. Let's get acquainted:

Our study includes demographics (age, sex, education), health behaviors (smoking, exercise), clinical measures (BMI, blood pressure, cholesterol), and outcomes (disease status, healthcare utilization, hospitalization).

**A note on the disease**: About 30% of participants have the disease, making this a realistic prevalence for many chronic conditions:

```{r}
tb(epitabl, disease)
```

------------------------------------------------------------------------

# Part 1: Descriptive Tables with `tb()`

The `tb()` function is the Swiss Army knife of SimtablR. It is designed to replace base R's `table()` in much of your workflow.

It handles:

1.  Counts and Percentages (Row, Column, Total)
2.  Statistical Tests (Chi-squared, Fisher, McNemar)
3.  Effect Measures (Prevalence Ratios, Odds Ratios)
4.  Stratification (3-way tables)
5.  Formatting for publication (Flextable integration)

## 1. Univariate Tables (Frequency)

The simplest use case is checking the distribution of a single variable. By default, `tb()` shows counts and percentages.

```{r}
# Distribution of Smoking Status
tb(epitabl, smoking)
```

## 2. Bivariate Tables (Cross-Tabulation)

The first variable always defines the rows and the second defines the columns. Let's examine disease prevalence by smoking status:

```{r}
# Disease status by smoking status
tb(epitabl, smoking, disease)
```

You can control which percentage is calculated using the flags `col` (column %), `row` (row %), or `p` (total %). To change the number of decimal places shown, use `d` (d for decimal!):

```{r}
# Disease status by smoking status, row percentages with 3 decimal places
tb(epitabl, smoking, disease, row, d = 3)
```
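
To make the three percentage types concrete, here is a minimal base R sketch using `prop.table()`. The counts are made up for illustration and are not taken from `epitabl`, and the layout will not match what `tb()` prints.

```{r}
# Illustrative counts only (not from epitabl)
counts <- matrix(
  c(40, 10,
    30, 20),
  nrow = 2, byrow = TRUE,
  dimnames = list(smoking = c("Never", "Current"),
                  disease = c("No", "Yes"))
)

row_pct   <- prop.table(counts, margin = 1) * 100  # each row sums to 100
col_pct   <- prop.table(counts, margin = 2) * 100  # each column sums to 100
total_pct <- prop.table(counts) * 100              # all cells sum to 100

round(row_pct, 1)
```

Row percentages answer "what share of each smoking group has the disease?", while column percentages answer "what share of diseased participants belongs to each smoking group?".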

### Style table values

Use the `style` argument to change how numbers are presented. Common formats are `n (pct)` or `pct (n)`. Use `{p}` and `{n}` as placeholders for the percentage and the frequency, respectively. For example, you can put the percentage outside the parentheses and the frequency inside:

```{r}
# Showing "% (n)"
tb(epitabl, smoking, disease, row, style = "{p}% ({n})")
```
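
If the placeholder idea is unfamiliar, the sketch below shows how a template of this kind can be filled in with base R. `fill_style()` is a hypothetical helper written for this illustration; it is not part of SimtablR and says nothing about how `tb()` is implemented internally.

```{r}
# Hypothetical helper (not part of SimtablR): fills {n} and {p} in a template
fill_style <- function(n, p, style = "{n} ({p}%)", digits = 1) {
  out <- gsub("{n}", as.character(n), style, fixed = TRUE)
  gsub("{p}", formatC(p, format = "f", digits = digits), out, fixed = TRUE)
}

fill_style(n = 42, p = 28.4)
fill_style(n = 42, p = 28.4, style = "{p}% ({n})")
```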

## Adding Statistical Evidence

Is there a real association, or could this pattern occur by chance?

Set `test = TRUE` to automatically calculate a p-value (Chi-squared or Fisher's exact test, depending on sample size). You can also choose the test yourself. For frequency tables, the options are `"chsqr"` (Pearson's Chi-squared), `"fisher"` (Fisher's exact test), or `"mcnemar"` (McNemar's test).

```{r}
tb(epitabl, smoking, disease, col, test = "chsqr") # could also use test = TRUE
```

We see something interesting. The prevalence varies across smoking groups, and the p-value suggests this is unlikely due to chance alone.
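
For readers who want to see the underlying tests, here is a minimal base R sketch with `chisq.test()` and `fisher.test()`. The counts are illustrative and not taken from `epitabl`. The "expected counts of at least 5" rule of thumb mentioned in the comments is a common convention; whether `tb()` uses exactly that rule for its automatic choice is an assumption on my part.

```{r}
# Illustrative counts only (not from epitabl)
counts <- matrix(
  c(120, 30,
     90, 60),
  nrow = 2, byrow = TRUE,
  dimnames = list(smoking = c("Never", "Current"),
                  disease = c("No", "Yes"))
)

chi <- chisq.test(counts, correct = FALSE)  # Pearson's Chi-squared test
chi$expected                                # expected counts under independence
chi$p.value

# Fisher's exact test is the usual choice when expected counts are small
# (a common rule of thumb is any expected count below 5)
fisher.test(counts)$p.value
```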

## Prevalence Ratios (PR) and Odds Ratios (OR)

In cross-sectional studies, we often want the Prevalence Ratio (PR); in case-control studies, the Odds Ratio (OR). `tb()` calculates both with confidence intervals automatically.

Now that we know the prevalence of disease differs significantly across smoking categories, we can quantify the differences with the `rp` flag (PR) or the `or` flag (OR):

```{r}
# Calculate Prevalence Ratio of Disease by Smoking status
tb(epitabl, smoking, disease, col,
   test = TRUE,
   rp,                # Calculates Prevalence Ratios; use `or` instead to get ORs
   ref = "Never",     # Reference level for the smoking variable
   conf.level = 0.95) # Confidence level (0.95 by default)
```

*Note: Without `ref`, the function uses the first factor level alphabetically as the reference. You can check the levels of any variable with `levels(epitabl$smoking)`.*
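
For intuition about what these measures are, the sketch below computes a PR and an OR by hand from a 2x2 table, with Wald intervals on the log scale. The counts are illustrative (not from `epitabl`), and the Wald interval is an assumption chosen for simplicity; the vignette does not state which interval method `tb()` uses, so the numbers it reports may be computed differently.

```{r}
# Illustrative counts only (not from epitabl)
exp_yes <- 60; exp_no <- 90    # exposed group: with / without disease
ref_yes <- 30; ref_no <- 120   # reference group: with / without disease
z <- qnorm(0.975)              # critical value for a 95% interval

# Prevalence ratio with a Wald interval on the log scale
pr_est <- (exp_yes / (exp_yes + exp_no)) / (ref_yes / (ref_yes + ref_no))
pr_se  <- sqrt(1 / exp_yes - 1 / (exp_yes + exp_no) +
               1 / ref_yes - 1 / (ref_yes + ref_no))
pr_ci  <- exp(log(pr_est) + c(-1, 1) * z * pr_se)

# Odds ratio with a Wald interval on the log scale
or_est <- (exp_yes * ref_no) / (exp_no * ref_yes)
or_se  <- sqrt(1 / exp_yes + 1 / exp_no + 1 / ref_yes + 1 / ref_no)
or_ci  <- exp(log(or_est) + c(-1, 1) * z * or_se)

round(c(PR = pr_est, lower = pr_ci[1], upper = pr_ci[2]), 2)
round(c(OR = or_est, lower = or_ci[1], upper = or_ci[2]), 2)
```

Note that the OR is further from 1 than the PR for the same counts, which is why the two should not be used interchangeably when the outcome is common.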

## Continuous Variables

For continuous variables like age or blood pressure, we need to tell `tb()` to treat them differently:

```{r tb-continuous}
tb(epitabl, age, disease, var.type = c(age = "continuous"))
```

By default, you get the **median and interquartile range** (IQR), robust statistics that aren't thrown off by outliers. If you prefer mean and standard deviation:

```{r}
tb(epitabl, systolic_bp, disease, 
   var.type = c(systolic_bp = "continuous"),
   stat.cont = "mean")
```
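
The two kinds of summary can be reproduced in base R as a quick sanity check. This sketch uses the built-in `mtcars` data rather than `epitabl`, with `mpg` standing in for the continuous variable and `am` for the grouping variable.

```{r}
# Quartiles (25th, 50th, 75th percentiles) of mpg by transmission type
quartiles <- tapply(mtcars$mpg, mtcars$am, quantile,
                    probs = c(0.25, 0.5, 0.75))
quartiles

# Mean and standard deviation of the same variable by group
mean_sd <- aggregate(mpg ~ am, data = mtcars,
                     FUN = function(x) c(mean = mean(x), sd = sd(x)))
mean_sd
```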

## Stratification

You can add a third layer of complexity by stratifying the table. For example, let's look at the relationship between Disease and Smoking, but separated by Sex.

```{r}
tb(epitabl, disease, smoking, col, strat = sex, test = TRUE)
```

The columns now show every combination: Female-Never, Female-Former, Female-Current, Male-Never, and so on, so you can visually compare patterns across strata. *Note: PRs and ORs are not calculated in stratified tables. For adjusted estimates that account for multiple variables simultaneously, you'll want regression (see below).*
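
For comparison, a three-way table of this shape can be built in base R with `xtabs()` and flattened with `ftable()`. The sketch uses the built-in `mtcars` data (not `epitabl`), with `vs` playing the role of the stratification variable.

```{r}
# Cross-tabulate am by gear within each level of vs
strat_tab <- xtabs(~ am + gear + vs, data = mtcars)

# Flatten so that the columns show every vs-by-gear combination
ftable(strat_tab, col.vars = c("vs", "gear"))
```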

## Exporting to Word/PowerPoint

The output of `tb()` is great for the console, but for publication, you likely need a polished table. SimtablR integrates seamlessly with the [flextable package](https://github.com/davidgohel/flextable).

```{r}
# Create a table object (named tab1 to avoid masking base R's table())
tab1 <- tb(epitabl, smoking, disease, col,
           test = TRUE,
           rp,
           ref = "Never",
           conf.level = 0.95)

library(flextable) # load the flextable package to export to Word or PowerPoint
ft <- as_flextable(tab1)
ft <- autofit(ft)  # Optional: Adjust column widths
ft

save_as_docx(ft, path = "Table1.docx") 
save_as_pptx(ft, path = "Table1.pptx")
```

# Part 2: Diagnostic Test Evaluation with `diag_test()`

Imagine you're evaluating a new rapid diagnostic test for our disease. The gold standard is laboratory confirmation: it's accurate, but slow and expensive. Your rapid test is fast and cheap, but is it accurate enough? Let's find out!

## The Confusion Matrix

`diag_test()` automates the calculation of Sensitivity, Specificity, Predictive Values (PPV/NPV), and Likelihood Ratios. Let's evaluate the performance of `rapid_test` against `lab_confirmed` (the gold standard) in our dataset. You must define what constitutes a "Positive" result in both the test and the reference.

```{r diag-basic}
results <- diag_test(
  data = epitabl,
  test = rapid_test,
  ref = lab_confirmed,
  positive = "Yes"
)

print(results)
```
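
As a reminder of what each metric means, here is a minimal sketch that computes them by hand from the four cells of a confusion matrix. The counts are illustrative and not taken from `epitabl`; `diag_test()` may report additional quantities (such as confidence intervals) that are not reproduced here.

```{r}
# Illustrative counts only (not from epitabl)
tp <- 80; fn <- 20    # reference positive: test positive / test negative
fp <- 30; tn <- 170   # reference negative: test positive / test negative

sens   <- tp / (tp + fn)      # share of true cases the test detects
spec   <- tn / (tn + fp)      # share of non-cases the test clears
ppv    <- tp / (tp + fp)      # probability of disease given a positive test
npv    <- tn / (tn + fn)      # probability of no disease given a negative test
lr_pos <- sens / (1 - spec)   # positive likelihood ratio
lr_neg <- (1 - sens) / spec   # negative likelihood ratio

round(c(sensitivity = sens, specificity = spec, PPV = ppv, NPV = npv,
        LR_pos = lr_pos, LR_neg = lr_neg), 3)
```

Sensitivity and specificity are properties of the test itself, whereas PPV and NPV also depend on how common the disease is in the sample.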

You can quickly visualize the Confusion Matrix using the generic `plot()` function on the result object.

```{r}
plot(results, main = "Rapid Test Performance")
```

The area of each quadrant reflects the number of observations. Large areas in the true positive and true negative quadrants indicate good performance.

## Extracting Results for Reports

Need the metrics in a table format for a manuscript or report? You can convert the results of `diag_test()` into a data frame containing all the important metrics:

```{r diag-table}
metrics_table <- as.data.frame(results)
print(metrics_table)
```

------------------------------------------------------------------------

# Part 3: Regression Tables with `regtab()`

You've described your data and evaluated your diagnostic test. Now you want to understand which factors predict disease, healthcare utilization, or other outcomes, and you have *multiple* outcomes to examine. This is where `regtab()` comes in.

Let's say we want to understand healthcare utilization in our population. Specifically, we'll examine three outcomes:

-   **Primary care visits** (outcome1)
-   **Specialist visits** (outcome2)
-   **Emergency department visits** (outcome3)

All are count variables, so we'll use **Poisson regression**. And we want to know: how do age, sex, disease status, and smoking affect each type of visit?

### Poisson Regression: Count Outcomes

```{r regtab-basic}
healthcare_model <- regtab(
  data = epitabl,
  outcomes = c("outcome1", "outcome2", "outcome3"),
  predictors = ~ age + sex + disease + smoking,
  family = poisson(link = "log"),
  robust = TRUE,
  labels = c(
    outcome1 = "Primary Care",
    outcome2 = "Specialist",
    outcome3 = "Emergency Dept")
  )
  
print(healthcare_model)
```

The function fits three separate Poisson regression models (one per outcome) using the same predictors, and reports incidence rate ratios (IRRs) with confidence intervals (CIs).
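
To see where an IRR comes from, the sketch below fits a single Poisson model with base R's `glm()` on the built-in `warpbreaks` data (not `epitabl`) and exponentiates the coefficients. It uses model-based standard errors with Wald limits; robust standard errors, as requested by `robust = TRUE` above, would typically come from a sandwich estimator and are not reproduced here, so this is an approximation of the idea rather than of `regtab()`'s exact output.

```{r}
# Built-in warpbreaks data: number of breaks per loom (a count outcome)
fit <- glm(breaks ~ wool + tension, family = poisson(link = "log"),
           data = warpbreaks)

# Exponentiate log-scale coefficients and Wald limits to get IRRs
irr <- exp(cbind(IRR = coef(fit), confint.default(fit)))
round(irr, 2)
```

An IRR below 1 means a lower expected count than in the reference category, and an IRR above 1 means a higher one.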

### Logistic Regression: Binary Outcomes

Now, let's look at risk factors for Hospitalization (Binary: Yes/No).

```{r regtab-logistic}
hospital_model <- regtab(
  data = epitabl,
  outcomes = "hospitalized",
  predictors = ~ age + disease + smoking + comorbidity_score,
  family = binomial(link = "logit")
)

print(hospital_model)
```

The estimates are now **odds ratios**.

### Including P-values

```{r regtab-pvalues}
hospital_model_p <- regtab(
  data = epitabl,
  outcomes = "hospitalized",
  predictors = ~ age + disease + smoking + comorbidity_score,
  family = binomial(),
  p_values = TRUE
)

print(hospital_model_p)
```

Now each outcome gets both an estimate column and a p-value column.
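
The same pairing of estimate and p-value can be assembled by hand from a base R `glm()` fit. This sketch uses the built-in `mtcars` data (not `epitabl`), with the binary variable `vs` as the outcome; the column names in `or_table` are my own choice and will not match the `regtab()` output.

```{r}
# Built-in mtcars data: vs is a binary (0/1) outcome
fit <- glm(vs ~ mpg, family = binomial(link = "logit"), data = mtcars)

est <- summary(fit)$coefficients   # log-odds estimates, SEs, z and p-values
or_table <- data.frame(
  term    = rownames(est),
  OR      = exp(est[, "Estimate"]),  # exponentiate log-odds to get odds ratios
  p_value = est[, "Pr(>|z|)"],
  row.names = NULL
)
or_table
```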

### Linear (Gaussian) Regression: Continuous Outcomes

What predicts blood pressure and cholesterol? These are continuous outcomes, so we use **Gaussian regression**:

```{r regtab-gaussian}
clinical_model <- regtab(
  data = epitabl,
  outcomes = c("systolic_bp", "cholesterol"),
  predictors = ~ age + sex + bmi + exercise + smoking,
  family = gaussian(),
  labels = c(
    systolic_bp = "Systolic BP (mmHg)",
    cholesterol = "Cholesterol (mg/dL)"
  ),
  d = 1  # 1 decimal place
)

print(clinical_model)
```

These are **linear (beta) coefficients** (not exponentiated).
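
Conceptually, a multi-outcome table amounts to fitting the same right-hand side once per outcome and collecting the coefficients side by side. The sketch below does this with base R on the built-in `mtcars` data (not `epitabl`); it illustrates the idea only and is not a description of how `regtab()` is implemented.

```{r}
# Built-in mtcars data; mpg and qsec stand in for two continuous outcomes
outcomes <- c("mpg", "qsec")

fits <- lapply(outcomes, function(y) {
  # Same predictors for every outcome; only the response changes
  lm(reformulate(c("wt", "hp"), response = y), data = mtcars)
})
names(fits) <- outcomes

# One column of untransformed (beta) coefficients per outcome
betas <- sapply(fits, coef)
round(betas, 3)
```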

## Exporting Your Table

Since `regtab()` returns a standard data frame, you can easily export it to CSV or Excel for final formatting. You can also use flextable as before, though it is not necessary.

```{r}
# Export to CSV
write.csv(clinical_model, "Table2_Regression_Results.csv", row.names = FALSE)
```

# Conclusion

The **SimtablR** package brings three essential epidemiologic tools into a single, coherent framework:

- **`tb()`** eliminates the drudgery of descriptive table creation
- **`diag_test()`** makes diagnostic test evaluation straightforward and complete
- **`regtab()`** turns multi-outcome regression into a single function call

Each function prioritizes publication-ready output, statistical rigor, and sensible defaults, so you can focus on the science rather than the code.

I hope this package saves you time, reduces errors, and makes your analyses more reproducible. Now go forth and create some excellent tables.

**Happy analyzing!**