---
title: "Introduction to tabbitR"
author: "Siobhan McAndrew"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to tabbitR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5
)
```

# Overview

tabbitR automates the production of large sets of weighted 
crosstabulation tables and exports them directly to 'Excel'.  
It is designed for situations where analysts need many tables at once, such as:

* multiple outcome variables  
* multiple explanatory (breakdown) variables  
* weighted percentages  
* unweighted counts  
* clear and transparent reporting of missing values.  

This is a routine but time-consuming task in survey research, monitoring and 
evaluation, and exploratory data analysis. Doing it manually is repetitive, error-prone, and difficult to keep consistent.

`tabbitR::tabbit_excel()` solves this by producing, for each  
**outcome x breakdown variable** pair:

* a weighted percentage table  
* a matching unweighted N table  
* a clearly labelled summary of missing responses  
* light formatting via **`openxlsx`**  
* one or more Excel sheets, depending on user preference.  

The goal is to reduce manual work, enforce consistency, and accelerate the early
stages of analysis.

# tabbitR use cases

Survey teams, academic researchers, and data analysts often need to deliver hundreds or thousands of tables, for example:

* one table per survey measure  
* for each demographic variable  
* across multiple countries or survey waves.  

Manual workflows struggle with:

* maintaining consistent formatting  
* ensuring missing values are handled transparently  
* avoiding accidental errors in weights or denominators  
* producing both weighted and unweighted summaries for each outcome variable  
* generating reproducible outputs.  

tabbitR automates all of this in one reproducible command.

# Key features

## Weighted percentages

Percentages are computed using user-supplied weights:

* **column percentages** by default  
* **row percentages** if `row_pct = TRUE`

Percentage formatting respects the `decimals =` argument.
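
The difference between column and row percentages can be illustrated with base R. This is only a sketch of the arithmetic on hypothetical toy data, not tabbitR's internal code; the object names (`wt_tab`, `col_pct_tab`, `row_pct_tab`) are made up for the illustration.

```r
# Hypothetical toy data with no missing values
df <- data.frame(
  outcome = factor(c("A", "B", "A", "B", "A", "B")),
  sex     = factor(c("Male", "Male", "Female", "Female", "Male", "Female")),
  weight  = c(1, 2, 1, 1, 0.5, 3)
)

# Weighted cell totals: sum of weights in each outcome x sex cell
wt_tab <- xtabs(weight ~ outcome + sex, data = df)

# Column percentages: each breakdown column sums to 100
col_pct_tab <- round(100 * prop.table(wt_tab, margin = 2), 1)

# Row percentages: each outcome row sums to 100
row_pct_tab <- round(100 * prop.table(wt_tab, margin = 1), 1)

col_pct_tab
row_pct_tab
```

The final `round(..., 1)` plays the role that `decimals = 1` plays in `tabbit_excel()`.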

## Unweighted counts

Alongside the percentage table, `tabbit_excel()` writes an unweighted N table, which includes or excludes missing values depending on the options chosen.
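
As a rough base R analogue (an illustration only, not the package's implementation), unweighted counts with and without missing outcomes can be obtained from `table()`. The data frame below is hypothetical.

```r
# Hypothetical toy data with two missing outcomes
df <- data.frame(
  outcome = factor(c("A", "B", "A", NA, "C", NA)),
  sex     = factor(c("Male", "Male", "Female", "Female",
                     "Prefer not to say", "Male"))
)

# Unweighted counts with missing outcomes excluded
n_valid <- table(df$outcome, df$sex, useNA = "no")

# Unweighted counts with missing outcomes kept as their own row
n_all <- table(df$outcome, df$sex, useNA = "ifany")

n_valid
n_all
```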

## Missing values: clear and explicit

tabbitR does not silently drop missing responses.

Users may choose:

* **default:** missing outcomes excluded from percentage rows, but  
  summarised in a "Missing %" line  
* **`missingasrow = TRUE`:** missing values appear as `"Response missing"`  
  as a full row in both the percentage and N tables  
* **`nomissing = TRUE`:** no missing summary is shown.  

## Flexible layout

* one sheet per breakdown variable (default)  
* or all tables in a single sheet  
* variable labels included when available (e.g. from **`haven`**-labelled data)  
* 'Excel' formatting: bold headers, borders, readable layout.  

## Designed for large projects

tabbitR is especially useful when producing:

* formatted frequencies or breakdowns for hundreds of outcome variables  
* for multiple countries or survey waves  
* publication-ready Excel files for clients.  

# Usage

## A minimal example

```{r}
library(tabbitR)

df <- data.frame(
  outcome = factor(c("A", "B", "A", NA, "C", NA)),
  sex     = factor(c("Male", "Male", "Female", "Female",
                     "Prefer not to say", "Male")),
  weight  = c(1, 2, 1, 1, 0.75, 3)
)

tmp <- tempfile(fileext = ".xlsx")

tabbit_excel(
  data        = df,
  vars        = "outcome",
  breakdown   = "sex",
  wtvar       = "weight",
  file        = tmp,
  decimals    = 1
)

tmp

# The workbook is written to a temporary location (tmp).
# Open the file in a spreadsheet application to inspect the output.
```

## Multiple outcomes and multiple breakdowns

```{r}
# Toy survey data (simulated)
set.seed(123)

survey_df <- data.frame(
  outcome1       = factor(sample(c("Agree", "Neutral", "Disagree"), 200, replace = TRUE)),
  outcome2       = factor(sample(c("Often", "Sometimes", "Never"), 200, replace = TRUE)),
  outcome3       = factor(sample(c("Yes", "No"), 200, replace = TRUE)),
  sex            = factor(sample(c("Male", "Female"), 200, replace = TRUE)),
  age            = factor(sample(c("18-34", "35-54", "55+"), 200, replace = TRUE)),
  region         = factor(sample(c("North", "Midlands", "South"), 200, replace = TRUE)),
  survey_weight  = runif(200, 0.5, 2)
)

vars   <- c("outcome1", "outcome2", "outcome3")
breaks <- c("sex", "age", "region")

tmp2 <- tempfile(fileext = ".xlsx")

tabbit_excel(
  data         = survey_df,
  vars         = vars,
  breakdown    = breaks,
  wtvar        = "survey_weight",
  file         = tmp2,
  by_breakdown = TRUE,
  decimals     = 1
)

tmp2

```
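
To see how many outcome-by-breakdown pairs a call like this covers, the combinations can be enumerated in base R. This sketch only counts the pairs; it makes no claim about how tabbitR iterates over them internally.

```r
vars   <- c("outcome1", "outcome2", "outcome3")
breaks <- c("sex", "age", "region")

# Every outcome crossed with every breakdown variable
pairs <- expand.grid(outcome = vars, breakdown = breaks,
                     stringsAsFactors = FALSE)

nrow(pairs)
head(pairs)
```

With three outcomes and three breakdown variables there are nine pairs, each of which receives a percentage table and a matching N table.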

# Understanding the options

## Required
* **data**: a data frame  
* **vars**: outcome variables (character vector)  
* **breakdown**: explanatory variables  

## Main options
* **wtvar**: weight variable  
* **by_breakdown**: one sheet per breakdown variable (default TRUE)  
* **decimals**: decimal places for percentages (0-6)  
* **row_pct**: compute row percentages rather than column percentages.  

## Missingness
* **missingasrow**: include missing outcomes as a row  
* **nomissing**: suppress all missing-value summaries.  

## Display
* **nooverall**: drop the "Overall %" column  
* **nototal**: drop the "Total %" row.  
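
The sketch below combines several of these options in one call. It assumes the arguments behave as described in this section and can be freely combined; the data frame and the `out` path are hypothetical.

```r
library(tabbitR)

# Hypothetical toy data with two missing outcomes
df <- data.frame(
  outcome = factor(c("A", "B", "A", NA, "C", NA)),
  sex     = factor(c("Male", "Male", "Female", "Female",
                     "Prefer not to say", "Male")),
  weight  = c(1, 2, 1, 1, 0.75, 3)
)

out <- tempfile(fileext = ".xlsx")

tabbit_excel(
  data         = df,
  vars         = "outcome",
  breakdown    = "sex",
  wtvar        = "weight",
  file         = out,
  by_breakdown = FALSE,  # all tables in a single sheet
  row_pct      = TRUE,   # row rather than column percentages
  missingasrow = TRUE,   # show "Response missing" as a full row
  decimals     = 0       # whole-number percentages
)
```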

# Treatment of missing values

tabbitR makes missingness explicit unless you ask it not to.

## Default  
* missing values are excluded from the valid rows of the weighted table, but shown in a separate "Missing %" row  
* denominator for missing % = valid + missing within each column.

## `missingasrow = TRUE`  
Missing values appear as `"Response missing"` inside the main table.

## `nomissing = TRUE`  
No missing information is displayed in either table.
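
The default "Missing %" denominator described above (valid + missing within each column) can be sketched in base R. This sketch assumes the missing percentage is computed on weighted totals; that is an assumption made for illustration, so check the package documentation before relying on it. The data and object names are hypothetical.

```r
# Hypothetical toy data with two missing outcomes
df <- data.frame(
  outcome = factor(c("A", "B", "A", NA, "C", NA)),
  sex     = factor(c("Male", "Male", "Female", "Female",
                     "Prefer not to say", "Male")),
  weight  = c(1, 2, 1, 1, 0.75, 3)
)

is_missing <- is.na(df$outcome)

# Weighted total of missing outcomes within each breakdown column
missing_w <- tapply(df$weight * is_missing, df$sex, sum)

# Denominator: valid + missing weights within each column
total_w <- tapply(df$weight, df$sex, sum)

# Assumed weighted "Missing %" per column
missing_pct <- round(100 * missing_w / total_w, 1)
missing_pct
```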

# How weighted bases are computed

For each breakdown column:

```
weighted base W = sum(weights for non-missing outcomes)
```

Weighted bases are rounded to whole numbers for readability.
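
The formula above can be worked through in base R as follows. This is a minimal sketch on hypothetical data, not the package's own code.

```r
# Hypothetical toy data with two missing outcomes
df <- data.frame(
  outcome = factor(c("A", "B", "A", NA, "C", NA)),
  sex     = factor(c("Male", "Male", "Female", "Female",
                     "Prefer not to say", "Male")),
  weight  = c(1, 2, 1, 1, 0.75, 3)
)

# Keep only rows with a non-missing outcome
valid <- !is.na(df$outcome)

# Weighted base W: sum of weights for non-missing outcomes, per column
W <- tapply(df$weight[valid], df$sex[valid], sum)

# Rounded to whole numbers, as in the formatted output
round(W)
```

Here the "Male" column has a weighted base of 3 rather than 6, because the missing outcome carrying a weight of 3 is excluded.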

# Recommended workflow

1. Convert labelled variables using `haven::as_factor()` when needed.  
2. Select outcome variables.  
3. Choose breakdown variables.  
4. Run `tabbit_excel()`.  
5. Inspect the Excel workbook.  
6. Store code + output for reproducibility.  
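
Step 1 might look like the following sketch. The `raw` data frame and its `trust` column are hypothetical stand-ins for labelled data imported from SPSS or Stata.

```r
library(haven)

# Hypothetical labelled data, as it might arrive from an SPSS or Stata import
raw <- data.frame(id = 1:4)
raw$trust <- labelled(c(1, 2, 1, NA), labels = c(Agree = 1, Disagree = 2))

# Convert labelled columns to factors; other columns are left unchanged
clean <- as_factor(raw)

levels(clean$trust)
```

The converted data frame can then be passed to `tabbit_excel()` in the usual way.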

# Citation

If you use tabbitR in published work, please cite:

> McAndrew, S. (2025). *tabbitR: Automated weighted cross-tabulations for survey analysis.* R package.  
> https://github.com/smmcandrew/tabbitR

A BibTeX entry is available via:

```r
citation("tabbitR")
```

# Session info

```{r}
sessionInfo()
```

<!-- End of vignette -->