---
title: "Introduction to gerda"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to gerda}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(gerda)
```

## Overview

The `gerda` package provides functions to access and work with datasets from the German Election Database (GERDA), a comprehensive collection of local, state, and federal election results in Germany. The data are intended to facilitate research on electoral behavior, representation, and political responsiveness at multiple levels of government. All datasets include turnout and vote shares for all major parties. GERDA also contains geographically harmonized datasets that account for changes in municipal boundaries and mail-in voting districts.

GERDA was compiled by Vincent Heddesheimer, Florian Sichart, Andreas Wiedemann and Hanno Hilbig.

This vignette introduces the main functions of the package and demonstrates how to use them.

## Available Datasets

To see a list of all available GERDA datasets, you can use the `gerda_data_list()` function:

```{r}
gerda_data_list()
```

This function displays a formatted table with the names and descriptions of all available datasets.

## Loading Data

The main function for loading GERDA data is `load_gerda_web()`. This function allows you to load a specific dataset from a web source. Here's an example of how to use it:

```{r, eval=FALSE}
# Load the municipal harmonized dataset
municipal_harm_data <- load_gerda_web("municipal_harm", verbose = TRUE, file_format = "rds")
```

The `load_gerda_web()` function takes the following parameters:

- `file_name`: The name of the dataset to load (as shown in the `gerda_data_list()` output)
- `verbose`: If set to `TRUE`, it prints messages about the loading process (default is `FALSE`)
- `file_format`: The format of the file to load, either `"rds"` or `"csv"` (default is `"rds"`)

## Example Workflow

Here's an example of a typical workflow using the `gerda` package:

1. List available datasets:

```{r}
gerda_data_list()
```

2. Load a dataset (in this case, the federal elections at the county level, harmonized):

```{r, eval=FALSE}
federal_cty_harm <- load_gerda_web("federal_cty_harm", verbose = TRUE)
```
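Once a dataset is loaded, a common next step is a quick summary by election year. The chunk below is a minimal sketch on a toy data frame rather than on real GERDA data, so it runs without a download. The column names `county_code`, `election_year`, and `turnout` are assumptions about the loaded data, and all values are made up for illustration.

```{r}
# Toy stand-in for a loaded GERDA dataset (made-up values, assumed column names)
toy_elections <- data.frame(
  county_code   = c("01001", "01002", "01001", "01002"),
  election_year = c(2017L, 2017L, 2021L, 2021L),
  turnout       = c(0.70, 0.74, 0.72, 0.78)
)

# Average turnout across counties for each election year
turnout_by_year <- aggregate(
  turnout ~ election_year,
  data = toy_elections,
  FUN = mean
)
turnout_by_year
```

The same call should work on a real dataset if its columns carry these names; check `names()` on the loaded object first.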

## County-Level Covariates

The `gerda` package includes county-level socioeconomic and demographic covariates from INKAR (Indikatoren und Karten zur Raum- und Stadtentwicklung). These covariates can be easily merged with GERDA election data to enrich your analyses.

### Quick Start

The easiest way to add covariates to your election data is using the `add_gerda_covariates()` function:

```{r, eval=FALSE}
library(dplyr)

# Load election data and add covariates
merged <- load_gerda_web("federal_cty_harm") %>%
  add_gerda_covariates()

# Your data now includes 20 county-level covariates!
```

This function automatically:

- Uses the correct join keys
- Performs a left join, so every election row is kept and covariate years without an election are dropped
- Validates input data
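
To make the left-join behavior concrete, here is a small sketch using base R `merge()` on toy data. It is not the implementation of `add_gerda_covariates()`; it only illustrates why annual covariates end up restricted to election years. The data frames and their values are made up, and the key names (`county_code`, `election_year`, `year`) mirror the ones used elsewhere in this vignette.

```{r}
# Toy election data: one county, two election years (made-up values)
toy_votes_cty <- data.frame(
  county_code   = c("01001", "01001"),
  election_year = c(2017L, 2021L),
  turnout       = c(0.70, 0.72)
)

# Toy annual covariates: same county, every year from 2017 to 2021
toy_covs <- data.frame(
  county_code       = "01001",
  year              = 2017:2021,
  unemployment_rate = c(6.1, 5.8, 5.5, 6.4, 6.0)
)

# Left join: keep every election row and attach that year's covariates
toy_merged <- merge(
  toy_votes_cty, toy_covs,
  by.x  = c("county_code", "election_year"),
  by.y  = c("county_code", "year"),
  all.x = TRUE
)
toy_merged
```

The result has one row per election; the covariate rows for 2018 to 2020 have no matching election and do not appear.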

### Available Covariates

The covariates dataset includes 20 variables covering:

- **Demographics**: Age structure, foreign population, gender composition
- **Economy**: GDP per capita, sectoral composition, enterprise structure
- **Labor Market**: Unemployment rates (overall, youth, long-term)
- **Education**: School completion rates, students, apprentices
- **Income**: Median income, purchasing power, low-income households

### Viewing the Codebook

To see detailed information about each covariate, including units and missing data patterns:

```{r, eval=FALSE}
# Get the codebook
codebook <- gerda_covariates_codebook()
print(codebook)

# Find variables with good coverage
library(dplyr)
codebook %>%
  filter(missing_pct < 10) %>%
  select(variable, label, category)
```

### Advanced Usage

For more control, you can access the raw covariates data:

```{r, eval=FALSE}
# Get raw covariate data
covs <- gerda_covariates()

# Inspect before merging
summary(covs$unemployment_rate)

# Custom merge
elections <- load_gerda_web("federal_cty_harm")
merged <- elections %>%
  left_join(covs, by = c("county_code" = "county_code", "election_year" = "year"))
```

### Data Coverage

- **Counties**: 400 German counties (Kreise)
- **Time period**: 1995-2022 (annual data)
- **Election coverage**: Elections from 1998 onwards have full covariate data

Note: Some covariates have missing values. Use the codebook to check data availability for specific variables before analysis.
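
Missingness can also be checked directly on a merged data frame. The following sketch uses a toy data frame with made-up values; `gdp_per_capita` is a hypothetical column name chosen for illustration, and the 30% threshold is arbitrary.

```{r}
# Toy merged data with some missing covariate values (made-up values)
toy_with_na <- data.frame(
  county_code       = c("01001", "01002", "01003", "01004"),
  unemployment_rate = c(6.1, NA, 5.5, 6.4),
  gdp_per_capita    = c(NA, NA, 31.2, 28.9)
)

# Share of missing values per column, in percent
missing_share <- colMeans(is.na(toy_with_na)) * 100
missing_share

# Columns below the chosen missingness threshold
names(missing_share)[missing_share < 30]
```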

## Party Crosswalk Function

The `party_crosswalk()` function provides a mapping between GERDA party names and standardized party information from the ParlGov database. This is particularly useful for linking GERDA data with other political science datasets or for obtaining standardized party characteristics.

### Usage

The function takes two main parameters:

- `party_gerda`: A character vector of GERDA party names
- `destination`: The name of the column from the ParlGov `view_party` table to map to

### Available Mapping Options

You can map GERDA party names to various standardized party characteristics, including:

- `left_right`: Left-right position scores
- `party_name_english`: English party names
- `party_name_short`: Short party names
- `country_name`: Country names
- Any other column of the ParlGov `view_party` table

### Example

```{r, eval=FALSE}
# Map GERDA party names to left-right positions
parties <- c("cdu", "spd", "linke_pds", "fdp")
left_right_scores <- party_crosswalk(parties, "left_right")
print(left_right_scores)

# Map to English party names
english_names <- party_crosswalk(parties, "party_name_english")
print(english_names)
```

This function is especially useful when you want to:

- Analyze parties along ideological dimensions
- Merge GERDA data with other comparative datasets
- Standardize party names across different data sources
- Access additional party metadata from ParlGov
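
As an illustration of the first use case, the sketch below computes a vote-share-weighted ideological position for one county. To stay self-contained it does not call `party_crosswalk()`; instead it defines a hypothetical stand-in, `toy_crosswalk()`, which assumes that the mapping returns one value per input name and `NA` for names it cannot match. The scores are placeholders, not actual ParlGov values, and the vote shares are made up.

```{r}
# Hypothetical stand-in for party_crosswalk(); placeholder scores,
# NOT actual ParlGov left-right values
toy_scores <- c(cdu = 6, spd = 4, fdp = 5)
toy_crosswalk <- function(party) unname(toy_scores[party])

# Long-format vote shares for one county (made-up values)
toy_party_votes <- data.frame(
  party      = c("cdu", "spd", "fdp", "unknown_party"),
  vote_share = c(0.40, 0.30, 0.20, 0.10)
)

# Attach the mapped score; unmatched party names become NA
toy_party_votes$left_right <- toy_crosswalk(toy_party_votes$party)

# Vote-share-weighted mean position over parties that have a score
matched <- !is.na(toy_party_votes$left_right)
weighted_position <- weighted.mean(
  toy_party_votes$left_right[matched],
  toy_party_votes$vote_share[matched]
)
weighted_position
```

Dropping unmatched parties rescales the weights to the matched parties only, which is a design choice; depending on the analysis, it may be preferable to treat them differently.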

## Conclusion

The `gerda` package provides easy access to a wide range of German election and related data. By using the `gerda_data_list()` function to explore available datasets and `load_gerda_web()` to load them, you can quickly incorporate this data into your research or analysis projects.

For more information or to provide feedback, please contact <hhilbig@ucdavis.edu> or visit the GitHub repository at <https://github.com/hhilbig/gerda>.