---
title: "Getting Started with chiOpenData"
output: rmarkdown::html_vignette
author: "Christian Martinez"
vignette: >
  %\VignetteIndexEntry{Getting Started with chiOpenData}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
library(chiOpenData)
library(ggplot2)
library(dplyr)
```

## Introduction

Welcome to the `chiOpenData` package, an R package dedicated to helping R users connect to the [Chicago Open Data Portal](https://data.cityofchicago.org/)!

The `chiOpenData` package provides a streamlined interface for accessing Chicago's vast open data resources. It connects directly to the Chicago Open Data Portal, helping users bridge the gap between raw city APIs and tidy data analysis. This package is part of a broader ecosystem of open data tools designed to provide a consistent interface across cities. It does this in two ways:

### The `chi_pull_dataset()` function

The primary way to pull data in this package is the `chi_pull_dataset()` function, which works in tandem with `chi_list_datasets()`. You do not need to know anything about API keys or authentication.

The first step is to call `chi_list_datasets()` to see which datasets are in the list and available to `chi_pull_dataset()`. It returns information on thousands of datasets found on the portal.

```{r chi-list-datasets}
chi_list_datasets() |> head()
```

The output includes columns such as the dataset title, description, and link to the source. The most important fields are the dataset `key` and `id`: pass **either** one as the `dataset =` argument of `chi_pull_dataset()`.
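
As a minimal sketch of how you might look up a `key` and `id`, the chunk below filters a toy stand-in for the catalog. The `title` column name and the second row are assumptions made for illustration; check the real column names in the output of `chi_list_datasets()` before adapting it.

```{r find-dataset-key}
library(dplyr)

# Toy stand-in for the catalog. Only the Crimes row comes from this vignette;
# the `title` column name and the second row are hypothetical.
toy_catalog <- data.frame(
  title = c("Crimes - 2001 to Present", "Hypothetical Example Dataset"),
  key   = c("crimes_2001_to_present", "hypothetical_example_dataset"),
  id    = c("ijzp-q8t2", "xxxx-xxxx")
)

# With the real catalog, replace toy_catalog with chi_list_datasets()
crime_rows <- toy_catalog |>
  filter(grepl("crime", title, ignore.case = TRUE)) |>
  select(title, key, id)

crime_rows
```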

For instance, if we want to pull the dataset `Crimes - 2001 to Present`, we can use either of the methods below:

```{r chi-crimes-pull}
# Pull by dataset id
chi_crimes_data <- chi_pull_dataset(
  dataset = "ijzp-q8t2", limit = 2, timeout_sec = 90)

# Pull the same dataset by key
chi_crimes_data <- chi_pull_dataset(
  dataset = "crimes_2001_to_present", limit = 2, timeout_sec = 90)
```

No matter if we put the `id` or the `key` as the value for `dataset =`, we successfully get the data!

### The `chi_any_dataset()` function

The easiest workflow is to use `chi_list_datasets()` together with `chi_pull_dataset()`.

If the dataset you want to use in R is not in the list, you can use `chi_any_dataset()` instead. The only requirement is the dataset’s API endpoint (a URL provided by the Chicago Open Data Portal). Here are the steps to get it:

1. On the Chicago Open Data Portal, go to the dataset you want to work with.
2. Click on "Export" (next to the actions button on the right hand side).
3. Click on "API Endpoint".
4. Click on "SODA2" for "Version".
5. Copy the API Endpoint.

Below is an example of how to use `chi_any_dataset()` once you have the API endpoint. It pulls the same data as the `chi_pull_dataset()` example:

```{text}
chi_crimes_data <- chi_any_dataset(
  json_link = "https://data.cityofchicago.org/resource/ijzp-q8t2.json",
  limit = 2)
```
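
The endpoint in the example follows the usual SODA2 pattern of `resource/<id>.json`, so if you already know a dataset's id you can often assemble the URL yourself. The sketch below assumes that pattern holds for the dataset you want; `build_soda2_endpoint()` is a hypothetical helper and not part of `chiOpenData`.

```{r build-endpoint-sketch}
# Hypothetical helper (not part of chiOpenData): builds a SODA2 JSON endpoint
# from a dataset id, assuming the resource/<id>.json pattern shown above.
build_soda2_endpoint <- function(id) {
  stopifnot(
    is.character(id),
    length(id) == 1,
    grepl("^[a-z0-9]{4}-[a-z0-9]{4}$", id)  # ids look like "ijzp-q8t2"
  )
  sprintf("https://data.cityofchicago.org/resource/%s.json", id)
}

crimes_endpoint <- build_soda2_endpoint("ijzp-q8t2")
crimes_endpoint
```

If in doubt, copy the endpoint from the portal using the steps above rather than relying on the constructed URL.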

### Rule of Thumb

While both functions provide access to Chicago Open Data, they serve slightly different purposes.

In general:

- Use `chi_pull_dataset()` when the dataset is available in `chi_list_datasets()`
- Use `chi_any_dataset()` when working with datasets outside the catalog

Together, these functions allow users to either quickly access the datasets or flexibly query any dataset available on the Chicago Open Data portal.
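
The rule of thumb can be sketched as a small decision helper. `pick_chi_function()` is hypothetical and not part of `chiOpenData`; it only assumes that the catalog has the `key` and `id` columns described earlier, and it uses a toy catalog so it runs without a network call.

```{r rule-of-thumb-sketch}
# Hypothetical helper (not part of chiOpenData): returns the name of the
# function the rule of thumb suggests for a given key or id.
pick_chi_function <- function(dataset, catalog) {
  in_catalog <- dataset %in% c(catalog$key, catalog$id)
  if (in_catalog) "chi_pull_dataset" else "chi_any_dataset"
}

# Toy stand-in for chi_list_datasets(); only `key` and `id` are needed here
toy_catalog <- data.frame(
  key = "crimes_2001_to_present",
  id  = "ijzp-q8t2"
)

pick_chi_function("ijzp-q8t2", toy_catalog)
pick_chi_function("abcd-1234", toy_catalog)  # made-up id, not a real dataset
```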

## Real World Example

Chicago has a population of about 2.7 million people and, unfortunately, a higher-than-average crime rate. The city's crime data is published in the `Crimes - 2001 to Present` dataset, [found here](https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-Present/ijzp-q8t2/about_data). In R, the `chiOpenData` package can be used to pull this data directly.

By using the `chi_pull_dataset()` function, we can gather the most recent crime cases in Chicago, and filter based upon any of the columns inside the dataset.

Let's take an example of 3 crimes that occurred on the street. To filter, we add `filters = list()` and put whatever filters we would like inside, using column names from the dataset. The dataset has a column called `location_description`, which we can use to accomplish this.

```{r filter-location}

chicago_crimes_street <- chi_pull_dataset(
  dataset = "ijzp-q8t2",
  limit = 3,
  timeout_sec = 90,
  filters = list(location_description = "STREET")
)
chicago_crimes_street

# Checking to see the filtering worked
chicago_crimes_street |>
  distinct(location_description)
```

Success! Printing `chicago_crimes_street` shows there are only 3 rows of data, and the `distinct()` call shows that the only location in our dataset is STREET.
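
For intuition, a simple equality filter like the one above corresponds roughly to a SODA2 query string of the form `column=value`, with `$limit` capping the row count. The sketch below builds such a URL by hand. It illustrates the portal's API only; how `chi_pull_dataset()` constructs its request internally is an assumption here, not something this vignette documents.

```{r soda2-query-sketch}
# Illustration only: a hand-built SODA2 URL for one equality filter.
# This is not necessarily how chiOpenData builds its requests.
sketch_filters <- list(location_description = "STREET")

query_parts <- c(
  paste0(
    names(sketch_filters), "=",
    vapply(
      sketch_filters,
      function(v) utils::URLencode(as.character(v), reserved = TRUE),
      character(1)
    )
  ),
  "$limit=3"
)

sketch_url <- paste0(
  "https://data.cityofchicago.org/resource/ijzp-q8t2.json?",
  paste(query_parts, collapse = "&")
)
sketch_url
```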

One of the strongest qualities of this function is its ability to filter on multiple columns at once. Let's put everything together and get a dataset of *50* crimes that occurred on the STREET and are not domestic.

```{r filter-chi-crimes}
# Creating the dataset
chicago_crimes <- chi_pull_dataset(
  dataset = "ijzp-q8t2",
  limit = 50,
  timeout_sec = 90,
  filters = list(location_description = "STREET", domestic = FALSE)
)

# Calling head of our new dataset
chicago_crimes |>
  slice_head(n = 6)

# Quick check to make sure our filtering worked
chicago_crimes |>
  summarize(rows = n())

chicago_crimes |>
  distinct(location_description)

chicago_crimes |>
  distinct(domestic)
```

We successfully created a dataset that contains 50 non-domestic crimes that happened on the street.

### Mini analysis

Now that we have successfully pulled the data and have it in R, let's do a mini analysis using the `primary_type` column to figure out the main types of crimes.

To do this, we will create a bar graph of the crime types.

```{r crime-type-graph, fig.alt="Bar chart showing the frequency of crime types happening on the street that are not domestic.", fig.cap="Bar chart showing the frequency of crime types happening on the street that are not domestic.", fig.height=5, fig.width=7}
# Visualizing the distribution, ordered by frequency
chicago_crimes |>
  count(primary_type) |>
  ggplot(aes(
    x = n,
    y = reorder(primary_type, n)
  )) +
  geom_col(fill = "steelblue") +
  theme_minimal() +
  labs(
    title = "Crime Types Among 50 Non-Domestic Street Crimes",
    x = "Number of Crimes",
    y = "Primary Crime Type"
  )
```

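This graph shows us not only *which* crimes were committed, but *how many* of each occurred. It suggests that theft is the most common crime type among these non-domestic street incidents. Keep in mind that the sample is only 50 rows and the portal's data changes over time, so the ranking may differ when you run the code.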
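
If you want the numbers behind a chart like this, the same `count()` step can also produce shares. The sketch below uses made-up toy values rather than real portal data, so only the pattern carries over; with real data, replace `toy_crimes` with `chicago_crimes`.

```{r crime-type-shares}
library(dplyr)

# Toy data with made-up values, standing in for the `primary_type` column
toy_crimes <- data.frame(
  primary_type = c("THEFT", "THEFT", "THEFT", "BATTERY", "BATTERY", "ROBBERY")
)

# Count each type, most frequent first, and add its share of all rows
type_shares <- toy_crimes |>
  count(primary_type, sort = TRUE) |>
  mutate(share = n / sum(n))

type_shares
```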

## Summary

The `chiOpenData` package serves as a robust interface for the Chicago Open Data portal, streamlining the path from raw city APIs to actionable insights. By abstracting the complexities of data acquisition—such as pagination, type-casting, and complex filtering—it allows users to focus on analysis rather than data engineering.

As demonstrated in this vignette, the package provides a seamless workflow for targeted data retrieval, automated filtering, and rapid visualization.

## How to Cite

If you use this package for research or educational purposes, please cite it as follows:

Martinez C (2026). chiOpenData: Convenient Access to Chicago Open Data API Endpoints. R package version 0.1.0, <https://martinezc1.github.io/chiOpenData/>.