---
title: "lightsf: A Curated Collection of Georeferenced and Spatial Datasets"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{A Curated Collection of Georeferenced and Spatial Datasets}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(lightsf)
library(ggplot2)
library(dplyr)
```

# Introduction

The `lightsf` package offers a **curated and diverse collection of georeferenced and spatial datasets** from various domains, enabling researchers, educators, and analysts to easily explore spatial patterns and perform geostatistical analysis in R.

This package consolidates datasets from **multiple open and trusted sources**, including **Kaggle, spData, adespatial, chopin, and bivariateLeaflet**, to provide a unified resource for spatial data exploration and visualization.

The datasets included in `lightsf` cover a broad spectrum of topics such as **urban studies, housing markets, environmental monitoring, transportation networks, and socio-economic indicators**. Each dataset is carefully formatted and documented to support both **educational purposes** and **applied spatial analysis**.

`lightsf` provides data in multiple spatial formats, including **point patterns**, **polygons**, **socio-economic data frames**, and **network-like structures**, allowing users to perform tasks ranging from **basic exploratory mapping** to **advanced spatial modeling**.

By centralizing geospatial datasets in a single package, `lightsf` simplifies the workflow for those who wish to learn, teach, or apply spatial data science techniques without the need to gather and preprocess data from multiple sources.

## Dataset Suffixes

Each dataset in the `lightsf` package uses a **suffix** to indicate the type of spatial data it contains:

- `_pts`: Refers to **point-based datasets** that include georeferenced locations, usually represented by latitude and longitude coordinates.
- `_poly`: Refers to **polygon-based datasets**, typically representing areas, administrative boundaries, or spatial zones.
- `_points`: Refers to **point datasets** similar to `_pts`, often derived from other spatial sources or including additional spatial or attribute information.

These suffixes help users quickly identify the **geometric structure** and **spatial representation** of each dataset included in the `lightsf` package.

## Example Datasets

Below are selected example datasets included in the `lightsf` package:

- `nc_points`: Mildly clustered **georeferenced points** representing locations in **North Carolina, United States**.
- `dc_poly`: **Polygon-based spatial dataset** containing **Washington D.C. census tract data**, suitable for creating **choropleth maps** and exploring demographic or spatial patterns.
- `afcon_poly`: **Polygon dataset** representing **spatial patterns of conflict in Africa (1966–1978)**, useful for studying regional clustering and spatial heterogeneity.

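
The suffix convention can also be applied programmatically, for example to group dataset names by geometry type. The following is a minimal sketch in base R, not part of the `lightsf` API: `suffix_types` and `classify_suffix()` are hypothetical names introduced here for illustration, and the lookup table only restates the three suffixes described in this vignette.

```r
# Suffix-to-type lookup, restating the suffix descriptions in this vignette
suffix_types <- c(
  "_pts"    = "point",
  "_points" = "point",
  "_poly"   = "polygon"
)

# Hypothetical helper (not exported by lightsf): classify names by suffix
classify_suffix <- function(dataset_names) {
  # Take the trailing "_<letters>" segment; names without one become NA
  suffix <- ifelse(
    grepl("_[a-z]+$", dataset_names),
    sub("^.*(_[a-z]+)$", "\\1", dataset_names),
    NA_character_
  )
  # Unknown suffixes are not in the lookup table and also map to NA
  unname(suffix_types[suffix])
}

example_names <- c("nc_points", "dc_poly", "afcon_poly")
classify_suffix(example_names)
#> [1] "point"   "polygon" "polygon"
```

With the package installed, the names passed to such a helper could presumably be taken from `data(package = "lightsf")$results[, "Item"]`, which is the standard way to list the datasets a package ships.
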
## Data Visualization with lightsf

### Spatial Patterns of Conflict in Africa (1966–1978)

```{r afcon-poly-plot, fig.width=6, fig.height=4.5, out.width="90%", message=FALSE, warning=FALSE}
# Basic exploration of the dataset
names(afcon_poly)
class(afcon_poly)
length(afcon_poly)
str(afcon_poly)

# Coerce to a plain data frame so ggplot2 can use its columns directly
afcon_df <- as.data.frame(afcon_poly)

# Scatter plot of the x/y coordinates, with color and size mapped to total conflicts
ggplot(afcon_df, aes(x = x, y = y)) +
  geom_point(aes(color = totcon, size = totcon), alpha = 0.8) +
  scale_color_gradient(low = "lightyellow", high = "darkred") +
  labs(
    title = "Spatial Patterns of Conflict in Africa (1966–1978)",
    x = "Longitude",
    y = "Latitude",
    color = "Total Conflicts",
    size = "Conflict Intensity"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5),
    legend.position = "right"
  )
```

## Conclusion

The `lightsf` package provides a **curated and diverse collection of georeferenced and spatial datasets** designed to support spatial data analysis, visualization, and education in R. It brings together datasets from multiple open sources, offering ready-to-use spatial data covering topics such as **urban studies, housing markets, environmental monitoring, transportation, and socio-economic indicators**.

By providing well-structured and documented datasets in various spatial formats, `lightsf` facilitates **exploratory mapping**, **geostatistical modeling**, and **teaching of spatial analysis concepts**.