Title: Economic Entity Identifier Standardization
Version: 0.0.2
Description: Provides utility functions for standardizing the names and IDs of economic entities (economies, aggregates, institutions, etc.) in economic datasets such as those published by the International Monetary Fund and World Bank. Aims to facilitate consistent data analysis, reporting, and joining across datasets. Used as a foundational building block in the 'econdataverse' family of packages (https://www.econdataverse.org).
License: MIT + file LICENSE
Encoding: UTF-8
RoxygenNote: 7.3.2
Imports: cli, dplyr, purrr, rlang, stringr, tibble, tidyr
Suggests: testthat (>= 3.0.0), withr
Config/testthat/edition: 3
URL: https://teal-insights.github.io/r-econid/, https://github.com/Teal-Insights/r-econid
BugReports: https://github.com/Teal-Insights/r-econid/issues
Depends: R (>= 4.1.0)
LazyData: true
NeedsCompilation: no
Packaged: 2025-06-26 14:58:02 UTC; teal_emery
Author: L. Teal Emery [cre], Christopher C. Smith [aut], Christoph Scheuch [ctb], Teal Insights [cph]
Maintainer: L. Teal Emery <lte@tealinsights.com>
Repository: CRAN
Date/Publication: 2025-07-07 20:00:02 UTC

econid: Economic Entity Identifier Standardization

Description


Provides utility functions for standardizing the names and IDs of economic entities (economies, aggregates, institutions, etc.) in economic datasets such as those published by the International Monetary Fund and World Bank. Aims to facilitate consistent data analysis, reporting, and joining across datasets. Used as a foundational building block in the 'econdataverse' family of packages (https://www.econdataverse.org).

Author(s)

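Maintainer: L. Teal Emery <lte@tealinsights.com>

Authors:

Christopher C. Smith

Other contributors:

Christoph Scheuch [contributor]
Teal Insights [copyright holder]

See Also

Useful links:

https://teal-insights.github.io/r-econid/
https://github.com/Teal-Insights/r-econid
Report bugs at https://github.com/Teal-Insights/r-econid/issues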


Add a custom entity pattern

Description

This function allows users to extend the default entity patterns with a custom entry.

Usage

add_entity_pattern(
  entity_id,
  entity_name,
  entity_type,
  aliases = NULL,
  entity_regex = NULL
)

Arguments

entity_id

A unique identifier for the entity.

entity_name

The standard (canonical) name of the entity.

entity_type

A character string describing the type of entity ("economy", "organization", "aggregate", or "other").

aliases

An optional character vector of alternative names identifying the entity. If provided, these are automatically combined (using the pipe operator, "|") with entity_name and entity_id to construct a regular expression pattern.

entity_regex

An optional custom regular expression pattern. If supplied, it overrides the regex automatically constructed from aliases.

Details

Custom entity patterns can be added at the top of a script (or interactively). They are stored separately from the built-in patterns and are appended to them whenever the patterns are retrieved with list_entity_patterns(). This lets users register alternative names (aliases) for entities that appear in their economic datasets.

Custom patterns persist only for the duration of the R session.

Value

NULL. As a side effect of the function, the custom pattern is stored in an internal tibble for the current session.

Examples

add_entity_pattern(
  "ASN",
  "Association of Southeast Asian Nations",
  "economy",
  aliases = c("ASEAN")
)
patterns <- list_entity_patterns()
print(patterns[patterns$entity_id == "ASN", ])
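
The entity_regex argument described above can be illustrated with a short sketch. The identifier "XCU", the name "Example Customs Union", and the regex itself are hypothetical values chosen only for illustration. The call is wrapped in a requireNamespace() guard so that it runs only when econid is installed.

```r
# Hypothetical entity used only for illustration; "XCU" is not a real identifier
patterns <- NULL
if (requireNamespace("econid", quietly = TRUE)) {
  econid::add_entity_pattern(
    entity_id = "XCU",
    entity_name = "Example Customs Union",
    entity_type = "aggregate",
    # A supplied entity_regex is used instead of an automatically built pattern
    entity_regex = "^(xcu|example.?customs.?union)$"
  )
  patterns <- econid::list_entity_patterns()
  print(patterns[patterns$entity_id == "XCU", ])

  # Remove the custom pattern again so later code sees only built-in patterns
  econid::reset_custom_entity_patterns()
}
```

Resetting at the end keeps the illustration from leaking into the rest of the session, since custom patterns otherwise remain registered until the session ends.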



Create Entity Name Regex Pattern

Description

Creates a regular expression pattern from one or more entity names, following standardized rules for flexible matching. The function converts each input name to lowercase, escapes special regex characters, and replaces each space with a flexible separator pattern (.?) that matches zero or one character, so that both "united states" and "united.states" match. The individual patterns are then joined with the pipe operator (|) to produce a regex that matches any of the supplied names.

Usage

create_entity_regex(names)

Arguments

names

A character vector of entity names.

Value

A character string containing the combined regex pattern.
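
The rules in the Description can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper name sketch_entity_regex() is hypothetical, and the real create_entity_regex() may apply additional rules (for example anchors or word boundaries) that are not documented here.

```r
# Hypothetical helper: a base R sketch of the documented rules,
# not the actual body of create_entity_regex()
sketch_entity_regex <- function(names) {
  # 1. Convert each name to lowercase
  lowered <- tolower(names)
  # 2. Escape regex metacharacters so they match literally
  escaped <- gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", lowered)
  # 3. Replace each space with ".?" (zero or one character of any kind);
  #    this runs after escaping so the inserted "." is not itself escaped
  flexible <- gsub(" ", ".?", escaped, fixed = TRUE)
  # 4. Join the per-name patterns with the alternation operator
  paste(flexible, collapse = "|")
}

rx <- sketch_entity_regex(c("United States", "U.S."))
print(rx)
grepl(rx, c("united states", "united.states", "unitedstates", "u.s."))
```

The order of steps 2 and 3 matters: if spaces were replaced first, the escaping step would turn the inserted ".?" into a literal dot and question mark.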


Entity Patterns

Description

A dataset containing patterns for matching entity names. This dataset is accessible through list_entity_patterns().

Usage

entity_patterns

Format

A data frame with the following columns:

entity_id

Unique identifier for the entity

entity_name

entity name

iso3c

ISO 3166-1 alpha-3 code

iso2c

ISO 3166-1 alpha-2 code

entity_type

Type of entity ("economy", "organization", "aggregate", or "other")

entity_regex

Regular expression pattern for matching entity names

Source

Data manually prepared by L. Teal Emery


List entity patterns

Description

This function returns a tibble containing regular expression patterns for identifying economic entities. It combines the patterns from the built-in entity_patterns dataset with any custom patterns stored in the .econid_env environment.

Usage

list_entity_patterns()

Value

A data frame with the following columns:

entity_id

entity id

entity_name

entity name

iso2c

ISO 3166-1 alpha-2 code

iso3c

ISO 3166-1 alpha-3 code

entity_type

entity type

entity_regex

Regular expression pattern for matching entity names

Examples

patterns <- list_entity_patterns()


Match entities with patterns using regex matching

Description

Given a data frame and a vector of target columns, performs regex matching on each target column in turn until all entities are matched or no columns remain. Warns about ambiguous matches (duplicate entity_id values) and returns a data frame mapping the target columns to the entity patterns.

Usage

match_entities_with_patterns(
  data,
  target_cols,
  patterns,
  warn_ambiguous = TRUE
)

Arguments

data

A data frame containing the columns to match

target_cols

Character vector of column names to match

patterns

Data frame containing entity patterns; if NULL, uses list_entity_patterns()

warn_ambiguous

Logical; whether to warn about ambiguous matches

Value

A data frame with the unique combinations of the target columns mapped to the entity patterns
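
The column-by-column fallback described above can be sketched in base R. Everything here is an assumption made for illustration: the toy pattern table, the helper names match_one() and match_across_columns(), and the choice to treat a value that matches more than one pattern as ambiguous. The package's own function may differ in all of these respects.

```r
# Toy pattern table; these regexes are illustrative, not the built-in ones
toy_patterns <- data.frame(
  entity_id = c("USA", "FRA"),
  entity_regex = c("^(united.?states|usa|us)$", "^(france|fra|fr)$")
)

# Hypothetical helper: return the entity_id whose regex matches a single value
match_one <- function(value, patterns) {
  if (is.na(value)) return(NA_character_)
  hits <- vapply(
    patterns$entity_regex, grepl, logical(1),
    x = value, ignore.case = TRUE
  )
  ids <- patterns$entity_id[hits]
  if (length(ids) > 1) {
    warning("Ambiguous match for '", value, "': ", paste(ids, collapse = ", "))
  }
  if (length(ids) == 0) NA_character_ else ids[1]
}

# Hypothetical helper: try each target column in turn for still-unmatched rows
match_across_columns <- function(data, target_cols, patterns) {
  entity_id <- rep(NA_character_, nrow(data))
  for (col in target_cols) {
    todo <- is.na(entity_id)
    if (!any(todo)) break  # every row is matched, so stop early
    entity_id[todo] <- vapply(
      as.character(data[[col]][todo]), match_one, character(1),
      patterns = patterns, USE.NAMES = FALSE
    )
  }
  cbind(data[target_cols], entity_id = entity_id)
}

toy_data <- data.frame(
  name = c("United States", "Untied States", "France"),
  code = c("XX", "USA", NA)
)
result <- match_across_columns(toy_data, c("name", "code"), toy_patterns)
print(result)
```

In this toy data the misspelled second row fails on the name column and is only resolved by the code column, which is the situation the fallback is meant to handle.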


Reset custom entity patterns

Description

This function resets all custom entity patterns that have been added during the current R session.

Usage

reset_custom_entity_patterns()

Value

Invisibly returns NULL.

Examples

add_entity_pattern("EU", "European Union", "economy")
reset_custom_entity_patterns()
patterns <- list_entity_patterns()
print(patterns[patterns$entity_id == "EU", ])


Standardize Entity Identifiers

Description

Standardizes entity identifiers (e.g., name, ISO code) in an economic data frame by matching them against a predefined list of regex patterns, then adds columns containing the standardized identifiers to the data frame.

Usage

standardize_entity(
  data,
  ...,
  output_cols = c("entity_id", "entity_name", "entity_type"),
  prefix = NULL,
  fill_mapping = NULL,
  default_entity_type = NA_character_,
  warn_ambiguous = TRUE,
  overwrite = TRUE,
  warn_overwrite = TRUE,
  .before = NULL
)

Arguments

data

A data frame or tibble containing entity identifiers to standardize

...

Columns containing entity names and/or IDs. These can be specified using unquoted column names (e.g., entity_name, entity_id) or quoted column names (e.g., "entity_name", "entity_id"). Must specify at least one column. If two columns are specified, the first is assumed to be the entity name and the second is assumed to be the entity ID.

output_cols

Character vector specifying desired output columns. Options are "entity_id", "entity_name", "entity_type", "iso3c", "iso2c". Defaults to c("entity_id", "entity_name", "entity_type").

prefix

Optional character string to prefix the output column names. Useful when standardizing multiple entities in the same dataset (e.g., "country", "counterpart"). If provided, output columns will be named prefix_entity_id, prefix_entity_name, etc. (with an underscore automatically inserted between the prefix and the column name).

fill_mapping

Named character vector specifying how to fill missing values when no entity match is found. Names should be output column names (without prefix), and values should be input column names (from ...). For example, c(entity_id = "country_code", entity_name = "country_name") will fill missing entity_id values with values from the country_code column and missing entity_name values with values from the country_name column.

default_entity_type

Character or NA; the default entity type to use for entities that do not match any of the patterns. Options are "economy", "organization", "aggregate", "other", or NA_character_. Defaults to NA_character_. This argument is only used when "entity_type" is included in output_cols.

warn_ambiguous

Logical; whether to warn about ambiguous matches

overwrite

Logical; whether to overwrite existing entity_* columns

warn_overwrite

Logical; whether to warn when overwriting existing entity_* columns. Defaults to TRUE.

.before

Column name or position to insert the standardized columns before. If NULL (default), columns are inserted at the beginning of the data frame. Can be a character vector specifying the column name or a numeric value specifying the column index. If the specified column is not found in the data, an error is thrown.

Value

A data frame with standardized entity information merged with the input data. The standardized columns are placed directly to the left of the first target column.

Examples

# Standardize entity names and IDs in a data frame
test_df <- tibble::tribble(
  ~entity,         ~code,
  "United States",  "USA",
  "united.states",  NA,
  "us",             "US",
  "EU",             NA,
  "NotACountry",    NA
)

standardize_entity(test_df, entity, code)

# Standardize with fill_mapping for unmatched entities
standardize_entity(
  test_df,
  entity, code,
  fill_mapping = c(entity_id = "code", entity_name = "entity")
)

# Standardize multiple entities in sequence with a prefix
df <- data.frame(
  country_name = c("United States", "France"),
  counterpart_name = c("China", "Germany")
)
df |>
  standardize_entity(
    country_name
  ) |>
  standardize_entity(
    counterpart_name,
    prefix = "counterpart"
  )
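
The fill_mapping behavior described in the Arguments section can be sketched in base R. This is an illustration of the documented semantics only, under the assumption that unmatched rows carry NA in the output columns; the data frame below is hand-built and is not output from standardize_entity().

```r
# Hand-built stand-in for a result in which the second row found no match
matched <- data.frame(
  country_name = c("United States", "NotACountry"),
  country_code = c("USA", "XYZ"),
  entity_id = c("USA", NA),
  entity_name = c("United States", NA)
)

# Names are output columns, values are the input columns used as fallbacks
fill_mapping <- c(entity_id = "country_code", entity_name = "country_name")

for (out_col in names(fill_mapping)) {
  in_col <- fill_mapping[[out_col]]
  missing <- is.na(matched[[out_col]])
  # Only rows without a match are filled; matched rows are left unchanged
  matched[[out_col]][missing] <- matched[[in_col]][missing]
}
print(matched)
```

Filling only the missing positions keeps standardized values wherever a match was found while still letting unmatched rows carry a usable identifier into later joins.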


Validate inputs for entity standardization

Description

Validates the input data frame and column names for entity standardization.

Usage

validate_entity_inputs(
  data,
  target_cols_names,
  output_cols,
  prefix,
  fill_mapping = NULL
)

Arguments

data

A data frame or tibble to validate

target_cols_names

Character vector of column names containing entity identifiers

output_cols

Character vector of requested output columns

prefix

Optional character string to prefix the output column names

fill_mapping

Named character vector specifying how to fill missing values

Value

Invisible NULL