This package provides cleaned and formatted data from the Cora data set for entity resolution (record linkage or de-duplication). The Cora data set contains 1879 records of citation information on published papers, with features such as title, authors, and year published. A companion “gold” data set indicates which records match, based on the id.
# Install the development version from GitHub
devtools::install_github("resteorts/cora")
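
To make the idea of matching "based on the id" concrete, here is a minimal base R sketch. It does not use the package's data: the `records` data frame, its column names, and its values are hypothetical and only illustrate how a shared id can be turned into a list of matching record pairs.

```r
# Toy illustration only: these records and column names are hypothetical,
# not the package's actual data.
records <- data.frame(
  record = 1:4,
  id     = c(10, 10, 20, 10),  # records sharing an id refer to the same paper
  title  = c("Learning to rank", "Learning to Rank",
             "Graph kernels", "learning to rank")
)

# Enumerate all unordered pairs of record indices (one pair per row)
pairs <- t(combn(records$record, 2))

# A pair counts as a match when both records carry the same id
is_match <- records$id[pairs[, 1]] == records$id[pairs[, 2]]

# Keep only the matching pairs
gold_pairs <- pairs[is_match, , drop = FALSE]
gold_pairs
```

In this toy example, records 1, 2, and 4 share an id, so every pair among them is a match, while record 3 matches nothing.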