textreuse: Detect Text Reuse and Document Similarity

Tools for measuring similarity among documents and detecting passages which have been reused. Implements shingled n-gram, skip n-gram, and other tokenizers; similarity/dissimilarity functions; pairwise comparisons; minhash and locality sensitive hashing algorithms; and a version of the Smith-Waterman local alignment algorithm suitable for natural language.
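The tokenizers and the similarity function named above can be sketched in a few lines. This is a minimal Python illustration, not the textreuse API: the package itself is written in R (with functions such as `tokenize_ngrams()` and `jaccard_similarity()`), and its defaults and edge-case handling may differ. The helper names below (`tokenize_words`, `shingle_ngrams`, `skip_ngrams`) are hypothetical. The sketch assumes lowercase word tokens split on anything that is not a letter, digit, or apostrophe, and it uses a simplified skip n-gram with a single fixed gap of `k` words between consecutive words in each n-gram.

```python
import re


def tokenize_words(text):
    """Lowercase the text and split it into word tokens (assumed tokenization rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shingle_ngrams(words, n=3):
    """Shingled (overlapping) n-grams: every run of n consecutive words."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def skip_ngrams(words, n=3, k=1):
    """Simplified skip n-grams: n words taken k+1 positions apart (k words skipped each time)."""
    step = k + 1
    span = (n - 1) * step + 1  # number of positions one skip n-gram covers
    return [" ".join(words[i:i + span:step]) for i in range(len(words) - span + 1)]


def jaccard_similarity(a, b):
    """|A intersect B| / |A union B| over the sets of tokens; 0.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


doc_a = shingle_ngrams(tokenize_words("The quick brown fox jumps over the lazy dog"))
doc_b = shingle_ngrams(tokenize_words("The quick brown fox leaps over the lazy dog"))
print(jaccard_similarity(doc_a, doc_b))  # 4 shared 3-grams out of 10 distinct -> 0.4
```

Changing a single word removes every shingle that spans it, which is why the two near-identical sentences score only 0.4. Larger `n` makes the measure stricter, and smaller `n` makes it more tolerant of small edits.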

Version: 0.1.5
Depends: R (≥ 3.1.1)
Imports: assertthat (≥ 0.1), digest (≥ 0.6.8), dplyr (≥ 0.8.0), NLP (≥ 0.1.8), Rcpp (≥ 0.12.0), RcppProgress (≥ 0.1), stringr (≥ 1.0.0), tibble (≥ 3.0.1), tidyr (≥ 0.3.1)
LinkingTo: BH, Rcpp, RcppProgress
Suggests: testthat (≥ 0.11.0), knitr (≥ 1.11), rmarkdown (≥ 0.8), covr
Published: 2020-05-15
Author: Lincoln Mullen [aut, cre]
Maintainer: Lincoln Mullen <lincoln at lincolnmullen.com>
BugReports: https://github.com/ropensci/textreuse/issues
License: MIT + file LICENSE
URL: https://docs.ropensci.org/textreuse, https://github.com/ropensci/textreuse
NeedsCompilation: yes
Materials: README NEWS
In views: NaturalLanguageProcessing
CRAN checks: textreuse results

Documentation:

Reference manual: textreuse.pdf
Vignettes: Text alignment
Introduction to the textreuse package
Minhash and locality-sensitive hashing
Pairwise comparisons for document similarity
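The idea behind the "Minhash and locality-sensitive hashing" vignette can be sketched as follows. This is a minimal Python illustration, not the package's R implementation, and every name and parameter in it (`make_minhash`, `estimate_jaccard`, `lsh_candidate_pairs`, 240 hashes, 80 bands, seed 253) is a hypothetical choice for the example. Each document's token set is reduced to a signature of minimum values under random hash functions of the form (a·x + b) mod p. The fraction of matching signature positions estimates the Jaccard similarity. Splitting signatures into b bands of r rows and bucketing on each band flags a pair as a candidate with probability 1 − (1 − s^r)^b for Jaccard similarity s, so only the flagged pairs need a full comparison.

```python
import hashlib
import random
from collections import defaultdict
from itertools import combinations

PRIME = (1 << 61) - 1  # Mersenne prime used as the modulus of the hash family


def base_hash(token):
    """Stable 64-bit hash of a token (blake2b), reduced modulo PRIME."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % PRIME


def make_minhash(num_hashes=240, seed=253):
    """Return a function mapping a collection of tokens to a minhash signature."""
    rng = random.Random(seed)
    # Each hash function is h(x) = (a*x + b) mod PRIME with random a != 0 and b.
    params = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_hashes)]

    def minhash(tokens):
        xs = [base_hash(t) for t in set(tokens)]
        if not xs:
            raise ValueError("cannot minhash an empty document")
        return [min((a * x + b) % PRIME for x in xs) for a, b in params]

    return minhash


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions where the two minhash values agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def lsh_candidate_pairs(signatures, bands):
    """Pairs of document ids whose signatures match exactly in at least one band."""
    length = len(next(iter(signatures.values())))
    if length % bands != 0:
        raise ValueError("signature length must be divisible by the number of bands")
    rows = length // bands
    buckets = defaultdict(list)
    for doc_id, sig in signatures.items():
        for band in range(bands):
            # Key on the band index plus that band's slice of the signature.
            buckets[(band, tuple(sig[band * rows:(band + 1) * rows]))].append(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


minhash = make_minhash(num_hashes=240, seed=253)
docs = {
    "a": ["the quick brown", "quick brown fox", "brown fox jumps"],
    "b": ["the quick brown", "quick brown fox", "brown fox jumps"],
    "c": ["lorem ipsum dolor", "ipsum dolor sit"],
}
sigs = {doc_id: minhash(tokens) for doc_id, tokens in docs.items()}
print(estimate_jaccard(sigs["a"], sigs["b"]))  # identical token sets -> 1.0
print(lsh_candidate_pairs(sigs, bands=80))     # expected: {('a', 'b')}
```

With 80 bands of 3 rows, the similarity at which a pair becomes likely to be flagged is roughly (1/b)^(1/r) = (1/80)^(1/3) ≈ 0.23. Fewer, wider bands raise that threshold and produce fewer false positives. More, narrower bands lower it and produce fewer missed matches.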

Downloads:

Package source: textreuse_0.1.5.tar.gz
Windows binaries: r-prerel: textreuse_0.1.5.zip, r-release: textreuse_0.1.5.zip, r-oldrel: textreuse_0.1.5.zip
macOS binaries: r-prerel (arm64): textreuse_0.1.5.tgz, r-release (arm64): textreuse_0.1.5.tgz, r-oldrel (arm64): textreuse_0.1.5.tgz, r-prerel (x86_64): textreuse_0.1.5.tgz, r-release (x86_64): textreuse_0.1.5.tgz
Old sources: textreuse archive

Reverse dependencies:

Reverse suggests: textrank

Linking:

Please use the canonical form https://CRAN.R-project.org/package=textreuse to link to this page.