cpp11tesseract: Open Source OCR Engine

Bindings to 'Tesseract', a powerful optical character recognition (OCR) engine that supports over 100 languages. The engine is highly configurable, so its detection algorithms can be tuned to obtain the best possible results.

Version: 5.3.2
Imports: pdftools (≥ 1.5), curl, digest
LinkingTo: cpp11
Suggests: magick (≥ 1.7), spelling, knitr, tibble, rmarkdown
Published: 2024-10-22
DOI: 10.32614/CRAN.package.cpp11tesseract
Author: Jeroen Ooms [aut], Mauricio Vargas Sepulveda [aut, cre], Munk School of Global Affairs and Public Policy [fnd]
Maintainer: Mauricio Vargas Sepulveda <m.sepulveda at mail.utoronto.ca>
BugReports: https://github.com/pachadotdev/cpp11tesseract/issues
License: Apache License (≥ 2)
URL: https://pacha.dev/cpp11tesseract/
NeedsCompilation: yes
SystemRequirements: Tesseract >= 4.0.0 (libtesseract-dev / tesseract-devel) and Leptonica (libleptonica-dev / leptonica-devel). On Debian, the training data for English and other languages must be installed separately (e.g. tesseract-ocr-eng or tesseract-ocr-spa).
Language: en-US
Materials: NEWS
CRAN checks: cpp11tesseract results [issues need fixing before 2024-11-05]

Documentation:

Reference manual: cpp11tesseract.pdf
Vignettes: Using the Tesseract OCR engine in R (source, R code)

Downloads:

Package source: cpp11tesseract_5.3.2.tar.gz
Windows binaries: r-devel: cpp11tesseract_5.3.2.zip, r-release: cpp11tesseract_5.3.2.zip, r-oldrel: cpp11tesseract_5.3.2.zip
macOS binaries: r-release (arm64): cpp11tesseract_5.3.2.tgz, r-oldrel (arm64): cpp11tesseract_5.3.2.tgz, r-release (x86_64): cpp11tesseract_5.3.2.tgz, r-oldrel (x86_64): cpp11tesseract_5.3.2.tgz

Linking:

Please use the canonical form https://CRAN.R-project.org/package=cpp11tesseract to link to this page.