Extended Installation Guide

The text package gives users access to HuggingFace Transformers in R. It uses the R package reticulate as an interface to Python and relies on the Python packages torch and transformers. It is therefore important to install both the text package and a Python environment containing the Python packages that text requires.

The recommended way is to use textrpp_install() to install a conda environment with the Python packages that text requires, and textrpp_initialize() to initialize it.

Conda environment

library(text)
library(reticulate)

# Install the Python packages required by text in a conda environment (using the default settings).
text::textrpp_install()

# Show available conda environments.
reticulate::conda_list()

# Initialize the installed conda environment.
# save_profile = TRUE saves the settings so that you don't have to run textrpp_initialize() after restarting R. 
text::textrpp_initialize(save_profile = TRUE)

# Test that the text package works.
textEmbed("hello")

Solving OMP errors and R/RStudio crashes

Recently, some text users (mainly on Mac) have experienced OMP (OpenMP) errors, and R or RStudio crashing. When this happens, we have so far found the following solutions:

Sys.setenv(OMP_NUM_THREADS = "1") # Limit the number of threads to prevent conflicts.

Sys.setenv(OMP_MAX_ACTIVE_LEVELS = "1") # Limit nested parallel regions to one active level.

# You might also have to restart R (.rs.restartR() only works within RStudio).
.rs.restartR()

# If the above does not work, you can also try the following, although this solution might carry some risks (for more information see https://github.com/dmlc/xgboost/issues/1715).
Sys.setenv(KMP_DUPLICATE_LIB_OK = "TRUE") # Temporarily allows execution despite duplicate OpenMP libraries.

# This is how you can unset the settings.
Sys.unsetenv("OMP_NUM_THREADS")
Sys.unsetenv("OMP_MAX_ACTIVE_LEVELS")
Sys.unsetenv("KMP_DUPLICATE_LIB_OK")

# This is how you can verify the settings (an empty string means the variable is not set).
print(Sys.getenv(c("OMP_NUM_THREADS", "OMP_MAX_ACTIVE_LEVELS", "KMP_DUPLICATE_LIB_OK")))

# You can also inspect the macOS dynamic library search path.
print(Sys.getenv("DYLD_LIBRARY_PATH"))


# Please let us know if you find any other solutions. 
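Settings made with Sys.setenv() only last for the current R session. A minimal sketch of one way to make the two OMP settings persistent, assuming you want them applied every time R starts: R reads environment variables from a user-level .Renviron file at startup (see ?Startup). The helper add_omp_settings() is hypothetical and not part of the text package; KMP_DUPLICATE_LIB_OK is deliberately left out because of the risks mentioned above.

```r
# Hypothetical helper (not part of the text package): add the OMP workaround
# to a .Renviron file so that R sets these variables at startup.
# Note: if a .Renviron exists in the working directory, R reads that one instead.
add_omp_settings <- function(path = file.path(path.expand("~"), ".Renviron"),
                             settings = c(OMP_NUM_THREADS = "1",
                                          OMP_MAX_ACTIVE_LEVELS = "1")) {
  wanted <- paste0(names(settings), "=", settings)
  existing <- if (file.exists(path)) readLines(path, warn = FALSE) else character(0)
  # Only add lines that are not already present, so repeated calls are harmless.
  to_add <- setdiff(wanted, existing)
  if (length(to_add) > 0) {
    writeLines(c(existing, to_add), path)
  }
  invisible(to_add)
}
```

Calling add_omp_settings() without arguments edits ~/.Renviron; restart R afterwards for the settings to take effect. To undo it, delete the added lines from that file (for example with usethis::edit_r_environ(), if usethis is installed).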

Solving Mac OS errors

Failed to build tokenizers

If running textrpp_install()

results in this error:

Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects

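Building tokenizers from source requires a Rust compiler. In the terminal, run the following to install Rust through rustup: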

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Rust compiler

Error:

"Error: Error installing package(s): ..." 
including: "error: can't find Rust compiler"

In the terminal, run (this requires Homebrew):

brew install rust
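Both fixes above install a Rust compiler. A quick way to check from R whether the compiler is visible before re-running textrpp_install() is base R's Sys.which(), which returns an empty string for programs it cannot find on the PATH. The helper name rust_available() is hypothetical. If it returns FALSE right after installing, the PATH seen by R may not yet include the Rust install location (rustup uses ~/.cargo/bin by default), and restarting R or RStudio may help.

```r
# Hypothetical helper: TRUE if all given Rust tools are found on the PATH seen by R.
# Sys.which() returns "" for every program it cannot find.
rust_available <- function(tools = c("rustc", "cargo")) {
  found <- Sys.which(tools)
  all(nzchar(found))
}

rust_available()
```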

Virtual environments

It is also possible to use virtual environments (although this is currently only tested on macOS).

# Create a virtual environment with text required python packages.
# Note that you have to provide a python path.
text::textrpp_install_virtualenv(rpp_version = c("torch==1.7.1", "transformers==4.12.5", "numpy", "nltk"),
                                 python_path = "/usr/local/bin/python3.9",
                                 envname = "textrpp_virtualenv")

# Initialize the virtual environment.
text::textrpp_initialize(virtualenv = "textrpp_virtualenv",
                         condaenv = NULL,
                         save_profile = TRUE)
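The python_path in the example above is one possible location and will differ between systems. A minimal sketch for finding a candidate interpreter from R with base R's Sys.which(); the helper find_python() and its candidate interpreter names are hypothetical.

```r
# Hypothetical helper: return the full paths of the candidate Python
# interpreters that can be found on the PATH seen by R, in the given order.
find_python <- function(candidates = c("python3.9", "python3")) {
  paths <- Sys.which(candidates)
  # Drop candidates that were not found (Sys.which() returns "" for those).
  paths <- paths[nzchar(paths)]
  unname(paths)
}

find_python()
```

For example, pass python_path = find_python()[1] to textrpp_install_virtualenv(); a result of NA means that none of the candidates were found on R's PATH.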

Versions tested for virtual environment

Virtual environments work on macOS, whereas the GitHub Actions workflow does not currently work for Linux and Windows. For more information, look for the GitHub Actions workflow run called "Virtual environment".

OS        Python version  torch          transformers          Success
Mac OS    3.9.8           torch==1.11.0  transformers==4.19.2  Pass
Linux     3.9.8           torch==1.11.0  transformers==4.19.2  Pass
Mac OS    3.9.8           torch==1.7.1   transformers==4.12.5  Pass
Linux     -               -              -                     -
Windows   -               -              -                     -

Installation instructions for text 0.9.10

Below are the instructions for installing earlier versions of text (0.9.10 and earlier); these should also work for newer versions of text as long as the correct versions of Python and the required packages are used.

library(text)

# To install the Python packages torch, transformers, numpy and nltk through R, run:
library(reticulate)
install_miniconda()

conda_install(envname = 'r-reticulate', c('torch==0.4.1', 'transformers==3.3.1', 'numpy', 'nltk'), pip = TRUE)

# On Windows 10, instead run (without pip = TRUE):
conda_install(envname = 'r-reticulate', c('torch==0.4.1', 'transformers==3.3.1', 'numpy', 'nltk'))

Checking your versions

If something isn’t working, a good first step is to examine what is installed and running on your system, for example to make sure that your R and Python versions are up to date.

# First, check the R version and which packages are attached and loaded.
sessionInfo()

# Second, check the Python version and make sure you have at least version 3.6.10.
library(reticulate)
py_config()
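py_config() prints the Python version in use. A minimal sketch for comparing a version string against the 3.6.10 minimum mentioned above, using base R's package_version(), which compares version components numerically (so 3.6.9 counts as older than 3.6.10). The helper python_version_ok() is hypothetical.

```r
# Hypothetical helper: TRUE if `version` is at least `minimum`.
# package_version() compares each component as a number, not as text.
python_version_ok <- function(version, minimum = "3.6.10") {
  package_version(version) >= package_version(minimum)
}

python_version_ok("3.9.8")  # TRUE
python_version_ok("3.6.9")  # FALSE
```

Assuming reticulate is already bound to your environment, the version string can be obtained with reticulate::import("platform")$python_version().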

Issue: RStudio crashes during textEmbed

After a new install/update of text, RStudio crashed (Abort session) when running functions that fetch word embeddings (i.e., textEmbedLayersOutput or textEmbed).

Solution: Reinstall reticulate and r-miniconda

To solve the issue, re-install reticulate (development version), then uninstall and re-install r-miniconda.

Uninstall r-miniconda by removing its entire folder (which by default on Mac is at /Users/YOUR_USER_NAME/Library/r-miniconda).

Note that on Mac the Library folder is hidden. To make it visible, open Finder, go to /Users/YOUR_USER_NAME/ and press the three keys COMMAND + SHIFT + . (period). The Library folder should then appear, and you can find and remove r-miniconda.
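Instead of using Finder, the folder can also be removed from within R. A minimal sketch using base R's unlink(); the helper remove_miniconda_folder() is hypothetical, and its default path is the Mac location given above. Double-check the path before running it, since the deletion cannot be undone.

```r
# Hypothetical helper: delete an r-miniconda folder and report whether it is gone.
# The default path is the Mac location ~/Library/r-miniconda; adjust it as needed.
remove_miniconda_folder <- function(path = file.path(path.expand("~"), "Library", "r-miniconda")) {
  if (!dir.exists(path)) {
    message("No folder found at: ", path)
    return(invisible(FALSE))
  }
  unlink(path, recursive = TRUE)
  # TRUE if the folder no longer exists after deletion.
  invisible(!dir.exists(path))
}
```

reticulate::miniconda_path() returns the location reticulate uses for Miniconda, so remove_miniconda_folder(reticulate::miniconda_path()) should also work on other systems.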

library(text)

# To re-install packages start with a fresh session by restarting R and RStudio

# Install the development version of reticulate (might not be necessary).
devtools::install_github("rstudio/reticulate")

# After having manually removed the r-miniconda folder, install it again: 
library(reticulate)
install_miniconda()

# Subsequently re-install torch, transformers, numpy and nltk by running: 
conda_install(envname = 'r-reticulate', c('torch==0.4.1', 'transformers==3.3.1', 'numpy', 'nltk'), pip = TRUE)

The exact way to install these packages may differ across systems. Please see the installation instructions for:
Python
torch
transformers

Share advice

If you find a good solution, please feel free to email oscar [ d_o t] kjell [a_t] psy [DOT] lu [d_o_t]se so that we can update the instructions above.