--- title: "Setup and Tutorial" author: "Alexander P. Christensen, Hudson Golino, Aleksandar Tomašević" date: "`r format(Sys.time(), '%B %d, %Y')`" output: pdf_document: keep_tex: true vignette: > %\VignetteIndexEntry{Setup and Tutorial} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} \usepackage[utf8]{inputenc} header-includes: - | ```{=latex} \usepackage{hyperref} \hypersetup{ colorlinks = true, allcolors = blue } ``` --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#" ) ``` ```{r load, eval = TRUE, echo = FALSE, comment = NA, warning = FALSE, message = FALSE} library(transforEmotion) ``` ## 1. Python setup using `setup_miniconda()` {#section_1} *transforEmotion* uses the *reticulate* package to automatically install a standalone miniconda version on your computer and download the necessary Python modules to run the transformer models. After installing *transforEmotion* package, you need to run `setup_miniconda()` function. This function will performs the following tasks: 1. Install a standalone miniconda version on your computer. 2. Setup a virtual Python environment for *transforEmotion* package. 3. Install the necessary Python modules to run the transformer models. By having a standalone miniconda installed through *transforEmotion*, you should not encounter any conflicts between miniconda and existing Python installations. It also ensures that the correct version of Python libraries are installed into the virtual environment, guaranteeing compatibility regardless of your operating system or whether you already have Python installed. The miniconda installation process takes a few minutes to complete. At certain points, it may appear that the installer is stuck, but please allow a few minutes for the installer to finish its process. Remember, `setup_miniconda()` only needs to be run once after installing the package. Unless you run into issues with Python environment and libraries later on, there is no need to run this function again on the same system. ## 2. Using `transformer_scores` {#section_2} You can use any number of \href{https://huggingface.co}{Hugging Face} text classification transformers. *transforEmotion* currently implements only the zero-shot classification models. Future updates to the package may include opportunities to train and fine-tune these models. For now, there are several options that work well for most classification tasks straight out-of-the-box. You can view different transformers that can be used in *transforEmotion* [here](https://huggingface.co/models?pipeline_tag=zero-shot-classification). As mentioned in section [1](#section_1), before running the `transformer_scores` function, you need to execute `setup_miniconda()` to install the necessary modules. Running the `transformer_scores` function will also trigger the download of the [Cross-Encoder's DistilRoBERTa](https://huggingface.co/cross-encoder/nli-distilroberta-base) transformer model. Therefore, the easiest way to get started is by using an example. 
## 2. Using `transformer_scores` {#section_2}

You can use any number of [Hugging Face](https://huggingface.co) text classification transformers. *transforEmotion* currently implements only zero-shot classification models. Future updates to the package may include opportunities to train and fine-tune these models. For now, there are several options that work well for most classification tasks straight out of the box. You can view the different transformers that can be used in *transforEmotion* [here](https://huggingface.co/models?pipeline_tag=zero-shot-classification).

As mentioned in section [1](#section_1), before running the `transformer_scores` function you need to execute `setup_miniconda()` to install the necessary modules. Running the `transformer_scores` function will also trigger the download of the [Cross-Encoder's DistilRoBERTa](https://huggingface.co/cross-encoder/nli-distilroberta-base) transformer model. Therefore, the easiest way to get started is with an example.

```{r transformers_scores, eval = FALSE, echo = TRUE, comment = NA, warning = FALSE, message = FALSE}
library(transforEmotion)

# Setup Python
setup_miniconda()

# Load data
data(neo_ipip_extraversion)

# Example text
text <- neo_ipip_extraversion$friendliness[1:5] # positively worded items only

# Run transformer function
transformer_scores(
  text = text,
  classes = c(
    "friendly", "gregarious", "assertive",
    "active", "excitement", "cheerful"
  )
)
```

The downloads will take some time when you run a specific model for the first time. Assuming everything goes well with the code above, you should see output that looks like this:

```{r transformers_scores_output, eval = FALSE, echo = TRUE, comment = NA, warning = FALSE, message = FALSE}
$`make friends easily`
  friendly gregarious  assertive     active excitement   cheerful 
     0.579      0.075      0.070      0.071      0.050      0.155 

$`warm up quickly to others`
  friendly gregarious  assertive     active excitement   cheerful 
     0.151      0.063      0.232      0.242      0.152      0.160 

$`feel comfortable around people`
  friendly gregarious  assertive     active excitement   cheerful 
     0.726      0.044      0.053      0.042      0.020      0.115 

$`act comfortably around people`
  friendly gregarious  assertive     active excitement   cheerful 
     0.524      0.062      0.109      0.183      0.019      0.103 

$`cheer people up`
  friendly gregarious  assertive     active excitement   cheerful 
     0.071      0.131      0.156      0.190      0.362      0.089 
```

If you want to run `transformer_scores` on additional text, simply enter that text into the `text` argument of the function. The transformer models that you've used during your R session will remain in R's environment until you exit R or remove them from your environment.

That's it! You've successfully obtained sentiment analysis scores from the [Cross-Encoder's DistilRoBERTa](https://huggingface.co/cross-encoder/nli-distilroberta-base) transformer model. Now, go forth and quantify the qualitative!
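To apply the same workflow to your own material, pass your text and labels to the same two arguments. The sentences and class labels below are made up for illustration; only the `text` and `classes` arguments used in the example above are assumed.

```{r transformer_scores_custom, eval = FALSE, echo = TRUE, comment = NA, warning = FALSE, message = FALSE}
# Hypothetical sentences -- replace with your own data
my_text <- c(
  "I can't wait to see my friends this weekend.",
  "The meeting was long and exhausting."
)

# Any set of labels can be supplied as classes
transformer_scores(
  text = my_text,
  classes = c("positive", "negative", "neutral")
)
```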
## 3. Using `image_scores` {#section_3}

*transforEmotion* also provides a Facial Expression Recognition (*FER*) function that uses the [CLIP](https://openai.com/blog/clip/) model to obtain sentiment analysis scores from images. CLIP is a transformer model that was trained on a dataset of 400 million image-text pairs. Its input consists of an image and a set of text labels, and the inference output is a score for each label. When the labels are emotions, CLIP returns FER scores.

The `image_scores` function uses the [CLIP](https://huggingface.co/openai/clip-vit-base-patch32) model from Hugging Face. As with the `transformer_scores` function, `image_scores` downloads the CLIP model the first time you run it, performs the inference, and returns the output as an R dataframe. The main input of this function is the path to an image, which can be a local filepath or a URL pointing to an image. The second input is a vector of emotion labels.

```{r image_scores, eval = FALSE, echo = TRUE, comment = NA, warning = FALSE, message = FALSE}
image <- "https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Mona_Lisa%2C_by_Leonardo_da_Vinci%2C_from_C2RMF_retouched.jpg/402px-Mona_Lisa%2C_by_Leonardo_da_Vinci%2C_from_C2RMF_retouched.jpg"

emotions <- c("excitement", "happiness", "pride", "anger", "fear", "sadness", "neutral")

image_scores(image, emotions)
```

This code runs FER on the Mona Lisa painting and returns the following output, a dataframe with 7 columns and 1 row.

```{r image_scores_output, eval = FALSE, echo = TRUE, comment = NA, warning = FALSE}
  excitement  happiness     pride      anger       fear   sadness   neutral
1 0.02142187 0.02024468 0.0604699 0.04037686 0.03273294 0.1061871 0.7185667
```

If no face is recognized in the image, the function returns an empty dataframe, preceded by a warning message. We can test this with an image of three penguins from the South Shetland Islands.

```{r image_scores_nas, eval = FALSE, echo = TRUE, comment = NA, warning = FALSE, message = FALSE}
image <- "https://upload.wikimedia.org/wikipedia/commons/thumb/0/0e/Adelie_penguins_in_the_South_Shetland_Islands.jpg/640px-Adelie_penguins_in_the_South_Shetland_Islands.jpg"

image_scores(image, emotions)
```

This gives the following output:

```{r image_scores_nas_output, eval = FALSE, echo = TRUE, comment = NA, warning = FALSE}
No face found in the image
data frame with 0 columns and 0 rows
```
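Because `image_scores` also accepts local filepaths, scoring an image stored on disk looks like the following sketch. The file path and the reduced label set are hypothetical placeholders for illustration.

```{r image_scores_local, eval = FALSE, echo = TRUE, comment = NA, warning = FALSE, message = FALSE}
# Hypothetical local image -- replace with a path on your machine
local_image <- "photos/participant_01.jpg"

# Labels can be any set of emotion words
image_scores(local_image, c("happiness", "sadness", "anger", "neutral"))
```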
## 4. Using `video_scores` {#section_4}

We can use the workflow described in section [3](#section_3) to run FER on videos. The `video_scores` function takes a video as input and returns a dataframe with FER scores for each frame of the video. The video can be a local filepath or a URL pointing to a YouTube video. In the case of YouTube, the function will download the video and save it in the temporary directory. The function also takes a vector of emotion labels as input. After downloading the video, this function will extract frames from the video and then run the same workflow as for images.

Here's an example of running FER on a YouTube video of Boris Johnson.

```{r video_scores, eval = FALSE, echo = TRUE, comment = NA, warning = FALSE, message = FALSE}
video_url <- "https://www.youtube.com/watch?v=hdYNcv-chgY&ab_channel=Conservatives"

emotions <- c("excitement", "happiness", "pride", "anger", "fear", "sadness", "neutral")

result <- video_scores(video_url, classes = emotions, nframes = 10,
                       save_video = TRUE, save_frames = TRUE,
                       video_name = 'boris-johnson',
                       start = 10, end = 120)

head(result)
```

Working with videos is more computationally complex. This example extracts only 10 frames from the video and shouldn't take longer than a few minutes on an average laptop without a GPU (depending on the internet connection needed to download the entire video and the CLIP model). In research applications, we will usually extract 100-300 frames from a video. This can take much longer, so patience is advised while waiting for the results.

Because working with videos is more complex, this function takes several additional arguments. The `nframes` argument specifies the number of frames to extract from the video. The `save_video` argument specifies whether to save the video in the temporary directory. The `save_frames` argument specifies whether to save the extracted frames in the temporary directory. The `video_name` argument specifies the name of the video. Saving the video and frames is useful for quality control and potential evaluation of the results by human coders. The `start` and `end` arguments specify, in seconds, the timestamps of the video where the analysis should start and end. If they are not provided, the function will try to process the entire video.

The result is a dataframe with 7 columns and 10 rows, one row for each frame, containing the FER scores for that frame. The output of `head(result)` looks like this:

```{r video_scores_output, eval = FALSE, echo = TRUE, comment = NA, warning = FALSE}
  excitement   happiness      pride     anger      fear   sadness   neutral
1 0.08960483 0.006041054 0.05632496 0.2259102 0.2781007 0.1757137 0.1683045
2 0.11524552 0.011083936 0.08131301 0.1672127 0.3321840 0.1652457 0.1277151
3 0.09541881 0.007240616 0.05629114 0.1665660 0.3410282 0.1952039 0.1382514
4 0.09860725 0.011296707 0.07909032 0.1693194 0.3010349 0.1759851 0.1646665
5 0.08856109 0.007197607 0.07237346 0.2261922 0.3237688 0.1515539 0.1303529
6 0.10022306 0.011431777 0.09256416 0.1467394 0.3202718 0.1574203 0.1713494
```

These three functions are the core of the *transforEmotion* package. They allow you to run sentiment analysis on text, images, and videos. If you run into any issues, please report them on the [GitHub issues page](https://github.com/atomashevic/transforEmotion/issues). If you have any suggestions for improvements, please let us know. We are always looking for ways to improve the package and make it more useful for researchers.
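Finally, to give a sense of the research-scale use described in section [4](#section_4), here is a hedged sketch that scores a hypothetical local video with more frames and then averages the frame-level scores. The file path, the frame count, and the `colMeans()` summary are illustrative choices, not part of the package's API.

```{r video_scores_research, eval = FALSE, echo = TRUE, comment = NA, warning = FALSE, message = FALSE}
# Hypothetical local video file -- replace with your own recording
video_file <- "videos/interview_01.mp4"

emotions <- c("excitement", "happiness", "pride", "anger", "fear", "sadness", "neutral")

# Research-scale run: more frames means a considerably longer runtime
result <- video_scores(video_file, classes = emotions, nframes = 200,
                       save_frames = TRUE, video_name = "interview-01")

# One simple way to summarize the frame-level scores: the average
# score for each emotion across all analyzed frames
colMeans(result, na.rm = TRUE)
```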