---
title: "Goodreader Quick Start Guide"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Goodreader Quick Start Guide}
  %\VignetteEngine{knitr::knitr}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE, 
  message = FALSE,
  out.width='\\textwidth', fig.height = 4, fig.width = 5, fig.align='center'
)

```

## Installing and loading the package

Install the package:

```{r, eval = FALSE}
install.packages("Goodreader")
```

And load the package:

```{r}
library(Goodreader)
```

## Searching for Books on Goodreads

The `search_goodreads()` function allows you to search for books on Goodreads based on various criteria.

The code below searches for books that include the term “parenting” in the title and returned 10 books sorted by readers’ ratings
```{r, eval = FALSE}
parent_df <- search_goodreads(search_term = "parenting", search_in = "title", num_books = 10, sort_by = "ratings")
```

```{r, eval = FALSE}
summary(parent_df)
##   title              author            book_id         
## Length:10          Length:10          Length:10         
## Class :character   Class :character   Class :character  
## Mode  :character   Mode  :character   Mode  :character  

##     url               ratings     
## Length:10          Min.   : 8427  
## Class :character   1st Qu.:11744  
## Mode  :character   Median :13662  
##                    Mean   :19757  
##                    3rd Qu.:13784  
##                    Max.   :69591  
```

You can also search author's name:

```{r, eval = FALSE}
search_goodreads(search_term = "J.K. Rowling", search_in = "author", num_books = 5, sort_by = "ratings") 
```

The `search_goodreads()` function includes a `sort_by` that sorts the results either by `ratings` or `published_year`:

```{r, eval = FALSE}
search_goodreads(search_term = "J.K. Rowling", search_in = "author", num_books = 5, sort_by = "published_year") 
```

## Scrape book metadata and reviews

After the books are found, save their IDs to a text file. These IDs are used for extracting book metadata and reviews:

```{r, eval = FALSE}
get_book_ids(input_data = parent_df, file_name = "parent_books.txt") #the book IDs are now stored in a text file named “parent_books”
```

Book metadata can then be scraped:

```{r, eval = FALSE}
parent_bookinfo <- scrape_books(book_ids_path = "parent_books.txt", use_parallel = FALSE)
```

To speed up the scraping process:
*Turn on the parallel process: `use_parallel = TRUE` 
*Specify the number of cores for the parallel process (e.g., `num_cores = 8)

```{r, eval = FALSE}
parent_bookreviews <- scrape_reviews(book_ids_path = "parent_books.txt", num_reviews = 10, use_parallel = FALSE) #users can also turn on parallel process to speed up the process
```

## Conduct sentiment analysis

The `analyze_sentiment()` function calculates the sentiment score of each review based on the lexicon chosen by the user. Available options for lexicon are `afinn`, `bing`, and `nrc`. Basic negation scope detection was implemented (e.g., not happy is labeled as negative emotion and is assigned with a negative score).

```{r, eval = FALSE}
sentiment_results <- analyze_sentiment(parent_bookreviews, lexicon = "afinn")
```

The `average_book_sentiment()` function calculates the average sentiment score for each book.

```{r, eval = FALSE}
ave_sentiment <- average_book_sentiment(sentiment_results)
summary(ave_sentiment)
##    book_id          avg_sentiment  
##  Length:10          Min.   : 4.40  
##  Class :character   1st Qu.: 7.25  
##  Mode  :character   Median :12.86  
##                     Mean   :12.95  
##                     3rd Qu.:14.65  
##                     Max.   :27.30
```

The sentiment scores can be plotted as a histogram: 

```{r, eval=FALSE}
sentiment_histogram(sentiment_results)
```
```{r echo=FALSE, out.width='400px'}
knitr::include_graphics('../man/figures/sentiment_hist.png')
```

Or a trend of average sentiment score over time:

```{r, eval=FALSE}
sentiment_trend(sentiment_results, time_period = "year")
```
```{r echo=FALSE, out.width='400px'}
knitr::include_graphics('../man/figures/sentiment_trend.png')
```

## Perform topic modeling

Apply topic modeling to the reviews data:
```{r, eval = FALSE}
reviews_topic <- model_topics(parent_bookreviews, num_topics = 3, num_terms = 10, english_only = TRUE)
## Topic 1:  
## parent, children, need, one, way, good, get, work, dont, give 
## 
## Topic 2:  
## parent, child, book, emot, feel, help, also, can, children, use 
## 
## Topic 3:  
## book, just, kid, think, read, like, time, say, realli, much
```

Plot the top terms by topic:
```{r, eval=FALSE}
plot_topic_terms(reviews_topic)
```
```{r echo=FALSE, out.width='400px'}
knitr::include_graphics('../man/figures/topic_terms.png')
```

Create a word cloud for each topic:
```{r, eval=FALSE}
gen_topic_clouds(reviews_topic)
```

Topic 1:
```{r echo=FALSE, out.width='400px'}
knitr::include_graphics('../man/figures/Topic1.png')
```

Topic 2:
```{r echo=FALSE, out.width='400px'}
knitr::include_graphics('../man/figures/Topic2.png')
```

Topic 3:
```{r echo=FALSE, out.width='400px'}
knitr::include_graphics('../man/figures/Topic3.png')
```


## Other utility functions

The following table shows other utility functions to extract book-related information 

```{r echo=FALSE} 
library(dplyr)
data.frame(Function = c("get_book_ids()", "get_book_summary()", "get_author_info()", "get_genres()", "get_published_time()", "get_num_pages()", "get_format_info()", "get_rating_distribution()"), 
           Output = c("Text file",  "List", "List", "List", "List", "List", "List", "List"),    
           Description = c("Retrieve the book IDs from the input data and save to a text file ", "Retrieve the summary for each book", "Retrieve the author information for each book", "Extract the genres for each book", "Retrieve the published time for each book", "Retrieve the number of pages for each book", "Retrieve the format information for each book", "Retrieve the rating distribution for each book")) %>%
  knitr::kable() 
```