---
title: "SemanticDistance_Data_Viz"
author: "Jamie Reilly, Hannah R. Mechtenberg, Emily B. Myers, Jonathan E. Peelle"
date: "`r Sys.Date()`"
vignette: >
  %\VignetteIndexEntry{SemanticDistance_Data_Viz}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
vignetteBuilder: knitr
output:
  rmarkdown::html_vignette:
    toc: yes
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

```{r, message=FALSE, echo=FALSE, warning=FALSE}
# Load SemanticDistance
library(SemanticDistance)
```

# Data Visualization Options
`SemanticDistance` contains two primary visualization options. Most users will be able to plot monologue distances as continuously changing time series using simple approaches like `ggline`, customizing the details to their own needs. The visualization functions we have included are used for gleaning structure from lists of words. At present, these options include hierarchical cluster analysis (producing a triangle dendrogram) and network analysis (producing a simple undirected graph network). Each of these approaches uses a simple machine learning algorithm (k-means) to determine optimal cluster sizes.

# STEP 1: CLEAN AND FORMAT YOUR MONOLOGUE OR LIST

```{r, message=FALSE}
#Start from
MyCleanList <- clean_monologue_or_list(Unordered_List, wordcol='mytext')
knitr::kable(head(MyCleanList, 10), format = "pipe")
```

# STEP 2: CREATE DENDROGRAM or NETWORK
From your cleaned and formatted list, visualize relations between words.

## Option 1: Hierarchical Cluster Dendrogram
Works on any vector of words, but only makes sense for unordered word lists! Produces a dendrogram from a vector of words. The function first pulls the words, then creates a square matrix of cosine distances for all possible word pairs: d[i,j]. It then converts the semantic distance matrix to Euclidean distances and plots a hierarchical clustering solution, placing words closer together based on their distance.

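The steps described above can be sketched in base R. This is a minimal illustration, not the internals of `wordlist_to_network()`: the toy embedding values, the reading of "converts to Euclidean distance" as `dist()` applied to the rows of the cosine distance matrix, and the average linkage method are all assumptions made for the example.

```r
# Toy embeddings: four hypothetical words x three dimensions (values are made up)
emb <- rbind(
  dog    = c(0.9, 0.1, 0.0),
  cat    = c(0.8, 0.2, 0.1),
  banana = c(0.1, 0.9, 0.2),
  apple  = c(0.0, 0.8, 0.3)
)

# Square matrix of cosine distances for all word pairs: d[i, j] = 1 - cosine similarity
unit <- emb / sqrt(rowSums(emb^2))   # scale each row to unit length
cos_dist <- 1 - tcrossprod(unit)

# Assumed conversion step: Euclidean distance between rows of the cosine distance matrix
euc <- dist(cos_dist, method = "euclidean")

# Hierarchical clustering (linkage method is an assumption) and a triangle dendrogram
hc <- hclust(euc, method = "average")
plot(as.dendrogram(hc), type = "triangle")

# Cut the tree into two clusters to inspect which words group together
clusters <- cutree(hc, k = 2)
```

In this toy example the two animal words and the two fruit words should fall into separate branches, which is the kind of structure the dendrogram is meant to reveal in an unordered list.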
Arguments:
`dat` dataframe processed using `clean_monologue_or_list()`
`output` quoted argument, either `'dendrogram'` or `'network'`; default is `'dendrogram'`
`dist_type` quoted argument selecting the distance norms to use; default is `'embedding'`, alternative is `'SD15'`

```{r}
mydendro <- wordlist_to_network(MyCleanList, output='dendrogram', dist_type='embedding')
print(mydendro)
```

## Option 2: iGraph network
Takes hclust properties from the dendrogram steps and creates a simple igraph object.

Arguments:
`dat` dataframe cleaned using `clean_monologue_or_list`
`output` quoted argument, either `'dendrogram'` or `'network'`; default is `'dendrogram'`
`dist_type` quoted argument selecting the distance norms to use; default is `'embedding'`, alternative is `'SD15'`

```{r, message=FALSE, warning=FALSE}
mynetwork <- wordlist_to_network(MyCleanList, output='network', dist_type='embedding')
print(mynetwork)
```
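
For readers curious how cluster information might carry over into a graph, here is a rough sketch using `igraph`. The text does not specify how the package derives edges from the hclust solution, so the distance cutoff used as an edge rule, the hand-picked `k = 2`, and the toy distance values are all assumptions for illustration only.

```r
# Toy cosine distance matrix for four hypothetical words (values are made up)
words <- c("dog", "cat", "banana", "apple")
d <- matrix(c(0.00, 0.02, 0.79, 0.85,
              0.02, 0.00, 0.70, 0.74,
              0.79, 0.70, 0.00, 0.02,
              0.85, 0.74, 0.02, 0.00),
            nrow = 4, dimnames = list(words, words))

# Cluster membership from a hierarchical solution cut at k = 2 (k chosen by hand here)
hc <- hclust(as.dist(d), method = "average")
membership <- cutree(hc, k = 2)

# Assumed edge rule: link two words when their distance falls below a cutoff
cutoff <- 0.5
adj <- (d < cutoff) * 1
diag(adj) <- 0   # no self-loops

# Build and plot a simple undirected graph, coloring vertices by cluster
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_adjacency_matrix(adj, mode = "undirected")
  g <- igraph::set_vertex_attr(g, "color",
                               value = c("steelblue", "orange")[membership])
  plot(g, vertex.label.color = "black")
}
```

The `requireNamespace()` guard keeps the sketch from failing when `igraph` is not installed; the adjacency matrix and cluster membership are still computed with base R in that case.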