---
title: "Introduction to the scimetr package"
author: "UDC Ranking's Group"
date: '`r paste0("scimetr ", packageVersion("scimetr"),": ", Sys.Date())`'
output: 
  rmarkdown::html_vignette:
    toc: yes
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{Introduction to the scimetr package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(fig.dim = c(8, 6), fig.align = "center", out.width = "80%")
old.opt <- options(digits = 5)
# rebuid <- FALSE # TRUE
# knitr::spin("scimetr.R", knit = FALSE)
# knitr::purl("scimetr.Rmd", documentation = 2)
```

```{r }
library(scimetr)
```

This vignette illustrates the use of the [`scimetr`](https://rubenfcasal.github.io/scimetr/) package for bibliometric analyses of datasets exported from Web of Science, highlighting its main workflows and functionalities.
The package provides tools for scientometric and bibliometric research, including routines to import bibliographic records from [*Clarivate Analytics Web of Science*](https://www.webofscience.com/wos/) (WoS) and to conduct bibliometric analyses.
A list of other useful R packages for this type of analysis is available [here](https://rubenfcasal.github.io/scimetr/articles/docs/R_packages.html).

# Installation

Since the package is not yet available on CRAN, you need to install the development version from the GitHub repository [rubenfcasal/scimetr](https://github.com/rubenfcasal/scimetr):

```{r eval=FALSE}
# install.packages("remotes")
remotes::install_github("rubenfcasal/scimetr")
```

Alternatively, Windows users may install the corresponding *scimetr_X.Y.Z.zip* file, available in the [releases section](https://github.com/rubenfcasal/scimetr/releases/latest) of the GitHub repository.
In this case, it is recommended to install its dependencies first:

```{r eval=FALSE}
# Dependencies
install.packages(c("dplyr", "tidyr", "stringr", "ggplot2", "scales", "rlang", "openxlsx"))
# Latest released version
install.packages("https://github.com/rubenfcasal/scimetr/releases/download/v1.2.0/scimetr_1.2.0.zip",
                 repos = NULL)
```

Once the package is installed, it can be loaded as usual with `library(scimetr)`.

# Bibliographic data

We will focus exclusively on importing publication data from [WoS](https://www.webofscience.com/wos/) in text format.
First, you need to download the corresponding files from the WoS website, for example by following the steps described [here](https://rubenfcasal.github.io/scimetr/articles/WoS_export.html).

## Loading WoS data from a directory

WoS export files (limited by default to 500 records each) can be loaded automatically from a directory:

```{r eval=FALSE}
dir("UDC_2018-2023 (01-02-2024)", pattern = "\\.txt$")
```

```{r echo=FALSE}
# dput(dir("UDC_2014-2023 (01-02-2024)", pattern='*.txt'))
c(
  "savedrecs01.txt", "savedrecs02.txt", "savedrecs03.txt", "savedrecs04.txt",
  "savedrecs05.txt", "savedrecs06.txt", "savedrecs07.txt", "savedrecs08.txt",
  "savedrecs09.txt", "savedrecs10.txt"
)
```

To read these files and combine them into a single `data.frame`, use the `import_wos()` function:

```{r eval=FALSE}
wos.data <- import_wos("UDC_2018-2023 (01-02-2024)")
```

The bibliographic database is then created from this `data.frame` with the `db_bib()` function, as described in the *Bibliographic database* section below.

## Example data

The package includes the example dataset `wosdf` (obtained with `import_wos()`), which contains the results of a WoS search by the Affiliation field for *Universidade da Coruña (UDC)* (`OG = Universidade da Coruna`) in the research area `"Mathematics"` for the years 2018–2023.

All data tables have a `variable.labels` attribute containing the variable labels.
RStudio displays these labels below the variable names in the data viewer (e.g. `View(wosdf)`).
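Since `variable.labels` is an ordinary R attribute, it can be set and read with `attr()`. The toy sketch below illustrates this on a small hypothetical data frame; the column names `PY` and `TC` and their labels are assumptions chosen for illustration only and are not taken from the actual structure of `wosdf`:

```{r }
# Hypothetical toy table (NOT the real structure of wosdf)
toy <- data.frame(PY = c(2018L, 2019L, 2019L), TC = c(5L, 0L, 2L))
# One label per column, named after the corresponding variable
attr(toy, "variable.labels") <- c(PY = "Publication Year", TC = "Times Cited")
# Labels are read back with attr(), as for any other attribute
toy.labels <- attr(toy, "variable.labels")
toy.labels[["TC"]]
```

The labels of the example dataset `wosdf` are retrieved in the same way: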
```{r }
wos.labels <- attr(wosdf, "variable.labels")
knitr::kable(head(data.frame(wos.labels)),
             col.names = c("Variable", "Label"))
```

...

A full list of the variables used in the database tables is given in the final section of this document, [*Variable list*](#variables).

# Bibliographic database

`scimetr` uses lists whose components are `data.frame`s (the tables) as relational databases.
To create the [bibliographic database](https://en.wikipedia.org/wiki/Bibliographic_database), use the `db_bib()` function, which returns an S3 object of class `wos.db`:

```{r wosdf}
db <- db_bib(wosdf, label = "Mathematics_UDC_2018-2023")
names(db)
```

## Summaries

You can generate either a global summary or yearly summaries of the database.

### Global summary

The `summary()` method for bibliographic databases (`summary.wos.db()`) provides an overview of the entire database, including the total number of documents, authors, journals and citations, among other aggregated statistics.

```{r summary}
res1 <- summary(db)
res1
```

### Yearly summary

`summary_year()` breaks the summary down *by year*, showing trends over time in publications, citations and other key metrics.

```{r }
res2 <- summary_year(db)
res2
```

## Visualizations

The plotting methods rely on the [`ggplot2`](https://ggplot2.tidyverse.org) package to create a wide variety of visualizations from the database.
There are three main types of plots:

- Database plots (`plot(db)`).
- Summary plots (`plot(summary(db))`).
- Yearly summary plots (`plot(summary_year(db))`).

Note: All `plot()` methods invisibly return a list with the generated `ggplot2` objects (use `plot = FALSE` to avoid plotting).

### Database plots

The `plot()` method for bibliographic databases (`plot.wos.db()`) provides a general visualization of the database contents.
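Following the note above, the plots can also be obtained without drawing them by setting `plot = FALSE`, for example to customize them before display. This is a minimal sketch, assuming (as the note states) that `plot(db, plot = FALSE)` returns a plain list of `ggplot2` objects; the names `db.plots` and `p1` and the choice of `ggplot2::theme_minimal()` are purely illustrative:

```{r warning=FALSE, message=FALSE}
library(scimetr)
# Rebuild the example database so this chunk can run on its own
db <- db_bib(wosdf, label = "Mathematics_UDC_2018-2023")
# Generate the database plots without drawing them
db.plots <- plot(db, plot = FALSE)
length(db.plots)
# Assuming each component is a ggplot object, it can be modified as usual
p1 <- db.plots[[1]] + ggplot2::theme_minimal()
```

Printing `p1` would then draw the customized version of the first plot.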
```{r plotdb, warning=FALSE, message=FALSE}
plot(db)
```

### Summary plots

The `plot()` method for global summaries (`plot.summary.wos.db()`) displays the results as standard bar and line plots or, with `pie = TRUE`, as pie charts.

```{r }
plot(res1)
plot(res1, pie = TRUE)
```

### Yearly summary plots

The `plot()` method for yearly summaries (`plot.summary.year()`) displays trends over time with standard bar and line plots or, with `boxplot = TRUE`, uses boxplots to show the variability within each year.

```{r }
plot(res2)
plot(res2, boxplot = TRUE)
```

## Filtering

To filter elements (entities) of the database, you can use the functions `get_id_