Title: Easy Interface to Search 'SciELO' Database
Version: 0.1.0
Description: Provides a simple interface to search and retrieve scientific articles from the 'SciELO' (Scientific Electronic Library Online) database <https://scielo.org>. It allows querying, filtering, and visualizing results in an interactive table.
License: MIT + file LICENSE
Encoding: UTF-8
RoxygenNote: 7.3.2
Imports: httr (≥ 1.4.6), xml2 (≥ 1.3.6), rvest (≥ 1.0.3), dplyr (≥ 1.1.4), magrittr (≥ 2.0.3), stats, stringr (≥ 1.5.1)
URL: https://github.com/PabloIxcamparij/easyScieloPack
BugReports: https://github.com/PabloIxcamparij/easyScieloPack/issues
NeedsCompilation: no
Packaged: 2025-07-15 06:31:56 UTC; emmas
Author: Pablo Ixcamparij [aut, cre]
Maintainer: Pablo Ixcamparij <jose.sorto@ucr.ac.cr>
Repository: CRAN
Date/Publication: 2025-07-18 14:40:08 UTC
Builds a SciELO search URL based on query object parameters.
Description
Builds a SciELO search URL based on query object parameters.
Usage
build_scielo_url(query_obj, page_from_idx, items_per_page)
Arguments
query_obj: A 'scielo_query' object.
page_from_idx: The 'from' index for pagination (e.g., 1, 16, 31).
items_per_page: The 'count' parameter (e.g., 15).
Value
A character string representing the full SciELO search URL.
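As a rough illustration of how the two pagination arguments could end up in a URL, the sketch below assembles a query string in base R. The helper name sketch_build_url, the base address, and the parameter name q are assumptions made for this example. Only 'from' and 'count' are named in the argument descriptions above, and the real function takes a 'scielo_query' object rather than a bare string.

```r
# Hypothetical helper for illustration; not part of the package API.
sketch_build_url <- function(query, page_from_idx = 1, items_per_page = 15) {
  base_url <- "https://search.scielo.org/"  # assumed base address
  params <- c(
    q     = utils::URLencode(query, reserved = TRUE),  # 'q' is an assumed name
    from  = page_from_idx,   # index of the first result on the page
    count = items_per_page   # number of results per page
  )
  # Join name=value pairs with "&" and append them to the base address.
  paste0(base_url, "?", paste(names(params), params, sep = "=", collapse = "&"))
}

url <- sketch_build_url("climate change", page_from_idx = 16)
```

Under these assumptions, requesting the second page of 15 results means passing page_from_idx = 16.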
Fetch search results from SciELO
Description
This is the core function that performs the web scraping and data extraction. It handles pagination and combines results into a single data frame.
Usage
fetch_scielo_results(query_obj)
Arguments
query_obj: A 'scielo_query' object.
Value
A data.frame containing all fetched articles.
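The pagination described above can be sketched as follows. This is not the package's implementation: page_offsets and fake_fetch_page are hypothetical helpers, and the page download is replaced by a stand-in that fabricates placeholder rows so the example runs offline. It only shows how 'from' indices such as 1, 16, 31 arise with 15 items per page and how per-page results can be combined into one data frame.

```r
# Hypothetical helpers illustrating pagination; not the package's code.
page_offsets <- function(n_total, items_per_page = 15) {
  if (n_total < 1) return(numeric(0))
  seq(from = 1, to = n_total, by = items_per_page)
}

# Stand-in for "download and parse one page": returns placeholder rows.
fake_fetch_page <- function(from, n_total, items_per_page = 15) {
  idx <- seq(from, min(from + items_per_page - 1, n_total))
  data.frame(position = idx, title = paste("Article", idx),
             stringsAsFactors = FALSE)
}

offsets     <- page_offsets(40)                          # one entry per page
pages       <- lapply(offsets, fake_fetch_page, n_total = 40)
all_results <- do.call(rbind, pages)                     # single data frame
```

With 40 hits and 15 items per page, three pages are requested and the combined data frame has 40 rows.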
Normalize and validate a subject category
Description
Accepts a single subject category and validates it.
Usage
normalize_categories(category)
Arguments
category: A character vector of length 1. Subject category to filter by (e.g., "environmental sciences").
Value
A cleaned category string if valid.
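The documentation does not say what "cleaned" means, so the following is only one plausible reading: require a single non-empty string, trim it, collapse repeated whitespace, and lowercase it. The helper name sketch_normalize_category and the lowercasing step are assumptions.

```r
# Hypothetical sketch; the package's actual cleaning rules may differ.
sketch_normalize_category <- function(category) {
  if (!is.character(category) || length(category) != 1 || is.na(category)) {
    stop("`category` must be a single character string.", call. = FALSE)
  }
  # Trim the ends, collapse inner whitespace runs, and lowercase (assumed).
  cleaned <- tolower(gsub("\\s+", " ", trimws(category)))
  if (!nzchar(cleaned)) {
    stop("`category` must not be empty.", call. = FALSE)
  }
  cleaned
}

cat_clean <- sketch_normalize_category("  Environmental   Sciences ")
```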
Normalize SciELO collection names or ISO codes
Description
Converts country names or ISO codes into valid SciELO collection codes. Only one value is allowed.
Usage
normalize_collections(collections)
Arguments
collections: A character vector of length 1: a country name (e.g., "Costa Rica") or a valid SciELO ISO code (e.g., "cri").
Value
A character string representing the normalized SciELO collection code.
Examples
normalize_collections("Costa Rica") # returns "cri"
normalize_collections("cri") # returns "cri"
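A name-to-code conversion of this kind is often written as a lookup table. The sketch below assumes that approach. The helper name is hypothetical, and the table holds only the two pairs mentioned in this manual ("Costa Rica"/"cri" and "Mexico"/"mex"), whereas the real function presumably knows many more collections.

```r
# Hypothetical sketch of a lookup-based normalizer; not the package's code.
sketch_normalize_collection <- function(collections) {
  # Deliberately tiny table: only pairs that appear in this manual.
  lookup <- c("costa rica" = "cri", "mexico" = "mex")
  if (!is.character(collections) || length(collections) != 1 ||
      is.na(collections)) {
    stop("Only one collection is allowed.", call. = FALSE)
  }
  key <- tolower(trimws(collections))
  if (key %in% lookup) return(key)                    # already a code
  if (key %in% names(lookup)) return(lookup[[key]])   # country name
  stop("Unknown collection: ", collections, call. = FALSE)
}

code_from_name <- sketch_normalize_collection("Costa Rica")
code_from_code <- sketch_normalize_collection("CRI")
```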
Normalize and validate a journal name
Description
Accepts a single journal name and validates it.
Usage
normalize_journals(journal)
Arguments
journal: A character vector of length 1. Journal name to filter by (e.g., "Revista Ambiente & Água").
Value
A cleaned journal name if valid.
Normalize and validate article language codes
Description
Accepts a single language code ("es", "pt", "en") and validates it.
Usage
normalize_languages(lang_code)
Arguments
lang_code: A character vector of length 1. Language code to filter by.
Value
A normalized (lowercase) language code if valid.
Examples
normalize_languages("EN") # returns "en"
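A minimal sketch of such a check, assuming the allowed set is exactly the three codes listed in the description. The helper name sketch_normalize_language is hypothetical.

```r
# Hypothetical sketch; mirrors the documented behaviour, not the source code.
sketch_normalize_language <- function(lang_code) {
  allowed <- c("es", "pt", "en")
  if (!is.character(lang_code) || length(lang_code) != 1 || is.na(lang_code)) {
    stop("`lang_code` must be a single character string.", call. = FALSE)
  }
  code <- tolower(trimws(lang_code))
  if (!(code %in% allowed)) {
    stop("Unsupported language code: ", lang_code, call. = FALSE)
  }
  code
}

lang <- sketch_normalize_language("EN")
```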
Normalize and validate n_max
Description
Ensures that the value of n_max is a positive integer or NULL.
Usage
normalize_nmax(value)
Arguments
value: The value to validate.
Value
An integer or NULL.
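The rule "positive integer or NULL" can be sketched as below. The helper name and the error message are assumptions. The sketch accepts whole-number doubles such as 5 and converts them to integer.

```r
# Hypothetical sketch of the n_max check; not the package's code.
sketch_normalize_nmax <- function(value) {
  if (is.null(value)) return(NULL)   # NULL means "no limit"
  ok <- is.numeric(value) && length(value) == 1 && !is.na(value) &&
    is.finite(value) && value > 0 && value == floor(value)
  if (!ok) {
    stop("`n_max` must be a single positive whole number or NULL.",
         call. = FALSE)
  }
  as.integer(value)
}

n_ok   <- sketch_normalize_nmax(5)
n_null <- sketch_normalize_nmax(NULL)
```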
Parses a single SciELO search results HTML page.
Description
Parses a single SciELO search results HTML page.
Usage
parse_scielo_page(html_page, query_obj)
Arguments
html_page: A parsed HTML page (e.g., an 'xml_document' such as the one returned by xml2::read_html()).
query_obj: A 'scielo_query' object (used for abstract language preference).
Value
A list of data frames, each representing an article.
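To show the general shape of such a parser, the sketch below runs rvest over an invented HTML fragment and returns one data frame per article. The markup, the CSS selectors (div.item, a.title, span.year), and the column names are all made up for this example; the real SciELO results page and the fields this function extracts are not documented here.

```r
library(rvest)

# Invented HTML standing in for a results page; real SciELO markup differs.
html_page <- read_html(paste0(
  '<div class="item"><a class="title">First article</a>',
  '<span class="year">2019</span></div>',
  '<div class="item"><a class="title">Second article</a>',
  '<span class="year">2021</span></div>'
))

# One node per article, then one field per node (selectors are hypothetical).
items     <- html_elements(html_page, "div.item")
titles    <- html_text2(html_element(items, "a.title"))
pub_years <- as.integer(html_text2(html_element(items, "span.year")))

# Build a list of one-row data frames, one per article.
articles <- lapply(seq_along(items), function(i) {
  data.frame(title = titles[i], year = pub_years[i], stringsAsFactors = FALSE)
})
```

Calling html_element() on the whole node set keeps the fields aligned: it returns exactly one result per article, with a missing value where a field is absent.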
Search SciELO and return results as a data.frame
Description
Executes a search in the SciELO database using multiple optional filters, and returns the results as a data frame.
Usage
search_scielo(
query,
lang = "en",
lang_operator = "AND",
n_max = NULL,
journals = NULL,
collections = NULL,
languages = NULL,
categories = NULL,
year_start = NULL,
year_end = NULL
)
Arguments
query: Search term (e.g., "climate change"). Required.
lang: Interface language for the SciELO website ("en", "es", "pt"). Default is "en".
lang_operator: Operator for combining language filters ("AND" or "OR"). Default is "AND".
n_max: Maximum number of results to return. Optional.
journals: Journal name to filter by. Only one value is supported. Optional.
collections: A character string for filtering by SciELO collection (country name or ISO code, e.g., "Mexico" or "mex"). Optional.
languages: Article language to filter by (e.g., "en"). Only one value is supported. Optional.
categories: Subject category to filter by (e.g., "ecology"). Only one value is supported. Optional.
year_start: Start year for filtering articles. Optional.
year_end: End year for filtering articles. Optional.
Details
Note: Only one value per filter category is currently supported (e.g., only one language).
Value
A data.frame with the search results.
Examples
# Simple search with a keyword
df1 <- search_scielo("salud ambiental")
# Limit number of results to 5
df2 <- search_scielo("salud ambiental", n_max = 5)
# Filter by SciELO collection (country name or code)
df3 <- search_scielo("salud ambiental", collections = "Ecuador")
df4 <- search_scielo("salud ambiental", collections = "cri") # Costa Rica by ISO code
# Filter by article language
df5 <- search_scielo("salud ambiental", languages = "es")
# Filter by a specific journal
df6 <- search_scielo("salud ambiental", journals = "Revista Ambiente & Agua")
# Filter by subject category
df7 <- search_scielo("salud ambiental", categories = "environmental sciences")
# Filter by year range
df8 <- search_scielo("salud ambiental", year_start = 2015, year_end = 2020)
Validate year range for SciELO query
Description
Ensures that start and end years are valid numeric values and in correct order.
Usage
years(start_year, end_year)
Arguments
start_year: Integer. Start year for filtering (inclusive).
end_year: Integer. End year for filtering (inclusive).
Value
A list with named elements year_start and year_end.
Examples
valid_years <- years(2018, 2022)
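The validation described above might look like the following sketch. The helper name sketch_years, the error messages, and the choice to let NULL pass through (mirroring the NULL defaults of year_start and year_end in search_scielo) are assumptions.

```r
# Hypothetical sketch of the year-range check; not the package's code.
sketch_years <- function(start_year = NULL, end_year = NULL) {
  check_year <- function(x, name) {
    if (is.null(x)) return(NULL)   # assumed: an unset bound is allowed
    if (!is.numeric(x) || length(x) != 1 || is.na(x) || x != floor(x)) {
      stop("`", name, "` must be a single whole number.", call. = FALSE)
    }
    as.integer(x)
  }
  start_year <- check_year(start_year, "start_year")
  end_year   <- check_year(end_year, "end_year")
  # Both bounds are inclusive, so equal years are valid.
  if (!is.null(start_year) && !is.null(end_year) && start_year > end_year) {
    stop("`start_year` must not be later than `end_year`.", call. = FALSE)
  }
  list(year_start = start_year, year_end = end_year)
}

range_ok <- sketch_years(2018, 2022)
```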