rdomains 0.4.0
Breaking Changes
- Removed get_alexa_data() function (Alexa service discontinued by
Amazon)
Major Changes
- Removed unused aws.alexa dependency
- Removed devtools from Imports (incorrect usage)
- Added modern tidyverse-style API with comprehensive input
validation
- Significant code deduplication through shared helper functions
API Updates
- Updated virustotal_cat() to use VirusTotal API v3 (previously v2.0)
- Updated documentation references to v3 API endpoints
- Fixed virustotal_cat() implementation to properly extract categories
from v3 API response structure
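
As a rough illustration of the category extraction described above, the sketch below pulls the vendor-to-category mapping out of a parsed v3 domain response, where categories sit under data > attributes > categories. The sample response is hand-written and abbreviated, the vendor names are made up, and extract_vt_categories() is a hypothetical helper, not the package's actual implementation.

```r
# Hand-written, abbreviated stand-in for a parsed v3 domain response.
# Vendor names and categories are illustrative, not real lookups.
sample_response <- list(
  data = list(
    id = "example.com",
    type = "domain",
    attributes = list(
      categories = list(
        "Vendor A" = "news",
        "Vendor B" = "media"
      )
    )
  )
)

# Hypothetical helper: flatten the vendor -> category mapping.
extract_vt_categories <- function(response) {
  cats <- response$data$attributes$categories
  if (is.null(cats) || length(cats) == 0) {
    # No categories available: return an empty, correctly typed data.frame.
    return(data.frame(
      domain = character(0),
      vendor = character(0),
      category = character(0),
      stringsAsFactors = FALSE
    ))
  }
  data.frame(
    domain = response$data$id,
    vendor = names(cats),
    category = unlist(cats, use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

result <- extract_vt_categories(sample_response)
print(result)
```

Returning an empty data.frame when no categories are present mirrors the behaviour noted for unknown domains in 0.1.5; whether the current function does the same is not confirmed here.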
Improvements
- All categorization functions now validate inputs with helpful error
messages using cli package
- Standardized parameter naming (virustotal_cat now uses ‘domains’
instead of ‘domain’)
- Better error messages with clear guidance on how to fix issues
- Modernized code style (pipes, purrr, tibble internally with
data.frame output for compatibility)
- Improved file path handling with informative errors
- Enhanced rate limiting in LLM functions
- Cleaner domain preprocessing logic shared across all functions
Internal Changes
- Added helper functions for common operations:
  - clean_domains() - standardized domain cleaning
  - validate_domains() - comprehensive input validation
  - validate_data_file() - consistent file validation
  - get_api_key() - unified API key retrieval
  - build_categorization_prompt() - LLM prompt construction
  - apply_rate_limit() - rate limiting logic
- Refactored to use purrr instead of for-loops where appropriate
- All functions now return tibbles for modern data handling
- Added checkmate for robust input validation
- Added readr for faster CSV reading
- Extracted domain cleaning logic to single function
- Improved string operations with stringr
- Removed redundant :: notation for imported functions (cleaner code,
consistent with @importFrom)
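
To give a feel for what shared cleaning and validation helpers of this kind might do, here is a minimal base R sketch. The function names carry a _sketch suffix because they are hypothetical; the specific cleaning steps (lowercasing, dropping the scheme, path, port, and a leading "www.") and the error wording are assumptions, and the package itself uses cli and checkmate rather than stop().

```r
# Hypothetical sketch of a domain-cleaning helper (base R only).
# The exact steps are assumptions, not the package's documented behaviour.
clean_domains_sketch <- function(domains) {
  x <- tolower(trimws(domains))
  x <- sub("^[a-z][a-z0-9+.-]*://", "", x)  # drop a scheme such as https://
  x <- sub("[/?#].*$", "", x)               # drop path, query, and fragment
  x <- sub(":[0-9]+$", "", x)               # drop a trailing port
  x <- sub("^www\\.", "", x)                # drop a leading www.
  x
}

# Hypothetical sketch of strict input validation: NULL, non-character,
# NA, and empty strings are all rejected with an explanatory message.
validate_domains_sketch <- function(domains) {
  if (is.null(domains) || length(domains) == 0) {
    stop("`domains` must be a non-empty character vector.", call. = FALSE)
  }
  if (!is.character(domains)) {
    stop("`domains` must be a character vector, not ", class(domains)[1], ".",
         call. = FALSE)
  }
  if (anyNA(domains) || any(!nzchar(trimws(domains)))) {
    stop("`domains` must not contain NA or empty strings.", call. = FALSE)
  }
  invisible(domains)
}

raw <- c(" HTTPS://www.Example.com/path?q=1 ", "news.example.org:8080")
validate_domains_sketch(raw)
cleaned <- clean_domains_sketch(raw)
print(cleaned)
```

Validating before cleaning keeps the error messages tied to what the caller actually passed in.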
Breaking Changes
- All categorization functions now return tibbles instead of
data.frames
- get_alexa_data() has been removed (service discontinued)
- Input validation is now stricter (NULL and empty strings are
properly rejected)
- virustotal_cat() parameter renamed from domain to domains for
consistency
rdomains 0.3.0
- NEW: Added LLM-based domain classification with
openai_cat() and claude_cat() functions
- Support for OpenAI GPT models and Anthropic Claude models for domain
categorization
- Flexible custom category schemas - users can specify their own
categories or use defaults
- Consistent API design matching existing *_cat() functions for
seamless integration
- Built-in rate limiting and error handling for API calls
- REMOVED: BrightCloud support due to service
unavailability
- Updated documentation URLs from HTTP to HTTPS where applicable
- Fixed Shallalist references to reflect service discontinuation
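
The custom category schemas and built-in rate limiting mentioned for the LLM functions can be pictured with the following base R sketch. Both helpers are hypothetical, the default category list and the prompt wording are invented for illustration, and no API call is made; the real openai_cat() and claude_cat() internals may differ.

```r
# Hypothetical prompt builder: callers may pass their own categories,
# otherwise an illustrative default list (an assumption) is used.
build_prompt_sketch <- function(domain,
                                categories = c("news", "shopping",
                                               "social media", "other")) {
  paste0(
    "Classify the website domain '", domain,
    "' into exactly one of these categories: ",
    paste(categories, collapse = ", "),
    ". Reply with the category name only."
  )
}

# Hypothetical throttle: pause `delay` seconds between successive calls.
with_rate_limit <- function(items, fn, delay = 0.1) {
  out <- vector("list", length(items))
  for (i in seq_along(items)) {
    if (i > 1) Sys.sleep(delay)  # no wait before the first call
    out[[i]] <- fn(items[[i]])
  }
  out
}

prompts <- with_rate_limit(
  c("example.com", "example.org"),
  build_prompt_sketch,
  delay = 0.1
)
cat(prompts[[1]], "\n")
```

A fixed delay is the simplest form of throttling; a production version would more likely respond to the provider's rate-limit responses.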
rdomains 0.2.1
- Shallalist stopped its service, so the package now downloads the
latest Shalla database and fetches the Shallalist file from a new URL
rdomains 0.2.0
- URL fixes. Resubmitted because the site from which data was
downloaded went down, which broke some tests
rdomains 0.1.9
- The R package supporting headless browsing has been abandoned, so
trusted_cat is removed. Sigh.
rdomains 0.1.8
- Function for checking whether a domain is a university domain, using
https://github.com/Hipo/university-domains-list
rdomains 0.1.7
- Changes due to move to a new repo.
- Basic brightcloud function added
rdomains 0.1.6
- Adds not_news classifier that classifies not news based on published
work.
- passes expect_lint_free
rdomains 0.1.5
- Shallalist and DMOZ data read in with stringsAsFactors = FALSE.
- Swapped the DMOZ data to domain level category data, included
English translations of non-English categories, quote protection of
multiple categories.
- Accounted for changes in RSelenium; startServer(), for instance, is
deprecated. Currently, passing of log is allowed only for trusted_cat.
- Fixed a bug in shalla_cat when multiple domain names are passed
- Fixed small issue with adult_ml1_cat() whose returned data.frame had
a column that was a named list. The column is now a vector.
- If an unknown domain is passed to virustotal, it will return an
empty data.frame rather than throw an error.
rdomains 0.1.0