---
title: "Beyond F-UJI: reuse, sensitivity, hygiene, and FAIR-TLC"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Beyond F-UJI: reuse, sensitivity, hygiene, and FAIR-TLC}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(rfair)
```

Automated FAIR tools have well-documented blind spots. In peer review of a COVID-19 FAIR-assessment study, the reviewer (Melissa Haendel) noted that such tools reward the *presence* of a license, an identifier, or a metadata field without checking whether the data is actually reusable, legitimately restricted, or properly identified. `rfair` adds checks for exactly these.

## A license can be present yet not open for reuse

Detecting that a license exists says nothing about whether you may reuse the data. `license_reuse()` classifies the actual permissions, and maps each license to the six-category taxonomy of the [(Re)usable Data Project](https://reusabledata.org) (Carbon et al. 2019).

```{r}
license_reuse("https://creativecommons.org/licenses/by/4.0/")[c("category", "rdp_category", "facilitates_reuse")]
license_reuse("https://creativecommons.org/licenses/by-nc-nd/4.0/")[c("category", "rdp_category", "facilitates_reuse")]
```

Only *permissive* licenses facilitate reuse without negotiation; CC-BY-NC-ND is present and standard, yet restrictive.

## Controlled-access and sensitive data is not a FAIR failure

Data behind a data-use agreement (e.g. human/clinical data) is legitimately restricted; it should be judged on metadata richness, not open download. `classify_access()` flags this, drawing on the (Re)usable Data Project curations.

```{r}
classify_access(
  access_level = "closedAccess",
  urls = "https://www.ncbi.nlm.nih.gov/gap/?term=phs000424"
)[c("access", "controlled_access", "sensitive")]
```

## Identifier hygiene

Layered identifiers (an identifier minted on top of another) and non-persistent identifiers reduce interoperability.

```{r}
identifier_hygiene("RRID:MGI:5577054")$issues
identifier_hygiene("https://doi.org/10.5281/zenodo.8347772")$hygiene_ok
```

## FAIR-TLC: Traceable, Licensed, Connected

The reviewer's own framework extends FAIR with three principles ([Haendel et al., FAIR+](https://doi.org/10.5281/zenodo.203295)): data should be **Traceable** (provenance, attribution), **Licensed** (clearly and reusably), and **Connected** (qualified links to related entities). `fair_tlc()` computes these from an assessment.

```{r, eval = FALSE}
a <- assess_fair("https://doi.org/10.5281/zenodo.8347772")
fair_tlc(a)
#>   dimension                             indicator  met
#> 1 Traceable                         T1 Provenance TRUE
#> 2 Traceable                        T2 Attribution TRUE
#> 3  Licensed L1 Documented & minimally restrictive TRUE
#> 4  Licensed           L2 Flowthrough transparency TRUE
#> 5 Connected                      C1 Connectedness TRUE
```

## The canonical FAIR principles

For reference, the authoritative principle definitions (from the FAIR-nanopubs vocabulary used by go-fair.org):

```{r}
head(fair_principles(), 4)
```