--- title: "Rationale for De-identification" output: rmarkdown::html_document bibliography: REFS.bib csl: elsevier-vancouver.csl vignette: > %\VignetteIndexEntry{Rationale for De-identification} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` This package deals with methods for handling personally identifiable features in data sets and hence we should address the point of why this is useful and necessary. ‘Personal data’ is information about an individual that could identify them[@EU_2016]. In the United Kingdom, the use of personal data is legislated for under the Data Protection Act, 2018[@DataProtection_2018] with the ‘Information Commissioner’s Office’, or ICO, acting as the independent organisation that upholds information rights. Data protection laws essentially protect a person’s right to privacy; considered to be a fundamental human right in many places around the world[@Diggelmann2014; @rights2007international; @krotoszynski2016privacy]. The misuse of personal data could be harmful to individuals and could lead to things such as identity theft, discrimination or physical harm. In the UK and across Europe the use, or processing, of personal data must have one of six bases in law. These bases are predicated on the necessary use of the personal data for a specific purpose; essentially you can’t just collect and use personal data because you might find it helpful for something sometime in the future. The law in the UK recognizes the importance of research and how some aspects of data protection law could potentially compromise the integrity of the research. For example, research is exempt from the article of UK GDPR that gives data subjects the right to have their personal data erased. This is particularly pertinent to medical research whereby removing data could potentially skew the results which could mean a drug or treatment looks better or worse than it really is. Even though there are research exemptions, the first port of call for the collection or processing or personal data is to ascertain whether non-identifying or anonymous data can be collected instead. Hence, researchers need a simple, reliable, and transparent methodology for the anonymization of data sets. Notably, due to the ability to script and share routines - this package means that researchers can design their anonymization pipeline and share it with the data controllers (with the benefit of no proprietary costs).