---
title: "Using SNOMED dictionaries and codelists"
author: "Anoop Shah"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Using SNOMED dictionaries and codelists}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

Using SNOMED dictionaries and codelists
=======================================

This package is designed to make it easier to use SNOMED CT in R, including searching the SNOMED CT dictionary and relations, and creating and manipulating SNOMED CT codelists.

Basic introduction to SNOMED CT
-------------------------------

SNOMED CT is a clinical terminology system which contains thousands of concepts, each of which has a distinct meaning. Each concept has a unique concept ID, and one or more descriptions (synonyms). It also contains a knowledge model (ontology), specifying which concepts are subtypes of another concept or associated in other ways.

The SNOMED dictionaries are contained within four key tables:

- CONCEPT - one row per concept, with key metadata about the concepts
- DESCRIPTION - one row per description
- STATEDRELATIONSHIP - directly stated relationships
- RELATIONSHIP - relationships between terms inferred from other relationships

The following additional tables are also included: 
- SIMPLEMAP - Maps to Clinical Terms Version 3 and some other terms (e.g. COVID-19 terms)
- EXTENDEDMAP - Maps to ICD-10 and OPCS4
- REFSET - Sets of SNOMED CT concepts used for specific clinical / operational purposes 

Each SNOMED CT concept may have any number of synonyms, but there are the following special types:

- Fully Specified Name - this is the most precise description of the concept, but is too verbose for general use.
- Preferred - these is the synonym that is preferred for clinical display in a particular language, and is recorded in the language-specific reference set

Each concept also has a semantic tag, which denotes what type of concept it is (e.g. 'disorder' or 'organism'). This is currently recorded in parentheses at the end of the Fully Specified Name, and can be extracted using the semanticType function in this package.

Loading the SNOMED CT dictionaries
----------------------------------

This package contains functions to import a set of SNOMED CT release files which should be obtained from NHS Digital TRUD (https://isd.digital.nhs.uk/trud/user/guest/group/0/home). The International and UK release files are provided in two separate downloads, and need to be combined to create the  whole dictionary. This can be done using the loadSNOMED() function and specifying a vector of folder paths to load.

The SNOMED CT UK drug extension is available as a separate file and can (in theory) be added to the SNOMED environment, but this has not yet been tested.
  
For most users the 'Snapshot' files, containing the current versions of the entries, will be the most useful, and these are loaded by loadSNOMED() function. The  sample dictionaries in this package are also based on the Snapshot file format. The 'Delta' files contain changes since the previous version, and the 'Full' files contain a history of the changes to each entry.

When the NHS Digital SNOMED CT files are extracted, they will be in a folder such as 'SnomedCT_InternationalRF2_PRODUCTION_20200731T120000Z', and the relevant tables will be in the subfolder 'Snapshot/Terminology' (main SNOMED CT tables) and 'Snapshot/Refset' (mappings and refsets). Code such as the following can be used to load the tables into R (the file paths should be changed to the actual file paths on your system).

Note that the UK SNOMED CT release needs to be loaded together with the corresponding international release. The UK release cannot be used in isolation because it is incomplete. 

```
# Load package
require(Rdiagnosislist)

# Load UK and International SNOMED CT release files into an
# R environment called 'SNOMED'
SNOMED <- loadSNOMED(c(
  'SnomedCT_InternationalRF2_PRODUCTION_20200731T120000Z/',
  'SnomedCT_UKClinicalRF2_PRODUCTION_20210317T000001Z/'))

# Save the 'SNOMED' environment to a file on disk
saveRDS(SNOMED, file = 'mySNOMED.RDS')

# Reload the 'SNOMED' environment from file
SNOMED <- readRDS('mySNOMED.RDS')
```

Using R environments
--------------------

The SNOMED CT dictionary files are loaded into an R 'environment', which is an object that can contain other objects. This is a convenient way in R to store a group of objects without cluttering up the global environment with too many individual objects. It also allows different versions of SNOMED CT dictionaries to be used side by side, by loading them into different environments. For ease of use, many of the functions in the package will search for the dictionaries in an environment named 'SNOMED' by default, but for programming use it is recommended to specify the environment explicitly in function calls to avoid unexpected errors. 

For the purpose of this vignette, we will create a sample set of SNOMED CT files from the sample dictionaries included with the package. 

```{r}
# The sampleSNOMED() function returns an environment containing
# the sample dictionaries
require(Rdiagnosislist)
TEST <- sampleSNOMED()

# TEST is now an environment containing the sample SNOMED CT dictionary.
# Objects within the environment can be retrieved using the $ operator
# or the 'get' function. We will export the sample dictionaries to a
# temporary folder to show how to reload them using loadSNOMED()
exportSNOMEDenvir(TEST, tempdir())
dir(tempdir())

# loadSNOMED searches for files containing '_Concept_', '_Description_',
# '_StatedRelationship_', '_Relationship_', 'Refset_SimpleMap',
# 'Refset_ExtendedMap' or 'Refset_Simple', as in the actual SNOMED CT
# release files.

# Import using the loadSNOMED function
SNOMED <- loadSNOMED(tempdir(), active_only = FALSE)
```

SNOMED CT concepts IDs in R
---------------------------

SNOMED CT concept IDs are long integers which need to be represented using the integer64 data type in R. This is not available in base R but is provided in the bit64 package which is automatically loaded with this package. They must not be stored as short integer or numeric values because they cannot be stored precisely and may be incorrect.

This package makes it easier to use SNOMED CT concept IDs because they can be supplied as character vectors and converted to 'SNOMEDconcept' vectors. The SNOMEDconcept class is a 64-bit integer class which can faithfully store SNOMED CT concept IDs, and is more memory-efficient than storing them as character vectors. If the SNOMED dictionary is available in the R environment, the concepts are displayed with their description in the default print method.

The function 'as.SNOMEDconcept' can be used to retrieve the SNOMED concept ID matching a description. This function also converts SNOMED CT concept IDs in other formats (numeric, 64-bit integer or character) into SNOMEDconcept objects.

```{r}
# Make sure the SNOMED environment is available and contains the SNOMED dictionary
SNOMEDconcept('Heart failure', SNOMED = SNOMED)

# To use the sample SNOMED dictionary for testing
SNOMEDconcept('Heart failure', SNOMED = sampleSNOMED())

# If an object named SNOMED containing the SNOMED dictionary is available
# in the current environment, it does not need to be stated in the
# function call
SNOMED <- sampleSNOMED()
SNOMEDconcept('Heart failure')

# The argument 'exact' can be used to specify whether a regular expression
# search should be done, e.g.
SNOMEDconcept('Heart f', exact = FALSE)

# The 'description' function can be used to return the descriptions of
# the concepts found. It returns a data.table with the fully specified 
# name for each term.
description(SNOMEDconcept('Heart f', exact = FALSE))

# The 'semantic type' function returns the semantic type of the concept
# from the Fully Specified Name
semanticType(SNOMEDconcept('Heart failure'))

# Functions which expect a SNOMEDconcept object, such as semanticType,
# will automatically convert their argument to SNOMEDconcept using the
# function as.SNOMEDconcept
semanticType('Heart failure')
```

Set operations using SNOMEDconcept
----------------------------------

This package provides versions of the set functions 'setdiff', 'intersect' and 'union' which work with SNOMEDconcept objects.

```{r}
# A list of concepts with a description containing the term 'heart'
# (not that all synonyms are searched, not just the Fully Specified Names)
heart <- SNOMEDconcept('Heart|heart', exact = FALSE, SNOMED = sampleSNOMED())

# A list of concepts containing the term 'fail'
fail <- SNOMEDconcept('Fail|fail', exact = FALSE, SNOMED = sampleSNOMED())

# Concepts with heart and fail
intersect(heart, fail)

# Concepts with heart and not fail
setdiff(heart, fail)

# Concepts with heart or fail
union(heart, fail)
```

These set operations can be used to create lists of SNOMED CT concepts of interest, similar to the way researchers use Read codes. However, SNOMED CT also allows the use of hierarchies and relationships to locate concepts by their meaning.

Using relationships between SNOMED CT concepts
----------------------------------------------

The most important relationship is the 'Is a' relationship, also known as parent and child. This makes it easy to find all specific concepts that are a subtype of a more general concept. The relationship functions (parents, ancestors, children and descendants) all take a SNOMEDconcept object as input, or attempt to convert their argument to SNOMEDconcept.

```{r}
SNOMED <- sampleSNOMED()

# Parents (immediate ancestors)
parents('Acute heart failure')

# Ancestors
ancestors('Acute heart failure')

# Children (immediate descendants)
children('Acute heart failure')

# Descendants
descendants('Acute heart failure')
```

Attributes of SNOMED CT concepts
--------------------------------

The 'hasAttributes' function can be used to find terms with particular attributes. For example, to find all disorders with a finding site of the heart, we can use the relatedConcepts function, which retrieves relationships from the RELATIONSHIP and STATEDRELATIONSHIP tables. 

In order to find the finding site of a disorder, we use the 'forward' relationship. In order to find disorders with a particular finding site, we use the relationship in the 'reverse' direction.

```{r}
require(Rdiagnosislist)
SNOMED <- sampleSNOMED()

# List all the attributes of a concept
print(attrConcept('Heart failure'))

# 'Finding site' of a particular disorder
relatedConcepts('Heart failure', 'Finding site')

# Disorders with a 'Finding site' of 'Heart'
relatedConcepts('Heart', 'Finding site', reverse = TRUE)
```

SNOMED CT codelists
-------------------

The SNOMED CT codelist data type allows a list of SNOMED CT concepts to be curated. It allows codelists to be expressed in three formats:

- simple -- a simple list of concepts, one row per concept. It contains at least 2 columns: 'conceptId' and 'term'
- tree -- a hierarchical list, in which each concept may be a single concept or include its descendants. It contains at least 4 columns: 'conceptId', 'term', 'include_desc' (whether descendants of the term are included) and 'included' (TRUE means it is an inclusion concept, FALSE means it is excluded)
- exptree -- tree format codelists in which the descendants are also enumerated, and have a missing (NA) entry in the 'include_desc' column

Tree format codelists may be more concise and easier to understand, but generally need to be converted to the 'simple' format (either using this package or equivalent functionality in a SNOMED CT server) in order to be used in practice, e.g. to run queries in a database. The result of the conversion may vary depending on the version of SNOMED CT used.

SNOMEDcodelist objects can be constructed using the function SNOMEDcodelist. The function 'is.SNOMEDcodelist' checks if an object is a SNOMEDcodelist (and optionally whether it is of the specified type), and 'as.SNOMEDcodelist' converts an object to a SNOMEDcodelist if it is not already.

The following items of metadata can be stored as part of a SNOMEDcodelist object:

- codelist_name
- author
- version
- date
- sct_version (SNOMED CT version taken from the SNOMED environment that was used to create the codelist)
- timestamp (automatically created)
- format ('simple', 'tree' or 'exptree')

```{r}
SNOMED <- sampleSNOMED()

# Create a codelist containing all the descendants of
# the concept 'Heart failure'
my_heart_failure_codelist <- SNOMEDcodelist(
  SNOMEDconcept('Heart failure'), include_desc = TRUE,
  format = 'simple', codelist_name = 'Heart failure')

# Original codelist
print(my_heart_failure_codelist)

# Convert to tree format
tree <- SNOMEDcodelist(my_heart_failure_codelist, format = 'tree')
print(tree)

# Write out codelist to file
# Metadata are stored in a column named 'metadata'
# export(tree, file = paste0(tempdir(), '/hf_codes.csv'))

# Reload codelist from file (including metadata)
# reloaded_codelist <- as.SNOMEDcodelist(
#  data.table::fread(paste0(tempdir(), '/hf_codes.csv')))
# print(reloaded_codelist)
```

'History of' SNOMED CT concepts
------------------

Some SNOMED CT concepts state 'History of' a disorder or procedure, such as 
'History of heart failure'. These concepts may be important to include in a search for patients with a past history of a condition, which may be recorded using either the disorder concept or a history concept.

'History of' concepts are of the semantic type 'Situation' with attributes:

- Temporal context = In the past
- Associated finding = concept ID of the historic condition
- Finding context = Known present
- Subject relationship context = Subject of record

'Suspected' concepts are also of the semantic type 'Situation' and have the following attributes:

- Temporal context = Current or specified time
- Associated finding = concept ID of the condition that is suspected
- Finding context = Suspected
- Subject relationship context = Subject of record

The sample SNOMED dictionary included in this package does not include all these attributes, so the following examples require a full SNOMED CT dictionary to be loaded.

```
find_historic <- function(x, SNOMED){
  # returns a vector of concepts which are historic versions of
  # the supplied concepts
  # Arguments: x = a SNOMEDconcept vector
  #            SNOMED = a SNOMED environment

  R <- unique(relatedConcepts(x,
    type = SNOMEDconcept('Associated finding'), reverse = TRUE))
  Rpast = hasAttributes(R,
    typeIds = SNOMEDconcept('Temporal context'),
    destinationIds = SNOMEDconcept('In the past'), SNOMED = SNOMED)
  Rknown = hasAttributes(R,
    typeIds = SNOMEDconcept('Finding context'),
    destinationIds = SNOMEDconcept('Known present'), SNOMED = SNOMED)
  Rsubj = hasAttributes(R,
    typeIds = SNOMEDconcept('Subject relationship context'),
    destinationIds = SNOMEDconcept('Subject of record'), SNOMED = SNOMED)
  
  R[Rpast & Rknown & Rsubj]
}

find_suspected <- function(x, SNOMED){
  # returns a vector of concepts which are suspected versions of
  # the supplied concepts
  # Arguments: x = a SNOMEDconcept vector
  #            SNOMED = a SNOMED environment

  R <- unique(relatedConcepts(x,
    type = SNOMEDconcept('Associated finding'), reverse = TRUE))
  Rcurrent = hasAttributes(R,
    typeIds = SNOMEDconcept('Temporal context'),
    destinationIds = SNOMEDconcept('Current or specified time'),
    SNOMED = SNOMED)
  Rsuspected = hasAttributes(R,
    typeIds = SNOMEDconcept('Finding context'),
    destinationIds = SNOMEDconcept('Suspected'), SNOMED = SNOMED)
  Rsubj = hasAttributes(R,
    typeIds = SNOMEDconcept('Subject relationship context'),
    destinationIds = SNOMEDconcept('Subject of record'), SNOMED = SNOMED)
  
  R[Rcurrent & Rsuspected & Rsubj]
}

# Example (using full SNOMED CT dictionary):
#
# > find_historic(SNOMEDconcept('Heart failure'), SNOMED)
# [1] "309634009 | History of heart failure in last year (situation)"
# [2] "161505003 | History of heart failure (situation)"
#             
# > find_suspected(SNOMEDconcept('Heart failure'), SNOMED)
# [1] "394887005 | Suspected heart failure (situation)" 
```


SNOMED CT simple refsets
-----------------

Refsets are sets of SNOMED CT terms (like codelists) that are supplied as part of the SNOMED CT distribution and are used for operational purposes. They are included in the REFSET table in the SNOMED dictionary. The getRefset function retrieves a particular refset. Each refset has a name which is a SNOMED CT concept of semantic type 'foundation metadata concept'.

```{r}
SNOMED = sampleSNOMED()

# Obtain a list of available refsets with descriptions and counts
merge(SNOMED$REFSET[, .N, by = list(conceptId = refsetId)],
  SNOMED$DESCRIPTION[, list(conceptId, term)], by = 'conceptId')

# Obtain a refset as a SNOMEDconcept vector
renal_ref <- getRefset('Renal clinical finding simple reference set')

# Find out whether a concept is included in a refset
SNOMEDconcept('Renal failure') %in% renal_ref
```
 

Mapping between SNOMED CT and ICD-10 and OPCS4
----------------------------------------------

Mapping tables are provided in the international and UK SNOMED CT distributions for ICD-10. The UK distribution also includes mappings to CTV3, ICD Oncology (ICD-O) for body locations and OPCS4.

These mappings are designed to enable clinical data entered using SNOMED CT to be converted to ICD-10 or OPCS4 in a semi-automated manner. This package includes functions to use the mappings to convert codelists from one format to another. Beware that these code systems are not equivalent, so it is necessary to check the output codelist to ensure that the mapping makes sense. For ICD-10 maps, only maps with mapCategoryId = 'Map source concept is properly classified' are retrieved.

The table SIMPLEMAPS contains maps to CTV3 and ICD-O, whereas EXTENDEDMAPS contains mappings to ICD-10 and OPCS4. The EXTENDEDMAPS tables contains additional columns to define the type of mapping, but they are not used by the functions in this package.

The `getMaps` function can be used to conveniently return the ICD-10 or OPCS4 codes associated with a SNOMED CT concept. Note that in the output data.table, the icd10_code or opcs4_code column is a list (because there may be multiple entries per SNOMED CT concept). To return each map on a separate row, use getMaps with the option `single_row_per_concept = FALSE`.

```{r}
# Example: creating an ICD-10 heart failure codelist using SNOMED CT
SNOMED <- sampleSNOMED()

my_heart_failure_codelist <- SNOMEDcodelist(
  SNOMEDconcept('Heart failure'), include_desc = FALSE)

getMaps(my_heart_failure_codelist, to = c('icd10'))

my_pacemaker_codelist <- SNOMEDcodelist(
  SNOMEDconcept('Implantation of cardiac pacemaker'),
  include_desc = FALSE)

getMaps(my_pacemaker_codelist, to = c('opcs4'))
```

Mapping between SNOMED CT and Read Clinical Terminology
-------------------------------------------------------
Prior to the introduction of SNOMED CT, the Read Clinical Terminology (version 2 or 3) were used to record clinical information in UK general practice. SNOMED CT terms are partly derived from Read, so all Read terms have a 'forward' mapping to an appropriate SNOMED CT term. NHS Digital provides a set of mappings to enable these conversions in clinical systems that are moving to SNOMED CT. These tables are available from
the Technology Reference data Update Distribution \url{https://isd.digital.nhs.uk/trud3/user/guest/group/0/pack/9/subpack/9/releases}.

This package provides the 'loadMAPS' function for loading the NHS Digital files into an R data.table, the 'MAPS' sample dataset and the 'getMaps' function for retrieving Read Clinical Terms Version 2 (Read 2) and 
Clinical Terms Version 3 (CTV3) that map to a set of SNOMED CT concepts.

These maps can be used for converting SNOMED CT codelists into Read 2 or CTV3 format for running queries, such as to characterise patient phenotypes or identify patient populations for research. They cannot be used in the reverse direction (to map a Read 2/CTV3 codelist to SNOMED CT) because some of the SNOMED CT terms will be missed out, and the list will be incomplete.

Use the 'unlist' function, followed by deduplication, to create a table of concepts in another coding system mapped from SNOMED CT, as in the example below.

```{r}
# Example: creating a Read heart failure codelist using SNOMED CT
SNOMED <- sampleSNOMED()
data(READMAPS)

# Start off with a SNOMED CT codelist containing the descendants of
# the concept 'Heart failure'
my_heart_failure_codelist <- SNOMEDcodelist(
  SNOMEDconcept('Heart failure'), include_desc = TRUE)

single_row_maps <- getMaps(my_heart_failure_codelist,
  mappingtable = READMAPS, to = c('read2', 'ctv3'),
  single_row_per_concept = TRUE)

# Display the maps - one row per concept (long terms truncated)
print(single_row_maps[, list(term = substr(term, 1, 10), read2_code,
  ctv3_concept)])

multi_row_maps <- getMaps(my_heart_failure_codelist,
  mappingtable = READMAPS, to = 'read2',
  single_row_per_concept = FALSE)

# Display the maps - multiple rows per concept (long terms truncated)
print(multi_row_maps[, list(term = substr(term, 1, 10), read2_code,
  read2_term = substr(read2_term, 1, 30))])
  
# Create a standalone Read2 codelist
read_codelist <- data.table::data.table(
  code = unlist(single_row_maps$read2_code),
  term = unlist(single_row_maps$read2_term))
print(read_codelist[!duplicated(read_codelist)][order(code)])
```

More information
----------------

For more information about SNOMED CT, visit the SNOMED CT international website: <https://www.snomed.org/>

SNOMED CT (UK edition) can be downloaded from the NHS Digital site: <https://isd.digital.nhs.uk/trud/user/guest/group/0/home>

The NHS Digital terminology browser can be used to search for terms interactively: <https://termbrowser.nhs.uk/>