In this example we will create a candidate codelist for osteoarthritis, exploring how different search strategies may impact our final codelist. First, let’s load the necessary packages and create a cdm reference using mock data.
The mock data has the following hypothetical concepts and relationships:
We will start by creating a codelist through keyword matching. Suppose we want to find the codes that contain “Musculoskeletal disorder” in their concept_name:
getCandidateCodes(
cdm = cdm,
keywords = "Musculoskeletal disorder",
domains = "Condition",
standardConcept = "Standard",
includeDescendants = FALSE,
searchInSynonyms = FALSE,
searchNonStandard = FALSE,
includeAncestor = FALSE
)
#> # A tibble: 1 × 6
#> concept_id found_from concept_name domain_id vocabulary_id standard_concept
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 1 From initial… Musculoskel… Condition SNOMED S
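Note that we could also identify the same concept through a partial match (“Musculoskeletal”) or with the words of the keyword in a different order (“Disorder musculoskeletal”), since all combinations of the words are matched.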
getCandidateCodes(
cdm = cdm,
keywords = "Musculoskeletal",
domains = "Condition",
standardConcept = "Standard",
searchInSynonyms = FALSE,
searchNonStandard = FALSE,
includeDescendants = FALSE,
includeAncestor = FALSE
)
#> # A tibble: 1 × 6
#> concept_id found_from concept_name domain_id vocabulary_id standard_concept
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 1 From initial… Musculoskel… Condition SNOMED S
getCandidateCodes(
cdm = cdm,
keywords = "Disorder musculoskeletal",
domains = "Condition",
standardConcept = "Standard",
searchInSynonyms = FALSE,
searchNonStandard = FALSE,
includeDescendants = FALSE,
includeAncestor = FALSE
)
#> # A tibble: 1 × 6
#> concept_id found_from concept_name domain_id vocabulary_id standard_concept
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 1 From initial… Musculoskel… Condition SNOMED S
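The three searches above return the same concept because the match is partial, case-insensitive, and independent of word order. A minimal base R sketch of that idea follows. The helper `matches_keyword()` is hypothetical and is not how CodelistGenerator is implemented; it only illustrates the behaviour seen in the outputs.

```r
# Hypothetical helper (not part of CodelistGenerator): a concept name matches
# when every word of the keyword appears somewhere in it, ignoring case and order.
matches_keyword <- function(concept_name, keyword) {
  words <- strsplit(tolower(keyword), "\\s+")[[1]]
  name <- tolower(concept_name)
  all(vapply(words, function(w) grepl(w, name, fixed = TRUE), logical(1)))
}

# Illustrative concept name, taken from the keyword used in the text
concept_name <- "Musculoskeletal disorder"

exact_match   <- matches_keyword(concept_name, "Musculoskeletal disorder")
partial_match <- matches_keyword(concept_name, "Musculoskeletal")
reordered     <- matches_keyword(concept_name, "Disorder musculoskeletal")
no_match      <- matches_keyword(concept_name, "arthritis")
```

Here the first three results are `TRUE` and the last is `FALSE`, mirroring the searches shown above.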
Notice that currently we are only looking for concepts with domains = "Condition". However, we can expand the search to all domains using domains = NULL.
Now we will include standard and non-standard concepts in our initial search. By setting standardConcept = c("Non-standard", "Standard"), we allow the function to return, in the final candidate codelist, both the non-standard and standard codes that have been found.
getCandidateCodes(
cdm = cdm,
keywords = "Musculoskeletal disorder",
domains = "Condition",
standardConcept = c("Non-standard", "Standard"),
searchInSynonyms = FALSE,
searchNonStandard = FALSE,
includeDescendants = FALSE,
includeAncestor = FALSE
)
#> # A tibble: 2 × 6
#> concept_id found_from concept_name domain_id vocabulary_id standard_concept
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 1 From initial… Musculoskel… Condition SNOMED S
#> 2 24 From initial… Other muscu… Condition SNOMED <NA>
We can also search for multiple keywords simultaneously, capturing all of them with the following search:
getCandidateCodes(
cdm = cdm,
keywords = c(
"Musculoskeletal disorder",
"arthritis"
),
domains = "Condition",
standardConcept = c("Standard"),
includeDescendants = FALSE,
searchInSynonyms = FALSE,
searchNonStandard = FALSE,
includeAncestor = FALSE
)
#> # A tibble: 4 × 6
#> concept_id found_from concept_name domain_id vocabulary_id standard_concept
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 1 From initial… Musculoskel… Condition SNOMED S
#> 2 3 From initial… Arthritis Condition SNOMED S
#> 3 4 From initial… Osteoarthri… Condition SNOMED S
#> 4 5 From initial… Osteoarthri… Condition SNOMED S
Now we will include the descendants of an identified code using the includeDescendants argument:
getCandidateCodes(
cdm = cdm,
keywords = "Musculoskeletal disorder",
domains = "Condition",
standardConcept = "Standard",
includeDescendants = TRUE,
searchInSynonyms = FALSE,
searchNonStandard = FALSE,
includeAncestor = FALSE
)
#> # A tibble: 5 × 6
#> concept_id found_from concept_name domain_id vocabulary_id standard_concept
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 1 From initial… Musculoskel… Condition SNOMED S
#> 2 2 From descend… Osteoarthro… Condition SNOMED S
#> 3 3 From descend… Arthritis Condition SNOMED S
#> 4 4 From descend… Osteoarthri… Condition SNOMED S
#> 5 5 From descend… Osteoarthri… Condition SNOMED S
Notice that now, in the column found_from, we can see that we have obtained concept_id = 1 from the initial search, and concept_id = c(2, 3, 4, 5) when searching for descendants of concept_id 1.
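Conceptually, adding descendants amounts to a lookup in an ancestor-descendant table. The sketch below uses a toy table reconstructed from the outputs shown in this example (it is an assumption, not the actual mock vocabulary), and the helper `add_descendants()` and its labels are hypothetical.

```r
# Toy ancestor table (assumed from the outputs in this example):
# concept 1 is an ancestor of 2, 3, 4 and 5; concept 3 is an ancestor of 4 and 5.
concept_ancestor <- data.frame(
  ancestor_concept_id   = c(1, 1, 1, 1, 3, 3),
  descendant_concept_id = c(2, 3, 4, 5, 4, 5)
)

# Hypothetical helper: append the descendants of the initially found concepts
# and record where each concept came from.
add_descendants <- function(initial_ids, ancestor_table) {
  descendant_ids <- ancestor_table$descendant_concept_id[
    ancestor_table$ancestor_concept_id %in% initial_ids
  ]
  descendant_ids <- setdiff(unique(descendant_ids), initial_ids)
  data.frame(
    concept_id = c(initial_ids, descendant_ids),
    found_from = c(
      rep("initial search", length(initial_ids)),
      rep("descendants", length(descendant_ids))
    ),
    stringsAsFactors = FALSE
  )
}

result <- add_descendants(1, concept_ancestor)
```

With these assumptions, `result` holds concept 1 labelled as coming from the initial search and concepts 2 to 5 labelled as descendants.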
We can also exclude concepts whose names contain specific keywords using the exclude argument:
getCandidateCodes(
cdm = cdm,
keywords = "Musculoskeletal disorder",
domains = "Condition",
exclude = c("Osteoarthrosis", "knee"),
standardConcept = "Standard",
includeDescendants = TRUE,
searchInSynonyms = FALSE,
searchNonStandard = FALSE,
includeAncestor = FALSE
)
#> # A tibble: 3 × 6
#> concept_id found_from concept_name domain_id vocabulary_id standard_concept
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 1 From initial… Musculoskel… Condition SNOMED S
#> 2 3 From descend… Arthritis Condition SNOMED S
#> 3 5 From descend… Osteoarthri… Condition SNOMED S
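The effect of exclude can be pictured as a filter applied to the candidate names. The base R sketch below is only an illustration: the concept names are stand-ins, because the printed output truncates them, concept 5 is left out since its full name is not shown, and `contains_any()` is a hypothetical helper.

```r
# Illustrative candidates (names are stand-ins for the truncated output above)
candidates <- data.frame(
  concept_id = c(1, 2, 3, 4),
  concept_name = c(
    "Musculoskeletal disorder", "Osteoarthrosis",
    "Arthritis", "Osteoarthritis of knee"
  ),
  stringsAsFactors = FALSE
)
exclude <- c("Osteoarthrosis", "knee")

# Hypothetical helper: TRUE when the name contains at least one of the terms
contains_any <- function(name, terms) {
  any(vapply(terms, function(term) grepl(term, name, fixed = TRUE), logical(1)))
}

# Compare in lower case so the exclusion is case-insensitive
is_excluded <- vapply(
  tolower(candidates$concept_name), contains_any, logical(1),
  terms = tolower(exclude)
)
kept <- candidates[!is_excluded, ]
```

In this sketch `kept` retains concepts 1 and 3, while Osteoarthrosis and the knee concept are dropped.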
To include the ancestors one level above the identified concepts, we can use the includeAncestor argument:
codes <- getCandidateCodes(
cdm = cdm,
keywords = "Osteoarthritis of knee",
includeAncestor = TRUE,
domains = "Condition",
standardConcept = "Standard",
includeDescendants = TRUE,
searchInSynonyms = FALSE,
searchNonStandard = FALSE
)
codes
#> # A tibble: 2 × 6
#> concept_id found_from concept_name domain_id vocabulary_id standard_concept
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 4 From initial… Osteoarthri… Condition SNOMED S
#> 2 3 From ancestor Arthritis Condition SNOMED S
We can also pick up codes based on their synonyms. For example, Osteoarthrosis is recorded as a synonym of Arthritis, so searching for “osteoarthrosis” with searchInSynonyms = TRUE also returns Arthritis.
getCandidateCodes(
cdm = cdm,
keywords = "osteoarthrosis",
domains = "Condition",
searchInSynonyms = TRUE,
standardConcept = "Standard",
includeDescendants = FALSE,
searchNonStandard = FALSE,
includeAncestor = FALSE
)
#> # A tibble: 2 × 6
#> concept_id found_from concept_name domain_id vocabulary_id standard_concept
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 2 From initial… Osteoarthro… Condition SNOMED S
#> 2 3 In synonyms Arthritis Condition SNOMED S
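A synonym search can be thought of as running the same keyword match against a second table of alternative names. The sketch below is a toy illustration in base R; the tables are assumptions inferred from the output above (in particular, that the synonym “Osteoarthrosis” is attached to the Arthritis concept), not the actual mock data.

```r
# Toy concept and synonym tables (assumed, inferred from the output above)
concepts <- data.frame(
  concept_id = c(2, 3),
  concept_name = c("Osteoarthrosis", "Arthritis"),
  stringsAsFactors = FALSE
)
concept_synonym <- data.frame(
  concept_id = 3,
  concept_synonym_name = "Osteoarthrosis",
  stringsAsFactors = FALSE
)

keyword <- "osteoarthrosis"

# Concepts whose own name matches the keyword
name_ids <- concepts$concept_id[
  grepl(keyword, tolower(concepts$concept_name), fixed = TRUE)
]

# Concepts reached only through a matching synonym
synonym_ids <- concept_synonym$concept_id[
  grepl(keyword, tolower(concept_synonym$concept_synonym_name), fixed = TRUE)
]
synonym_ids <- setdiff(synonym_ids, name_ids)
```

Under these assumptions `name_ids` contains concept 2 and `synonym_ids` contains concept 3, matching the two rows returned above.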
Notice that if includeDescendants = TRUE, the descendants of Arthritis will also be included:
getCandidateCodes(
cdm = cdm,
keywords = "osteoarthrosis",
domains = "Condition",
searchInSynonyms = TRUE,
standardConcept = "Standard",
includeDescendants = TRUE,
searchNonStandard = FALSE,
includeAncestor = FALSE
)
#> # A tibble: 4 × 6
#> concept_id found_from concept_name domain_id vocabulary_id standard_concept
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 2 From initial… Osteoarthro… Condition SNOMED S
#> 2 3 In synonyms Arthritis Condition SNOMED S
#> 3 4 From descend… Osteoarthri… Condition SNOMED S
#> 4 5 From descend… Osteoarthri… Condition SNOMED S
We can also pick up concepts associated with our keyword via
non-standard search.
codes1 <- getCandidateCodes(
cdm = cdm,
keywords = "Degenerative",
domains = "Condition",
standardConcept = "Standard",
searchNonStandard = TRUE,
includeDescendants = FALSE,
searchInSynonyms = FALSE,
includeAncestor = FALSE
)
codes1
#> # A tibble: 1 × 6
#> concept_id found_from concept_name domain_id vocabulary_id standard_concept
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 2 From non-sta… Osteoarthro… Condition SNOMED S
Let’s take a moment to focus on the standardConcept and searchNonStandard arguments to clarify the difference between them. standardConcept specifies whether the final candidate codelist should contain only standard concepts or also non-standard concepts. searchNonStandard determines whether we want to search for keywords among non-standard concepts.
In the previous example, since we set standardConcept = "Standard" and searchNonStandard = TRUE, we retrieved the code for Osteoarthrosis from the non-standard search. However, we did not obtain the non-standard code degenerative arthropathy from the initial search. If we allow non-standard concepts in the final candidate codelist, the initial search returns that non-standard code. Note that the call below sets searchNonStandard = FALSE, so Osteoarthrosis, which was only reached through the non-standard search, is no longer returned:
codes2 <- getCandidateCodes(
cdm = cdm,
keywords = "Degenerative",
domains = "Condition",
standardConcept = c("Non-standard", "Standard"),
searchNonStandard = FALSE,
includeDescendants = FALSE,
searchInSynonyms = FALSE,
includeAncestor = FALSE
)
codes2
#> # A tibble: 1 × 6
#> concept_id found_from concept_name domain_id vocabulary_id standard_concept
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 7 From initial… Degenerativ… Condition Read <NA>
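The difference between the two arguments can be summarised with a small base R sketch. It is a simplified model under stated assumptions, not the package's implementation: the two-row concept table, the mapping from concept 7 to concept 2, and the function `search_sketch()` with its parameter names are all hypothetical, chosen so that the sketch reproduces the results of codes1 and codes2.

```r
# Toy data (assumed): concept 7 is non-standard and maps to standard concept 2
concepts <- data.frame(
  concept_id = c(2, 7),
  concept_name = c("Osteoarthrosis", "Degenerative arthropathy"),
  standard_concept = c("S", NA),
  stringsAsFactors = FALSE
)
maps_to <- data.frame(concept_id_1 = 7, concept_id_2 = 2)

# allow_non_standard plays the role of standardConcept including "Non-standard";
# search_non_standard plays the role of searchNonStandard.
search_sketch <- function(keyword, allow_non_standard, search_non_standard) {
  name_hit <- grepl(tolower(keyword), tolower(concepts$concept_name), fixed = TRUE)
  is_standard <- !is.na(concepts$standard_concept) &
    concepts$standard_concept == "S"

  # Initial search: only concepts whose status is allowed in the final codelist
  initial <- concepts$concept_id[name_hit & (is_standard | allow_non_standard)]

  # Non-standard search: match non-standard names, then follow the mapping
  mapped <- numeric(0)
  if (search_non_standard) {
    non_standard_hits <- concepts$concept_id[name_hit & !is_standard]
    mapped <- maps_to$concept_id_2[maps_to$concept_id_1 %in% non_standard_hits]
  }
  sort(unique(c(initial, mapped)))
}

like_codes1 <- search_sketch("Degenerative", FALSE, TRUE)
like_codes2 <- search_sketch("Degenerative", TRUE, FALSE)
```

In this model `like_codes1` contains only concept 2 (reached through the mapping) and `like_codes2` contains only concept 7 (found directly), which is the pattern seen in codes1 and codes2.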