NER using MetaMap

Go to the online MetaMap service (batch mode): https://ii.nlm.nih.gov/Batch/UTS_Required/metamap.shtml

MetaMap will not run on text that contains non-ASCII characters, so remove them before uploading the file. Non-ASCII characters can be matched with the regular expression [^\x00-\x7F].
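The cleaning step can be scripted. The following is a minimal Python sketch, not part of the original pipeline; the function names and the choice to delete (rather than transliterate) the offending characters are assumptions made for illustration.

```python
import re

# Same character class as in the text: anything outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def find_non_ascii(text):
    """Return (position, character) pairs for every non-ASCII character."""
    return [(m.start(), m.group()) for m in NON_ASCII.finditer(text)]


def strip_non_ascii(text, replacement=""):
    """Remove (or replace) every non-ASCII character in the text."""
    return NON_ASCII.sub(replacement, text)


def clean_file(src_path, dst_path):
    """Write an ASCII-only copy of src_path to dst_path.

    Assumes the source file is UTF-8 encoded.
    """
    with open(src_path, encoding="utf-8") as src:
        cleaned = strip_non_ascii(src.read())
    with open(dst_path, "w", encoding="ascii") as dst:
        dst.write(cleaned)
```

Calling find_non_ascii first is a quick way to see what would be lost; if a deleted character would glue two words together, passing replacement=" " to strip_non_ascii may be preferable.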

In Output/Display Options, check “Show CUIs (-I)”.

Under “I would like to only use specific Semantic Types”, check the checkbox and select the semantic types relevant to the phenotype. For example, use

acab, aapp, anab, antb, biof, bacs, bodm, chem, chvf, chvs, clnd, cgab, diap, dsyn, elii, enzy, fndg, hops, horm, imft, irda, inbe, inpo, inch, lbpr, lbtr, medd, mobd, neop, nnon, orch, patf, phsu, phpr, rcpt, sosy, topp, vita

to select the following:

Acquired Abnormality; Amino Acid, Peptide, or Protein; Anatomical Abnormality; Antibiotic; Biologic Function; Biologically Active Substance; Biomedical or Dental Material; Chemical; Chemical Viewed Functionally; Chemical Viewed Structurally; Clinical Drug; Congenital Abnormality; Diagnostic Procedure; Disease or Syndrome; Element, Ion, or Isotope; Enzyme; Finding; Hazardous or Poisonous Substance; Hormone; Immunologic Factor; Indicator, Reagent, or Diagnostic Aid; Individual Behavior; Injury or Poisoning; Inorganic Chemical; Laboratory Procedure; Laboratory or Test Result; Medical Device; Mental or Behavioral Dysfunction; Neoplastic Process; Nucleic Acid, Nucleoside, or Nucleotide; Organic Chemical; Pathologic Function; Pharmacologic Substance; Phenomenon or Process; Receptor; Sign or Symptom; Therapeutic or Preventive Procedure; Vitamin.

Click “Submit Batch MetaMap”. An email will notify you once MetaMap has finished processing the file. Download “text.out” and give it a descriptive name, such as “Wikipedia.out”. Create a folder and put all the renamed output files in it.


As an example, five source articles on CAD can be uploaded to the MetaMap website: CAD_article_wiki.txt, CAD_article_medscape.txt, CAD_article_medline.txt, CAD_article_merck.txt, CAD_article_mayo.txt.

Run MetaMap_postprocessing.R to extract the CUIs from the output files and select them by majority voting across the source articles.
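To make the idea concrete, here is a rough Python sketch of CUI extraction followed by majority voting. It is not a port of MetaMap_postprocessing.R, whose exact rules may differ. It assumes only that each CUI appears in the output as the letter C followed by seven digits, and that "majority" means a CUI is found in strictly more than half of the output files; all function names are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# UMLS CUIs have the form "C" + 7 digits, e.g. C0027051.
CUI_PATTERN = re.compile(r"\bC\d{7}\b")


def cuis_in_text(text):
    """Return the set of distinct CUIs mentioned anywhere in the text."""
    return set(CUI_PATTERN.findall(text))


def majority_vote(cui_sets):
    """Keep CUIs present in strictly more than half of the given sets.

    Each set holds the CUIs found in one source article, so a CUI is
    counted at most once per article. (Assumed voting rule.)
    """
    counts = Counter(cui for cuis in cui_sets for cui in cuis)
    half = len(cui_sets) / 2
    return sorted(cui for cui, n in counts.items() if n > half)


def majority_cuis_from_folder(folder):
    """Apply the vote to every *.out file in the output folder."""
    files = sorted(Path(folder).glob("*.out"))
    return majority_vote([cuis_in_text(f.read_text()) for f in files])
```

With five source articles, this rule keeps a CUI only if it was found in at least three of them. Matching on the CUI pattern alone avoids depending on the precise layout of the MetaMap output, at the cost of ignoring scores and semantic types.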

The CAD dictionary generated by MetaMap can be found in CAD_dict.txt.