SemRep Gold Standard Annotation


In early 2011, we conducted a gold standard annotation study in which we annotated 500 sentences, randomly selected from MEDLINE abstracts, with semantic predications. The results are intended mainly as an evaluation testbed for SemRep, but they can also be used with other information extraction systems based on UMLS domain knowledge. The study consisted of three phases: (a) a practice phase, (b) the main annotation phase, and (c) an adjudication phase.

Here, we present two sets of annotations from the main phase, as well as the adjudicated gold standard. For further details, refer to our BMC Bioinformatics paper "Constructing A Semantic Predication Gold Standard from the Biomedical Literature" or contact

Please Note: Users are responsible for compliance with the UMLS Metathesaurus License Agreement.

To access the SemRep Gold Standard Annotation files, you must have accepted the terms of the UMLS Metathesaurus License Agreement, which requires you to respect the copyrights of the constituent vocabularies and to file a brief annual report on your use of the UMLS. You must also have an activated UMLS Terminology Services (UTS) account. For information on how we use UTS authentication, see the help page about UTS accounts.

For license details, see the UMLS Metathesaurus License Agreement and How to License and Access the Unified Medical Language System® (UMLS®) Data.

Available Files:

Annotator A: Main Phase (main_A.xml, 1.3 MB)

Annotator B: Main Phase (main_B.xml, 1.4 MB)

Annotator C: Adjudication (adjudicated.xml, 1.4 MB)

DTD file (annotations.dtd, 1.8 KB)
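All three annotation files are XML documents described by annotations.dtd. Before writing code against specific element names, it can help to see which elements a file actually contains. The following is a minimal, schema-agnostic sketch using only the Python standard library: it counts how often each element name occurs and assumes nothing about the names defined in the DTD. The function name `tag_histogram` is hypothetical, and note that `xml.etree.ElementTree` does not validate documents against a DTD.

```python
import sys
import xml.etree.ElementTree as ET
from collections import Counter


def tag_histogram(path):
    """Count how often each element name occurs in an XML file.

    Hypothetical helper: it makes no assumption about the element
    names in annotations.dtd, so it can be used to explore the files.
    """
    counts = Counter()
    # iterparse streams the document instead of building the whole tree up front
    for _event, elem in ET.iterparse(path, events=("end",)):
        counts[elem.tag] += 1
        elem.clear()  # drop the element's children once it has been counted
    return counts


if __name__ == "__main__":
    # e.g. main_A.xml main_B.xml adjudicated.xml
    for path in sys.argv[1:]:
        print(path)
        for tag, n in tag_histogram(path).most_common():
            print(f"  {tag}: {n}")
```

Running it over main_A.xml, main_B.xml, and adjudicated.xml side by side gives a quick structural comparison of the two main-phase annotation sets and the adjudicated gold standard.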

Last Modified: April 29, 2012