Comparative and Functional Genomics
Volume 5, Issue 8, Pages 648-654
Conference paper

Overview and Utilization of the NCI Thesaurus

¹ NCI Center for Bioinformatics, 6116 Executive Blvd, Ste 403, National Cancer Institute, NIH, Bethesda, MD 20892, USA
² NCI Office of Communications, 6116 Executive Blvd, National Cancer Institute, NIH, Bethesda, MD 20892, USA

Received 21 November 2004; Accepted 24 November 2004

The NCI Thesaurus is a reference terminology covering areas of basic and clinical science, built with the goal of facilitating translational research in cancer. It contains nearly 110 000 terms in approximately 36 000 concepts, partitioned into 20 subdomains that include diseases, drugs, anatomy, genes, gene products, techniques, and biological processes, among others. Its content has a cancer-centric focus, and it was originally designed to support coding activities across the National Cancer Institute. Each concept represents a unit of meaning and carries a number of annotations, such as a preferred name and synonyms, as well as textual definitions and optional references to external authorities. In addition, concepts are modelled with description logic (DL) and defined by their relationships to other concepts; approximately 90 types of named relations are currently declared in the terminology. The NCI Thesaurus is produced by the Enterprise Vocabulary Services project, a collaborative effort between the NCI Center for Bioinformatics and the NCI Office of Communications, and is part of the caCORE infrastructure stack (…). It can be accessed programmatically through the open caBIO API and browsed via the web (…). A history of editing changes is also accessible through the API. Finally, the Thesaurus is available for download in various file formats, including OWL (the Web Ontology Language), to facilitate its use by others.
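The concept model summarized above can be sketched as a small in-memory data structure. In this model, a concept is a unit of meaning with a preferred name, synonyms and definitions, and it is linked to other concepts by is-a parents and named relations. The sketch also shows why the term count exceeds the concept count: several terms (the preferred name plus its synonyms) index the same concept. This is a minimal illustration under stated assumptions, not the caBIO API or the Thesaurus's internal schema. The class names `Concept` and `Terminology`, the codes `X1` to `X3`, and the relation name `has_primary_site` are all hypothetical, and no description-logic reasoning is performed; relations are stored as a plain labelled graph.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One unit of meaning (hypothetical model, not the real NCI Thesaurus schema)."""
    code: str  # hypothetical identifier, not a real Thesaurus code
    preferred_name: str
    synonyms: list[str] = field(default_factory=list)
    definitions: list[str] = field(default_factory=list)
    parents: list[str] = field(default_factory=list)  # codes of is-a parents
    relations: dict[str, list[str]] = field(default_factory=dict)  # relation name -> target codes

    def terms(self) -> list[str]:
        """All strings naming this concept: preferred name first, then distinct synonyms."""
        seen: set[str] = set()
        out: list[str] = []
        for t in [self.preferred_name, *self.synonyms]:
            key = t.casefold()
            if key not in seen:
                seen.add(key)
                out.append(t)
        return out


class Terminology:
    """Maps codes to concepts, and case-folded terms to the codes they name."""

    def __init__(self) -> None:
        self.concepts: dict[str, Concept] = {}
        self.term_index: dict[str, set[str]] = {}

    def add(self, concept: Concept) -> None:
        self.concepts[concept.code] = concept
        for term in concept.terms():
            self.term_index.setdefault(term.casefold(), set()).add(concept.code)

    def lookup(self, term: str) -> list[Concept]:
        """Every concept having `term` as preferred name or synonym (case-insensitive)."""
        codes = self.term_index.get(term.casefold(), set())
        return [self.concepts[c] for c in sorted(codes)]

    def ancestors(self, code: str) -> set[str]:
        """Transitive closure over is-a parents; unknown parent codes are kept but not expanded."""
        result: set[str] = set()
        stack = list(self.concepts[code].parents)
        while stack:
            c = stack.pop()
            if c not in result:
                result.add(c)
                parent = self.concepts.get(c)
                if parent is not None:
                    stack.extend(parent.parents)
        return result


# Illustrative data only: three hypothetical concepts.
thesaurus = Terminology()
thesaurus.add(Concept("X1", "Neoplasm", synonyms=["Tumor"]))
thesaurus.add(Concept("X2", "Lung Neoplasm", synonyms=["Lung Tumor", "Tumor of Lung"],
                      parents=["X1"], relations={"has_primary_site": ["X3"]}))
thesaurus.add(Concept("X3", "Lung"))

print([c.code for c in thesaurus.lookup("lung tumor")])
print(sorted(thesaurus.ancestors("X2")))
```

Six distinct terms index three concepts here, mirroring on a tiny scale the many-terms-per-concept ratio reported for the Thesaurus. A production terminology would also need versioning, for example to reflect the edit history mentioned above, and a real DL classifier to infer parents from definitions. Both are out of scope for this sketch.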