
Ontology-driven indexing of public datasets for translational bioinformatics

The volume of publicly available genomic scale data is increasing. Genomic datasets in public repositories are annotated with free-text fields describing the pathological state of the studied sample. These annotations are not mapped to concepts in any ontology, making it difficult to integrate these...


Bibliographic Details
Main authors: Shah, Nigam H; Jonquet, Clement; Chiang, Annie P; Butte, Atul J; Chen, Rong; Musen, Mark A
Format: Text
Language: English
Published: BioMed Central 2009
Subjects: Proceedings
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2646250/
https://www.ncbi.nlm.nih.gov/pubmed/19208184
http://dx.doi.org/10.1186/1471-2105-10-S2-S1
author Shah, Nigam H
Jonquet, Clement
Chiang, Annie P
Butte, Atul J
Chen, Rong
Musen, Mark A
author_sort Shah, Nigam H
collection PubMed
description The volume of publicly available genomic scale data is increasing. Genomic datasets in public repositories are annotated with free-text fields describing the pathological state of the studied sample. These annotations are not mapped to concepts in any ontology, making it difficult to integrate these datasets across repositories. We have previously developed methods to map text-annotations of tissue microarrays to concepts in the NCI thesaurus and SNOMED-CT. In this work we generalize our methods to map text annotations of gene expression datasets to concepts in the UMLS. We demonstrate the utility of our methods by processing annotations of datasets in the Gene Expression Omnibus. We demonstrate that we enable ontology-based querying and integration of tissue and gene expression microarray data. We enable identification of datasets on specific diseases across both repositories. Our approach provides the basis for ontology-driven data integration for translational research on gene and protein expression data. Based on this work we have built a prototype system for ontology based annotation and indexing of biomedical data. The system processes the text metadata of diverse resource elements such as gene expression data sets, descriptions of radiology images, clinical-trial reports, and PubMed article abstracts to annotate and index them with concepts from appropriate ontologies. The key functionality of this system is to enable users to locate biomedical data resources related to particular ontology concepts.
format Text
id pubmed-2646250
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2646250 2009-02-23 Ontology-driven indexing of public datasets for translational bioinformatics Shah, Nigam H Jonquet, Clement Chiang, Annie P Butte, Atul J Chen, Rong Musen, Mark A BMC Bioinformatics Proceedings The volume of publicly available genomic scale data is increasing. Genomic datasets in public repositories are annotated with free-text fields describing the pathological state of the studied sample. These annotations are not mapped to concepts in any ontology, making it difficult to integrate these datasets across repositories. We have previously developed methods to map text-annotations of tissue microarrays to concepts in the NCI thesaurus and SNOMED-CT. In this work we generalize our methods to map text annotations of gene expression datasets to concepts in the UMLS. We demonstrate the utility of our methods by processing annotations of datasets in the Gene Expression Omnibus. We demonstrate that we enable ontology-based querying and integration of tissue and gene expression microarray data. We enable identification of datasets on specific diseases across both repositories. Our approach provides the basis for ontology-driven data integration for translational research on gene and protein expression data. Based on this work we have built a prototype system for ontology based annotation and indexing of biomedical data. The system processes the text metadata of diverse resource elements such as gene expression data sets, descriptions of radiology images, clinical-trial reports, and PubMed article abstracts to annotate and index them with concepts from appropriate ontologies. The key functionality of this system is to enable users to locate biomedical data resources related to particular ontology concepts. BioMed Central 2009-02-05 /pmc/articles/PMC2646250/ /pubmed/19208184 http://dx.doi.org/10.1186/1471-2105-10-S2-S1 Text en Copyright © 2009 Shah et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Ontology-driven indexing of public datasets for translational bioinformatics
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2646250/
https://www.ncbi.nlm.nih.gov/pubmed/19208184
http://dx.doi.org/10.1186/1471-2105-10-S2-S1