
Towards Structuring Unstructured GenBank Metadata for Enhancing Comparative Biological Studies

Within large sequence repositories such as GenBank there is a wealth of metadata providing contextual information that may enhance search and retrieval of relevant sequences for a range of subsequent analyses. One challenge is the use of free-text in these metadata fields where approaches are needed...


Bibliographic Details
Main Authors: Chen, Elizabeth S., Sarkar, Indra Neil
Format: Online Article Text
Language: English
Published: American Medical Informatics Association 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3248757/
https://www.ncbi.nlm.nih.gov/pubmed/22211174
_version_ 1782220272238067712
author Chen, Elizabeth S.
Sarkar, Indra Neil
author_facet Chen, Elizabeth S.
Sarkar, Indra Neil
author_sort Chen, Elizabeth S.
collection PubMed
description Within large sequence repositories such as GenBank there is a wealth of metadata providing contextual information that may enhance search and retrieval of relevant sequences for a range of subsequent analyses. One challenge is the use of free-text in these metadata fields where approaches are needed to extract, structure, and encode essential information. The goal of the present study was to explore the feasibility of using a combination of existing resources for annotating unstructured GenBank metadata, initially focusing on the “host” and “isolation_source” fields. This paper summarizes early results for 10 host organisms that include a characterization of associated isolation sources with respect to biomedical ontologies and semantic types. The findings from this preliminary study provide insights to the rich amount of information captured within these unstructured metadata, guidance for addressing the challenges and issues encountered, and highlight the potential value for enriching comparative biological studies towards improving human health.
format Online
Article
Text
id pubmed-3248757
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher American Medical Informatics Association
record_format MEDLINE/PubMed
spelling pubmed-3248757 2011-12-30 Towards Structuring Unstructured GenBank Metadata for Enhancing Comparative Biological Studies Chen, Elizabeth S. Sarkar, Indra Neil AMIA Jt Summits Transl Sci Proc Articles American Medical Informatics Association 2011-03-07 /pmc/articles/PMC3248757/ /pubmed/22211174 Text en ©2011 AMIA - All rights reserved. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose
spellingShingle Articles
Chen, Elizabeth S.
Sarkar, Indra Neil
Towards Structuring Unstructured GenBank Metadata for Enhancing Comparative Biological Studies
title Towards Structuring Unstructured GenBank Metadata for Enhancing Comparative Biological Studies
topic Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3248757/
https://www.ncbi.nlm.nih.gov/pubmed/22211174