Towards Structuring Unstructured GenBank Metadata for Enhancing Comparative Biological Studies
Main authors: | Chen, Elizabeth S.; Sarkar, Indra Neil |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Medical Informatics Association, 2011 |
Subjects: | Articles |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3248757/ https://www.ncbi.nlm.nih.gov/pubmed/22211174 |
field | value |
---|---|
author | Chen, Elizabeth S.; Sarkar, Indra Neil |
collection | PubMed |
description | Within large sequence repositories such as GenBank, there is a wealth of metadata providing contextual information that may enhance the search and retrieval of relevant sequences for a range of subsequent analyses. One challenge is the use of free text in these metadata fields, for which approaches are needed to extract, structure, and encode essential information. The goal of the present study was to explore the feasibility of using a combination of existing resources to annotate unstructured GenBank metadata, initially focusing on the “host” and “isolation_source” fields. This paper summarizes early results for 10 host organisms, including a characterization of the associated isolation sources with respect to biomedical ontologies and semantic types. The findings from this preliminary study provide insights into the rich information captured within these unstructured metadata and guidance for addressing the challenges and issues encountered, and they highlight the potential value of these metadata for enriching comparative biological studies towards improving human health. |
format | Online Article Text |
id | pubmed-3248757 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | American Medical Informatics Association |
record_format | MEDLINE/PubMed |
journal | AMIA Jt Summits Transl Sci Proc |
dates | 2011-03-07, 2011-12-30 |
rights | ©2011 AMIA - All rights reserved. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose. |
title | Towards Structuring Unstructured GenBank Metadata for Enhancing Comparative Biological Studies |
topic | Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3248757/ https://www.ncbi.nlm.nih.gov/pubmed/22211174 |
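The abstract describes extracting and structuring the free-text "host" and "isolation_source" fields of GenBank records. The following is purely an illustration of that idea. It is a minimal Python sketch that assumes plain GenBank flat-file text as input. It pulls the `/host` and `/isolation_source` source-feature qualifiers, tallies normalized isolation sources per host, and tags each source with a category.

The sample records, the `LEXICON` mapping, and the category labels are hypothetical stand-ins. They are not the ontologies, semantic types, or tools used in the study.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy records in GenBank flat-file layout (illustration only, not real entries).
SAMPLE = """\
LOCUS       EXAMPLE1
FEATURES             Location/Qualifiers
     source          1..1500
                     /organism="Escherichia coli"
                     /host="Homo sapiens"
                     /isolation_source="blood"
//
LOCUS       EXAMPLE2
FEATURES             Location/Qualifiers
     source          1..1400
                     /organism="Escherichia coli"
                     /host="Homo sapiens"
                     /isolation_source="urine sample"
//
LOCUS       EXAMPLE3
FEATURES             Location/Qualifiers
     source          1..1450
                     /organism="Salmonella enterica"
                     /host="Gallus gallus"
                     /isolation_source="chicken feces"
//
"""

# Matches /host="..." and /isolation_source="..." qualifier lines; [^"]* also spans
# values wrapped onto continuation lines, whose whitespace is collapsed below.
QUALIFIER = re.compile(r'^\s+/(host|isolation_source)="([^"]*)"', re.MULTILINE)

# Hypothetical mini-lexicon standing in for an ontology / semantic-type lookup.
LEXICON = {
    "blood": "body substance",
    "urine": "body substance",
    "feces": "body substance",
    "chicken": "organism",
    "soil": "environment",
}


def sources_by_host(flatfile: str) -> dict:
    """Count lower-cased isolation_source values per host across '//'-terminated records."""
    table = defaultdict(Counter)
    for record in flatfile.split("\n//"):
        # Collapse internal whitespace so wrapped qualifier values become one line.
        quals = {key: " ".join(value.split()) for key, value in QUALIFIER.findall(record)}
        host, source = quals.get("host"), quals.get("isolation_source")
        if host and source:
            table[host][source.lower()] += 1
    return dict(table)


def annotate(source: str) -> list:
    """Return the sorted set of lexicon categories whose terms occur in the text."""
    tokens = re.findall(r"[a-z]+", source.lower())
    return sorted({LEXICON[t] for t in tokens if t in LEXICON})


if __name__ == "__main__":
    for host, sources in sources_by_host(SAMPLE).items():
        for source, count in sources.most_common():
            print(host, "|", source, "|", count, "|", annotate(source))
```

Running the script prints one line per host and isolation source, with its count and any matched categories. The single-word lexicon lookup is deliberately naive. Multi-word terms, synonyms, and misspellings in real free-text metadata would call for a proper terminology resource, which this sketch does not attempt to reproduce.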