Improving links between literature and biological data with text mining: a case study with GEO, PDB and MEDLINE

High-throughput experiments and bioinformatics techniques are creating an exploding volume of data that are becoming overwhelming to keep track of for biologists and researchers who need to access, analyze and process existing data. Much of the available data are being deposited in specialized datab...

Bibliographic Details
Main authors: Névéol, Aurélie, Wilbur, W. John, Lu, Zhiyong
Format: Online Article Text
Language: English
Published: Oxford University Press 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3371192/
https://www.ncbi.nlm.nih.gov/pubmed/22685160
http://dx.doi.org/10.1093/database/bas026
author Névéol, Aurélie
Wilbur, W. John
Lu, Zhiyong
author_facet Névéol, Aurélie
Wilbur, W. John
Lu, Zhiyong
author_sort Névéol, Aurélie
collection PubMed
description High-throughput experiments and bioinformatics techniques are creating an exploding volume of data that are becoming overwhelming to keep track of for biologists and researchers who need to access, analyze and process existing data. Much of the available data are being deposited in specialized databases, such as the Gene Expression Omnibus (GEO) for microarrays or the Protein Data Bank (PDB) for protein structures and coordinates. Data sets are also being described by their authors in publications archived in literature databases such as MEDLINE and PubMed Central. Currently, the curation of links between biological databases and the literature mainly relies on manual labour, which makes it a time-consuming and daunting task. Herein, we analysed the current state of link curation between GEO, PDB and MEDLINE. We found that the link curation is heterogeneous depending on the sources and databases involved, and that overlap between sources is low, <50% for PDB and GEO. Furthermore, we showed that text-mining tools can automatically provide valuable evidence to help curators broaden the scope of articles and database entries that they review. As a result, we made recommendations to improve the coverage of curated links, as well as the consistency of information available from different databases while maintaining high-quality curation. Database URLs: http://www.ncbi.nlm.nih.gov/PubMed, http://www.ncbi.nlm.nih.gov/geo/, http://www.rcsb.org/pdb/
format Online
Article
Text
id pubmed-3371192
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-33711922012-06-18 Improving links between literature and biological data with text mining: a case study with GEO, PDB and MEDLINE Névéol, Aurélie Wilbur, W. John Lu, Zhiyong Database (Oxford) Original Articles High-throughput experiments and bioinformatics techniques are creating an exploding volume of data that are becoming overwhelming to keep track of for biologists and researchers who need to access, analyze and process existing data. Much of the available data are being deposited in specialized databases, such as the Gene Expression Omnibus (GEO) for microarrays or the Protein Data Bank (PDB) for protein structures and coordinates. Data sets are also being described by their authors in publications archived in literature databases such as MEDLINE and PubMed Central. Currently, the curation of links between biological databases and the literature mainly relies on manual labour, which makes it a time-consuming and daunting task. Herein, we analysed the current state of link curation between GEO, PDB and MEDLINE. We found that the link curation is heterogeneous depending on the sources and databases involved, and that overlap between sources is low, <50% for PDB and GEO. Furthermore, we showed that text-mining tools can automatically provide valuable evidence to help curators broaden the scope of articles and database entries that they review. As a result, we made recommendations to improve the coverage of curated links, as well as the consistency of information available from different databases while maintaining high-quality curation. Database URLs: http://www.ncbi.nlm.nih.gov/PubMed, http://www.ncbi.nlm.nih.gov/geo/, http://www.rcsb.org/pdb/ Oxford University Press 2012-06-08 /pmc/articles/PMC3371192/ /pubmed/22685160 http://dx.doi.org/10.1093/database/bas026 Text en © The Author(s) 2012. Published by Oxford University Press. 
http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Articles
Névéol, Aurélie
Wilbur, W. John
Lu, Zhiyong
Improving links between literature and biological data with text mining: a case study with GEO, PDB and MEDLINE
title Improving links between literature and biological data with text mining: a case study with GEO, PDB and MEDLINE
title_short Improving links between literature and biological data with text mining: a case study with GEO, PDB and MEDLINE
title_sort improving links between literature and biological data with text mining: a case study with geo, pdb and medline
topic Original Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3371192/
https://www.ncbi.nlm.nih.gov/pubmed/22685160
http://dx.doi.org/10.1093/database/bas026
work_keys_str_mv AT neveolaurelie improvinglinksbetweenliteratureandbiologicaldatawithtextminingacasestudywithgeopdbandmedline
AT wilburwjohn improvinglinksbetweenliteratureandbiologicaldatawithtextminingacasestudywithgeopdbandmedline
AT luzhiyong improvinglinksbetweenliteratureandbiologicaldatawithtextminingacasestudywithgeopdbandmedline