Literature mining of genetic variants for curation: quantifying the importance of supplementary material

A major focus of modern biological research is the understanding of how genomic variation relates to disease. Although there are significant ongoing efforts to capture this understanding in curated resources, much of the information remains locked in unstructured sources, in particular, the scientif...

Bibliographic Details
Main Authors:	Jimeno Yepes, Antonio; Verspoor, Karin
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2014
Subjects:	Original Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3920087/ https://www.ncbi.nlm.nih.gov/pubmed/24520105 http://dx.doi.org/10.1093/database/bau003

_version_	1782303131901624320
author	Jimeno Yepes, Antonio; Verspoor, Karin
author_facet	Jimeno Yepes, Antonio; Verspoor, Karin
author_sort	Jimeno Yepes, Antonio
collection	PubMed
description	A major focus of modern biological research is the understanding of how genomic variation relates to disease. Although there are significant ongoing efforts to capture this understanding in curated resources, much of the information remains locked in unstructured sources, in particular, the scientific literature. Thus, there have been several text mining systems developed to target extraction of mutations and other genetic variation from the literature. We have performed the first study of the use of text mining for the recovery of genetic variants curated directly from the literature. We consider two curated databases, COSMIC (Catalogue Of Somatic Mutations In Cancer) and InSiGHT (International Society for Gastro-intestinal Hereditary Tumours), that contain explicit links to the source literature for each included mutation. Our analysis shows that the recall of the mutations catalogued in the databases using a text mining tool is very low, despite the well-established good performance of the tool and even when the full text of the associated article is available for processing. We demonstrate that this discrepancy can be explained by considering the supplementary material linked to the published articles, not previously considered by text mining tools. Although it is anecdotally known that supplementary material contains ‘all of the information’, and some researchers have speculated about the role of supplementary material (Schenck et al. Extraction of genetic mutations associated with cancer from public literature. J Health Med Inform 2012;S2:2.), our analysis substantiates the significant extent to which this material is critical. Our results highlight the need for literature mining tools to consider not only the narrative content of a publication but also the full set of material related to a publication.
format	Online Article Text
id	pubmed-3920087
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-39200872014-02-11 Literature mining of genetic variants for curation: quantifying the importance of supplementary material Jimeno Yepes, Antonio Verspoor, Karin Database (Oxford) Original Article A major focus of modern biological research is the understanding of how genomic variation relates to disease. Although there are significant ongoing efforts to capture this understanding in curated resources, much of the information remains locked in unstructured sources, in particular, the scientific literature. Thus, there have been several text mining systems developed to target extraction of mutations and other genetic variation from the literature. We have performed the first study of the use of text mining for the recovery of genetic variants curated directly from the literature. We consider two curated databases, COSMIC (Catalogue Of Somatic Mutations In Cancer) and InSiGHT (International Society for Gastro-intestinal Hereditary Tumours), that contain explicit links to the source literature for each included mutation. Our analysis shows that the recall of the mutations catalogued in the databases using a text mining tool is very low, despite the well-established good performance of the tool and even when the full text of the associated article is available for processing. We demonstrate that this discrepancy can be explained by considering the supplementary material linked to the published articles, not previously considered by text mining tools. Although it is anecdotally known that supplementary material contains ‘all of the information’, and some researchers have speculated about the role of supplementary material (Schenck et al. Extraction of genetic mutations associated with cancer from public literature. J Health Med Inform 2012;S2:2.), our analysis substantiates the significant extent to which this material is critical. 
Our results highlight the need for literature mining tools to consider not only the narrative content of a publication but also the full set of material related to a publication. Oxford University Press 2014-02-10 /pmc/articles/PMC3920087/ /pubmed/24520105 http://dx.doi.org/10.1093/database/bau003 Text en © The Author(s) 2014. Published by Oxford University Press. http://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Original Article Jimeno Yepes, Antonio Verspoor, Karin Literature mining of genetic variants for curation: quantifying the importance of supplementary material
title	Literature mining of genetic variants for curation: quantifying the importance of supplementary material
title_full	Literature mining of genetic variants for curation: quantifying the importance of supplementary material
title_fullStr	Literature mining of genetic variants for curation: quantifying the importance of supplementary material
title_full_unstemmed	Literature mining of genetic variants for curation: quantifying the importance of supplementary material
title_short	Literature mining of genetic variants for curation: quantifying the importance of supplementary material
title_sort	literature mining of genetic variants for curation: quantifying the importance of supplementary material
topic	Original Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3920087/ https://www.ncbi.nlm.nih.gov/pubmed/24520105 http://dx.doi.org/10.1093/database/bau003
work_keys_str_mv	AT jimenoyepesantonio literatureminingofgeneticvariantsforcurationquantifyingtheimportanceofsupplementarymaterial AT verspoorkarin literatureminingofgeneticvariantsforcurationquantifyingtheimportanceofsupplementarymaterial
