Mutation extraction tools can be combined for robust recognition of genetic variants in the literature


Bibliographic Details
Main Authors:	Jimeno Yepes, Antonio; Verspoor, Karin
Format:	Online Article Text
Language:	English
Published:	F1000Research 2014
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4176422/ https://www.ncbi.nlm.nih.gov/pubmed/25285203 http://dx.doi.org/10.12688/f1000research.3-18.v2

_version_	1782336630989783040
author	Jimeno Yepes, Antonio; Verspoor, Karin
author_facet	Jimeno Yepes, Antonio; Verspoor, Karin
author_sort	Jimeno Yepes, Antonio
collection	PubMed
description	As the cost of genomic sequencing continues to fall, the amount of data being collected and studied for the purpose of understanding the genetic basis of disease is increasing dramatically. Much of the source information relevant to such efforts is available only from unstructured sources such as the scientific literature, and significant resources are expended in manually curating and structuring the information in the literature. As such, there have been a number of systems developed to target automatic extraction of mutations and other genetic variation from the literature using text mining tools. We have performed a broad survey of the existing publicly available tools for extraction of genetic variants from the scientific literature. We consider not just one tool but a number of different tools, individually and in combination, and apply the tools in two scenarios. First, they are compared in an intrinsic evaluation context, where the tools are tested for their ability to identify specific mentions of genetic variants in a corpus of manually annotated papers, the Variome corpus. Second, they are compared in an extrinsic evaluation context based on our previous study of text mining support for curation of the COSMIC and InSiGHT databases. Our results demonstrate that no single tool covers the full range of genetic variants mentioned in the literature. Rather, several tools have complementary coverage and can be used together effectively. In the intrinsic evaluation on the Variome corpus, the combined performance is above 0.95 in F-measure, while in the extrinsic evaluation the combined recall performance is above 0.71 for COSMIC and above 0.62 for InSiGHT, a substantial improvement over the performance of any individual tool. Based on the analysis of these results, we suggest several directions for the improvement of text mining tools for genetic variant extraction from the literature.
format	Online Article Text
id	pubmed-4176422
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	F1000Research
record_format	MEDLINE/PubMed
spelling	pubmed-4176422 2014-10-02 Mutation extraction tools can be combined for robust recognition of genetic variants in the literature Jimeno Yepes, Antonio; Verspoor, Karin F1000Res Research Article F1000Research 2014-06-10 /pmc/articles/PMC4176422/ /pubmed/25285203 http://dx.doi.org/10.12688/f1000research.3-18.v2 Text en Copyright: © 2014 Jimeno Yepes A and Verspoor K http://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. http://creativecommons.org/publicdomain/zero/1.0/ Data associated with the article are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).
spellingShingle	Research Article Jimeno Yepes, Antonio Verspoor, Karin Mutation extraction tools can be combined for robust recognition of genetic variants in the literature
title	Mutation extraction tools can be combined for robust recognition of genetic variants in the literature
title_sort	mutation extraction tools can be combined for robust recognition of genetic variants in the literature
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4176422/ https://www.ncbi.nlm.nih.gov/pubmed/25285203 http://dx.doi.org/10.12688/f1000research.3-18.v2
work_keys_str_mv	AT jimenoyepesantonio mutationextractiontoolscanbecombinedforrobustrecognitionofgeneticvariantsintheliterature AT verspoorkarin mutationextractiontoolscanbecombinedforrobustrecognitionofgeneticvariantsintheliterature
