
Benchmarking of the 2010 BioCreative Challenge III text-mining competition by the BioGRID and MINT interaction databases

BACKGROUND: The vast amount of data published in the primary biomedical literature represents a challenge for the automated extraction and codification of individual data elements. Biological databases that rely solely on manual extraction by expert curators are unable to comprehensively annotate th...


Bibliographic Details
Main authors:	Chatr-aryamontri, Andrew; Winter, Andrew; Perfetto, Livia; Briganti, Leonardo; Licata, Luana; Iannuccelli, Marta; Castagnoli, Luisa; Cesareni, Gianni; Tyers, Mike
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3269943/ https://www.ncbi.nlm.nih.gov/pubmed/22151178 http://dx.doi.org/10.1186/1471-2105-12-S8-S8

_version_	1782222524461875200
author	Chatr-aryamontri, Andrew; Winter, Andrew; Perfetto, Livia; Briganti, Leonardo; Licata, Luana; Iannuccelli, Marta; Castagnoli, Luisa; Cesareni, Gianni; Tyers, Mike
author_facet	Chatr-aryamontri, Andrew; Winter, Andrew; Perfetto, Livia; Briganti, Leonardo; Licata, Luana; Iannuccelli, Marta; Castagnoli, Luisa; Cesareni, Gianni; Tyers, Mike
author_sort	Chatr-aryamontri, Andrew
collection	PubMed
description	BACKGROUND: The vast amount of data published in the primary biomedical literature represents a challenge for the automated extraction and codification of individual data elements. Biological databases that rely solely on manual extraction by expert curators are unable to comprehensively annotate the information dispersed across the entire biomedical literature. The development of efficient tools based on natural language processing (NLP) systems is essential for the selection of relevant publications, identification of data attributes and partially automated annotation. One of the tasks of the Biocreative 2010 Challenge III was devoted to the evaluation of NLP systems developed to identify articles for curation and extraction of protein-protein interaction (PPI) data. RESULTS: The Biocreative 2010 competition addressed three tasks: gene normalization, article classification and interaction method identification. The BioGRID and MINT protein interaction databases both participated in the generation of the test publication set for gene normalization, annotated the development and test sets for article classification, and curated the test set for interaction method classification. These test datasets served as a gold standard for the evaluation of data extraction algorithms. CONCLUSION: The development of efficient tools for extraction of PPI data is a necessary step to achieve full curation of the biomedical literature. NLP systems can in the first instance facilitate expert curation by refining the list of candidate publications that contain PPI data; more ambitiously, NLP approaches may be able to directly extract relevant information from full-text articles for rapid inspection by expert curators. Close collaboration between biological databases and NLP systems developers will continue to facilitate the long-term objectives of both disciplines.
format	Online Article Text
id	pubmed-3269943
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3269943 2012-02-02 Benchmarking of the 2010 BioCreative Challenge III text-mining competition by the BioGRID and MINT interaction databases Chatr-aryamontri, Andrew; Winter, Andrew; Perfetto, Livia; Briganti, Leonardo; Licata, Luana; Iannuccelli, Marta; Castagnoli, Luisa; Cesareni, Gianni; Tyers, Mike BMC Bioinformatics Research BioMed Central 2011-10-03 /pmc/articles/PMC3269943/ /pubmed/22151178 http://dx.doi.org/10.1186/1471-2105-12-S8-S8 Text en Copyright ©2011 Chatr-aryamontri et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Benchmarking of the 2010 BioCreative Challenge III text-mining competition by the BioGRID and MINT interaction databases
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3269943/ https://www.ncbi.nlm.nih.gov/pubmed/22151178 http://dx.doi.org/10.1186/1471-2105-12-S8-S8
