
Automatic concept recognition using the Human Phenotype Ontology reference and test suite corpora

Concept recognition tools rely on the availability of textual corpora to assess their performance and enable the identification of areas for improvement. Typically, corpora are developed for specific purposes, such as gene name recognition. Gene and protein name identification are longstanding goals...


Bibliographic Details
Main authors:	Groza, Tudor; Köhler, Sebastian; Doelken, Sandra; Collier, Nigel; Oellrich, Anika; Smedley, Damian; Couto, Francisco M; Baynam, Gareth; Zankl, Andreas; Robinson, Peter N.
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2015
Subjects:	Original Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4343077/ https://www.ncbi.nlm.nih.gov/pubmed/25725061 http://dx.doi.org/10.1093/database/bav005

author	Groza, Tudor; Köhler, Sebastian; Doelken, Sandra; Collier, Nigel; Oellrich, Anika; Smedley, Damian; Couto, Francisco M; Baynam, Gareth; Zankl, Andreas; Robinson, Peter N.
collection	PubMed
description	Concept recognition tools rely on the availability of textual corpora to assess their performance and enable the identification of areas for improvement. Typically, corpora are developed for specific purposes, such as gene name recognition. Gene and protein name identification are longstanding goals of biomedical text mining, and therefore a number of different corpora exist. However, phenotypes only recently became an entity of interest for specialized concept recognition systems, and hardly any annotated text is available for performance testing and training. Here, we present a unique corpus, capturing text spans from 228 abstracts manually annotated with Human Phenotype Ontology (HPO) concepts and harmonized by three curators, which can be used as a reference standard for free text annotation of human phenotypes. Furthermore, we developed a test suite for standardized concept recognition error analysis, incorporating 32 different types of test cases corresponding to 2164 HPO concepts. Finally, three established phenotype concept recognizers (NCBO Annotator, OBO Annotator and Bio-LarK CR) were comprehensively evaluated, and results are reported against both the text corpus and the test suites. The gold standard and test suites corpora are available from http://bio-lark.org/hpo_res.html. Database URL: http://bio-lark.org/hpo_res.html
id	pubmed-4343077
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-4343077 2015-03-17 Database (Oxford) Original Article Oxford University Press 2015-02-27 /pmc/articles/PMC4343077/ /pubmed/25725061 http://dx.doi.org/10.1093/database/bav005 Text en © The Author(s) 2015. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Automatic concept recognition using the Human Phenotype Ontology reference and test suite corpora
