
Deep Question Answering for protein annotation

Biomedical professionals have access to a huge amount of literature, but when they use a search engine, they often face too many documents to find the appropriate information in a reasonable time. In this context, question-answering (QA) engines are designed to display a...


Bibliographic Details
Main Authors:	Gobeill, Julien; Gaudinat, Arnaud; Pasche, Emilie; Vishnyakova, Dina; Gaudet, Pascale; Bairoch, Amos; Ruch, Patrick
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2015
Subjects:	Original Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4572360/ https://www.ncbi.nlm.nih.gov/pubmed/26384372 http://dx.doi.org/10.1093/database/bav081

collection	PubMed
description	Biomedical professionals have access to a huge amount of literature, but when they use a search engine, they often face too many documents to find the appropriate information in a reasonable time. In this context, question-answering (QA) engines are designed to display answers that were automatically extracted from the retrieved documents. Standard QA engines in the literature process a user question, then retrieve relevant documents and finally extract possible answers from these documents using various named-entity recognition processes. In our study, we try to answer complex genomics questions, which can be adequately answered only using Gene Ontology (GO) concepts. Such complex answers cannot be found using state-of-the-art dictionary- and redundancy-based QA engines. We compare the effectiveness of two dictionary-based classifiers for extracting correct GO answers from a large set of 100 retrieved abstracts per question. In the same way, we also investigate the power of GOCat, a supervised GO classifier. GOCat exploits the GOA database to propose GO concepts that were annotated by curators for similar abstracts. This approach is called deep QA, as it adds an original classification step and exploits curated biological data to infer answers that are not explicitly mentioned in the retrieved documents. We show that for complex answers such as protein functional descriptions, the redundancy phenomenon has a limited effect. Similarly, usual dictionary-based approaches are relatively ineffective. In contrast, we demonstrate how existing curated data, beyond information extraction, can be exploited by a supervised classifier, such as GOCat, to massively improve both the quantity and the quality of the answers, with a +100% improvement for both recall and precision. Database URL: http://eagl.unige.ch/DeepQA4PA/
id	pubmed-4572360
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-4572360 2015-09-18 Database (Oxford) Oxford University Press 2015-09-16 Text en © The Author(s) 2015. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
