Triage by ranking to support the curation of protein interactions

Today, molecular biology databases are the cornerstone of knowledge sharing for life and health sciences. The curation and maintenance of these resources are labour intensive. Although text mining is gaining impetus among curators, its integration into curation workflows has not yet been widely adopted...

Bibliographic Details
Main Authors: Mottin, Luc; Pasche, Emilie; Gobeill, Julien; Rech de Laval, Valentine; Gleizes, Anne; Michel, Pierre-André; Bairoch, Amos; Gaudet, Pascale; Ruch, Patrick
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5502361/
https://www.ncbi.nlm.nih.gov/pubmed/29220432
http://dx.doi.org/10.1093/database/bax040
author Mottin, Luc
Pasche, Emilie
Gobeill, Julien
Rech de Laval, Valentine
Gleizes, Anne
Michel, Pierre-André
Bairoch, Amos
Gaudet, Pascale
Ruch, Patrick
author_sort Mottin, Luc
collection PubMed
description Today, molecular biology databases are the cornerstone of knowledge sharing for life and health sciences. The curation and maintenance of these resources are labour intensive. Although text mining is gaining impetus among curators, its integration into curation workflows has not yet been widely adopted. The Swiss Institute of Bioinformatics Text Mining and CALIPHO groups joined forces to design a new curation support system named neXtA5. In this report, we explore the integration of novel triage services to support the curation of two types of biological data: protein–protein interactions (PPIs) and post-translational modifications (PTMs). The recognition of PPIs and PTMs poses a special challenge, as it requires not only the identification of biological entities (proteins or residues) but also that of particular relationships (e.g. binding or position). These relationships cannot be described with onto-terminological descriptors such as the Gene Ontology for molecular functions, which makes the triage task more challenging. Prioritizing papers for these tasks thus requires the development of different approaches. In this report, we propose a new method to prioritize articles containing information specific to PPIs and PTMs. The new resources (RESTful APIs, semantically annotated MEDLINE library) enrich the neXtA5 platform. We tuned the article prioritization model on a set of 100 proteins previously annotated by the CALIPHO group. The effectiveness of the triage service was tested with a dataset of 200 annotated proteins. We defined two sets of descriptors to support automatic triage: the first set to enrich for papers with PPI data, and the second for PTMs. All occurrences of these descriptors were marked up in MEDLINE and indexed, thus constituting a semantically annotated version of MEDLINE. These annotations were then used to estimate the relevance of a particular article with respect to the chosen annotation type. This relevance score was combined with a local vector-space search engine to generate a ranked list of PMIDs. We also evaluated a query refinement strategy, which adds specific keywords (such as ‘binds’ or ‘interacts’) to the original query. Compared to PubMed, the search effectiveness of the neXtA5 triage service is improved by 190% for the prioritization of papers with PPI information and by 260% for papers with PTM information. Combining advanced retrieval and query refinement strategies with automatically enriched MEDLINE contents effectively improves triage in complex curation tasks such as the curation of PPIs and PTMs. Database URL: http://candy.hesge.ch/nextA5
format Online
Article
Text
id pubmed-5502361
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5502361 2017-07-20 Triage by ranking to support the curation of protein interactions Mottin, Luc; Pasche, Emilie; Gobeill, Julien; Rech de Laval, Valentine; Gleizes, Anne; Michel, Pierre-André; Bairoch, Amos; Gaudet, Pascale; Ruch, Patrick Database (Oxford) Original Article Oxford University Press 2017-06-11 /pmc/articles/PMC5502361/ /pubmed/29220432 http://dx.doi.org/10.1093/database/bax040 Text en © The Author(s) 2017. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Triage by ranking to support the curation of protein interactions
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5502361/
https://www.ncbi.nlm.nih.gov/pubmed/29220432
http://dx.doi.org/10.1093/database/bax040
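
The triage idea summarized in the description (a descriptor-based relevance score combined with a search-engine score, plus query refinement with keywords such as ‘binds’ or ‘interacts’) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the article: the descriptor list, the linear weighting, the AND/OR query syntax, and the names `Article`, `refine_query`, `descriptor_score` and `rank` are all hypothetical, and the PMIDs are placeholders.

```python
import re
from dataclasses import dataclass

# Hypothetical descriptor set for the PPI axis; the article's real lists are not reproduced here.
PPI_DESCRIPTORS = ("binds", "interacts", "complex")


@dataclass
class Article:
    pmid: str            # placeholder identifier, not a real PMID
    text: str            # title and abstract text
    engine_score: float  # search-engine score, assumed normalised to [0, 1]


def refine_query(query, keywords=("binds", "interacts")):
    """Append axis-specific keywords to the original query (syntax is an assumption)."""
    return f"{query} AND ({' OR '.join(keywords)})"


def descriptor_score(text, descriptors=PPI_DESCRIPTORS):
    """Fraction of descriptors found in the text: a stand-in relevance estimate."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    hits = sum(1 for d in descriptors if d in tokens)
    return hits / len(descriptors)


def rank(articles, weight=0.5):
    """Order PMIDs by a linear mix of engine score and descriptor score, best first."""
    def combined(article):
        return (1 - weight) * article.engine_score + weight * descriptor_score(article.text)
    return [a.pmid for a in sorted(articles, key=combined, reverse=True)]


if __name__ == "__main__":
    candidates = [
        Article("PMID-A", "Protein X binds and interacts with protein Y", 0.4),
        Article("PMID-B", "Protein X expression in liver tissue", 0.6),
    ]
    print(refine_query("BRCA1"))
    print(rank(candidates))
```

With `weight=0` the ordering falls back to the search-engine score alone, which makes it easy to compare the re-ranked list against the baseline; how the two scores are actually combined in neXtA5 is not specified in this record.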