
Application of text-mining for updating protein post-translational modification annotation in UniProtKB

BACKGROUND: The annotation of protein post-translational modifications (PTMs) is an important task of UniProtKB curators and, with continuing improvements in experimental methodology, an ever greater number of articles are being published on this topic. To help curators cope with this growing body o...

Bibliographic Details
Main authors: Veuthey, Anne-Lise, Bridge, Alan, Gobeill, Julien, Ruch, Patrick, McEntyre, Johanna R, Bougueleret, Lydie, Xenarios, Ioannis
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3660268/
https://www.ncbi.nlm.nih.gov/pubmed/23517090
http://dx.doi.org/10.1186/1471-2105-14-104
author Veuthey, Anne-Lise
Bridge, Alan
Gobeill, Julien
Ruch, Patrick
McEntyre, Johanna R
Bougueleret, Lydie
Xenarios, Ioannis
collection PubMed
description BACKGROUND: The annotation of protein post-translational modifications (PTMs) is an important task of UniProtKB curators and, with continuing improvements in experimental methodology, an ever greater number of articles are being published on this topic. To help curators cope with this growing body of information we have developed a system which extracts information from the scientific literature for the most frequently annotated PTMs in UniProtKB. RESULTS: The procedure uses a pattern-matching and rule-based approach to extract sentences with information on the type and site of modification. A ranked list of protein candidates for the modification is also provided. For PTM extraction, precision varies from 57% to 94%, and recall from 75% to 95%, according to the type of modification. The procedure was used to track new publications on PTMs and to recover potential supporting evidence for phosphorylation sites annotated based on the results of large scale proteomics experiments. CONCLUSIONS: The information retrieval and extraction method we have developed in this study forms the basis of a simple tool for the manual curation of protein post-translational modifications in UniProtKB/Swiss-Prot. Our work demonstrates that even simple text-mining tools can be effectively adapted for database curation tasks, providing that a thorough understanding of the working process and requirements are first obtained. This system can be accessed at http://eagl.unige.ch/PTM/.
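The pattern-matching and rule-based extraction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the trigger words, the residue-site pattern, the sentence splitter, and the function name `extract_ptm_sentences` are all assumptions made for illustration. The single rule applied here is that a sentence is kept only when it contains both a modification trigger and at least one residue site.

```python
import re

# Hypothetical trigger patterns for a few PTM types (illustrative only,
# not the patterns used by the system described in the abstract).
TRIGGER_RES = {
    "phosphorylation": re.compile(r"phosphorylat\w*", re.IGNORECASE),
    "acetylation": re.compile(r"acetylat\w*", re.IGNORECASE),
    "methylation": re.compile(r"methylat\w*", re.IGNORECASE),
    "ubiquitination": re.compile(r"ubiquit(?:in|yl)at\w*", re.IGNORECASE),
}

# Assumed site notation: a residue (three-letter or one-letter code),
# an optional hyphen, and a position, e.g. "Ser-473", "K9".
SITE_RE = re.compile(r"\b(Ser|Thr|Tyr|Lys|Arg|[STYKR])-?(\d+)\b")


def extract_ptm_sentences(text):
    """Return sentences that mention both a PTM type and a residue site."""
    hits = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        sites = [m.group(1) + m.group(2) for m in SITE_RE.finditer(sentence)]
        if not sites:
            continue  # rule: a sentence without a site is not reported
        for ptm_type, pattern in TRIGGER_RES.items():
            if pattern.search(sentence):
                hits.append(
                    {"type": ptm_type, "sites": sites, "sentence": sentence}
                )
    return hits


if __name__ == "__main__":
    sample = (
        "Akt1 is phosphorylated at Ser-473 by mTORC2. "
        "The protein localizes to the cytoplasm. "
        "Histone H3 is acetylated on K9 and K14."
    )
    for hit in extract_ptm_sentences(sample):
        print(hit["type"], hit["sites"])
```

A sketch like this trades recall for simplicity: sites written in other notations (for example "serine 473") would be missed unless further patterns are added, which is one reason precision and recall differ by modification type.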
format Online
Article
Text
id pubmed-3660268
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3660268 2013-05-22 Application of text-mining for updating protein post-translational modification annotation in UniProtKB. Veuthey, Anne-Lise; Bridge, Alan; Gobeill, Julien; Ruch, Patrick; McEntyre, Johanna R; Bougueleret, Lydie; Xenarios, Ioannis. BMC Bioinformatics, Methodology Article. BioMed Central, 2013-03-22. /pmc/articles/PMC3660268/ /pubmed/23517090 http://dx.doi.org/10.1186/1471-2105-14-104 Text, en. Copyright © 2013 Veuthey et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Application of text-mining for updating protein post-translational modification annotation in UniProtKB
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3660268/
https://www.ncbi.nlm.nih.gov/pubmed/23517090
http://dx.doi.org/10.1186/1471-2105-14-104