Application of text-mining for updating protein post-translational modification annotation in UniProtKB
BACKGROUND: The annotation of protein post-translational modifications (PTMs) is an important task of UniProtKB curators and, with continuing improvements in experimental methodology, an ever greater number of articles are being published on this topic. To help curators cope with this growing body o...
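Main authors: | Veuthey, Anne-Lise; Bridge, Alan; Gobeill, Julien; Ruch, Patrick; McEntyre, Johanna R; Bougueleret, Lydie; Xenarios, Ioannis |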
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2013 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3660268/ https://www.ncbi.nlm.nih.gov/pubmed/23517090 http://dx.doi.org/10.1186/1471-2105-14-104 |
_version_ | 1782270531191439360 |
---|---|
author | Veuthey, Anne-Lise; Bridge, Alan; Gobeill, Julien; Ruch, Patrick; McEntyre, Johanna R; Bougueleret, Lydie; Xenarios, Ioannis |
author_sort | Veuthey, Anne-Lise |
collection | PubMed |
description | BACKGROUND: The annotation of protein post-translational modifications (PTMs) is an important task of UniProtKB curators and, with continuing improvements in experimental methodology, an ever greater number of articles are being published on this topic. To help curators cope with this growing body of information we have developed a system that extracts information from the scientific literature for the most frequently annotated PTMs in UniProtKB. RESULTS: The procedure uses a pattern-matching and rule-based approach to extract sentences with information on the type and site of modification. A ranked list of protein candidates for the modification is also provided. For PTM extraction, precision varies from 57% to 94%, and recall from 75% to 95%, according to the type of modification. The procedure was used to track new publications on PTMs and to recover potential supporting evidence for phosphorylation sites annotated based on the results of large-scale proteomics experiments. CONCLUSIONS: The information retrieval and extraction method we have developed in this study forms the basis of a simple tool for the manual curation of protein post-translational modifications in UniProtKB/Swiss-Prot. Our work demonstrates that even simple text-mining tools can be effectively adapted for database curation tasks, provided that a thorough understanding of the working process and requirements is first obtained. This system can be accessed at http://eagl.unige.ch/PTM/. |
format | Online Article Text |
id | pubmed-3660268 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3660268 2013-05-22 Application of text-mining for updating protein post-translational modification annotation in UniProtKB. Veuthey, Anne-Lise; Bridge, Alan; Gobeill, Julien; Ruch, Patrick; McEntyre, Johanna R; Bougueleret, Lydie; Xenarios, Ioannis. BMC Bioinformatics. Methodology Article. BioMed Central 2013-03-22 /pmc/articles/PMC3660268/ /pubmed/23517090 http://dx.doi.org/10.1186/1471-2105-14-104 Text en. Copyright © 2013 Veuthey et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Application of text-mining for updating protein post-translational modification annotation in UniProtKB |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3660268/ https://www.ncbi.nlm.nih.gov/pubmed/23517090 http://dx.doi.org/10.1186/1471-2105-14-104 |