BrEPS 2.0: Optimization of sequence pattern prediction for enzyme annotation

Bibliographic Details
Main Authors: Dudek, Christian-Alexander, Dannheim, Henning, Schomburg, Dietmar
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5531587/
https://www.ncbi.nlm.nih.gov/pubmed/28750104
http://dx.doi.org/10.1371/journal.pone.0182216
author Dudek, Christian-Alexander
Dannheim, Henning
Schomburg, Dietmar
collection PubMed
description The prediction of gene functions is crucial for a large number of different life science areas. Faster high-throughput sequencing techniques generate more and larger datasets. Manual annotation by classical wet-lab experiments is not feasible for such large amounts of data. We showed earlier that the automatic, sequence pattern-based BrEPS protocol, built on manually curated sequences, can be used to predict the enzymatic functions of genes. The growing sequence databases provide the opportunity for more reliable patterns, but they also pose a challenge for the implementation of automatic protocols. We reimplemented and optimized the BrEPS pattern generation so that it can be applied to larger datasets in an acceptable timescale. The primary improvement of the new BrEPS protocol is the enhanced data selection step. Manually curated annotations from Swiss-Prot are used as a reliable source for the function prediction of enzymes observed at the protein level. The pool of sequences is extended by highly similar sequences from TrEMBL and Swiss-Prot. This allows us to restrict the selection of Swiss-Prot entries without losing the sequence diversity needed to generate significant patterns. Additionally, a supporting pattern type was introduced by extending the patterns at semi-conserved positions with highly similar amino acids. Extended patterns have increased complexity, which raises the chance of matching more sequences without losing the essential structural information of the pattern. To enhance the usability of the database, we introduced enzyme function prediction based on consensus EC numbers and IUBMB enzyme nomenclature. BrEPS is part of the Braunschweig Enzyme Database (BRENDA) and is available on a completely redesigned website and as a download. The database can be downloaded and used with the BrEPScmd command line tool for large-scale sequence analysis.
The BrEPS website and downloads for the database creation tool, command line tool and database are freely accessible at http://breps.tu-bs.de.
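The abstract names two mechanisms without spelling out their rules: extending patterns at semi-conserved positions with highly similar amino acids, and predicting function from consensus EC numbers. The Python sketch below illustrates one possible reading of each. It is not the BrEPS implementation. The pattern representation (a list of allowed-residue sets, with None as a wildcard), the SIMILAR_GROUPS table, and the consensus rule (longest EC prefix shared by all matched patterns, with "-" for unresolved levels) are hypothetical assumptions made for illustration.

```python
import re

# HYPOTHETICAL groups of "highly similar" amino acids, for illustration only;
# the abstract does not state which residues BrEPS treats as similar.
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KRH"), set("DE"), set("NQ"), set("ST")]


def extend_position(allowed):
    """Add every member of any similarity group that overlaps the observed residues."""
    extended = set(allowed)
    for group in SIMILAR_GROUPS:
        if allowed & group:
            extended |= group
    return extended


def extend_pattern(pattern):
    """Extend only semi-conserved positions (2+ residues); keep conserved ones and wildcards."""
    return [p if p is None or len(p) == 1 else extend_position(p) for p in pattern]


def to_regex(pattern):
    """Translate a position list (None = any residue) into a regular expression."""
    parts = []
    for allowed in pattern:
        if allowed is None:
            parts.append(".")
        elif len(allowed) == 1:
            parts.append(next(iter(allowed)))
        else:
            parts.append("[" + "".join(sorted(allowed)) + "]")
    return "".join(parts)


def matches(pattern, sequence):
    """True if the pattern occurs anywhere in the protein sequence."""
    return re.search(to_regex(pattern), sequence) is not None


def consensus_ec(ec_numbers):
    """HYPOTHETICAL rule: longest EC prefix shared by all matches; unresolved levels become '-'."""
    prefix = []
    for level in zip(*(ec.split(".") for ec in ec_numbers)):
        if len(set(level)) != 1:
            break
        prefix.append(level[0])
    return ".".join(prefix + ["-"] * (4 - len(prefix)))


# Usage: a toy pattern C-x-[IL]-D with one semi-conserved position.
core = [{"C"}, None, {"I", "L"}, {"D"}]
print(to_regex(core))                                 # C.[IL]D
print(to_regex(extend_pattern(core)))                 # C.[ILMV]D
print(matches(core, "AACGVDK"))                       # False
print(matches(extend_pattern(core), "AACGVDK"))       # True
print(consensus_ec(["1.1.1.1", "1.1.1.2"]))           # 1.1.1.-
```

Extending only multi-residue positions leaves fully conserved residues untouched, which is one way to let a pattern match more sequences while keeping its most informative positions fixed.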
format Online
Article
Text
id pubmed-5531587
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
journal PLoS One (Research Article)
published 2017-07-27
rights © 2017 Dudek et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title BrEPS 2.0: Optimization of sequence pattern prediction for enzyme annotation
topic Research Article