BrEPS 2.0: Optimization of sequence pattern prediction for enzyme annotation
The prediction of gene functions is crucial for a large number of different life science areas. Faster high throughput sequencing techniques generate more and larger datasets. The manual annotation by classical wet-lab experiments is not suitable for these large amounts of data. We showed earlier th...
Main authors: | Dudek, Christian-Alexander; Dannheim, Henning; Schomburg, Dietmar |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2017 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5531587/ https://www.ncbi.nlm.nih.gov/pubmed/28750104 http://dx.doi.org/10.1371/journal.pone.0182216 |
_version_ | 1783253389544521728 |
---|---|
author | Dudek, Christian-Alexander Dannheim, Henning Schomburg, Dietmar |
author_facet | Dudek, Christian-Alexander Dannheim, Henning Schomburg, Dietmar |
author_sort | Dudek, Christian-Alexander |
collection | PubMed |
description | The prediction of gene functions is crucial for a large number of different life science areas. Faster high-throughput sequencing techniques generate more and larger datasets. Manual annotation by classical wet-lab experiments is not suitable for these large amounts of data. We showed earlier that the automatic sequence pattern-based BrEPS protocol, based on manually curated sequences, can be used for the prediction of enzymatic functions of genes. The growing sequence databases provide the opportunity for more reliable patterns, but are also a challenge for the implementation of automatic protocols. We reimplemented and optimized the BrEPS pattern generation to be applicable to larger datasets in an acceptable timescale. The primary improvement of the new BrEPS protocol is the enhanced data selection step. Manually curated annotations from Swiss-Prot are used as a reliable source for function prediction of enzymes observed at the protein level. The pool of sequences is extended by highly similar sequences from TrEMBL and Swiss-Prot. This allows us to restrict the selection of Swiss-Prot entries without losing the diversity of sequences needed to generate significant patterns. Additionally, a supporting pattern type was introduced by extending the patterns at semi-conserved positions with highly similar amino acids. Extended patterns have an increased complexity, increasing the chance to match more sequences without losing the essential structural information of the pattern. To enhance the usability of the database, we introduced enzyme function prediction based on consensus EC numbers and IUBMB enzyme nomenclature. BrEPS is part of the Braunschweig Enzyme Database (BRENDA) and is available on a completely redesigned website and as a download. The database can be downloaded and used with the BrEPScmd command line tool for large-scale sequence analysis. The BrEPS website and downloads for the database creation tool, command line tool and database are freely accessible at http://breps.tu-bs.de. |
format | Online Article Text |
id | pubmed-5531587 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-5531587 2017-08-07 BrEPS 2.0: Optimization of sequence pattern prediction for enzyme annotation Dudek, Christian-Alexander Dannheim, Henning Schomburg, Dietmar PLoS One Research Article Public Library of Science 2017-07-27 /pmc/articles/PMC5531587/ /pubmed/28750104 http://dx.doi.org/10.1371/journal.pone.0182216 Text en © 2017 Dudek et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Dudek, Christian-Alexander Dannheim, Henning Schomburg, Dietmar BrEPS 2.0: Optimization of sequence pattern prediction for enzyme annotation |
title | BrEPS 2.0: Optimization of sequence pattern prediction for enzyme annotation |
title_full | BrEPS 2.0: Optimization of sequence pattern prediction for enzyme annotation |
title_fullStr | BrEPS 2.0: Optimization of sequence pattern prediction for enzyme annotation |
title_full_unstemmed | BrEPS 2.0: Optimization of sequence pattern prediction for enzyme annotation |
title_short | BrEPS 2.0: Optimization of sequence pattern prediction for enzyme annotation |
title_sort | breps 2.0: optimization of sequence pattern prediction for enzyme annotation |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5531587/ https://www.ncbi.nlm.nih.gov/pubmed/28750104 http://dx.doi.org/10.1371/journal.pone.0182216 |
work_keys_str_mv | AT dudekchristianalexander breps20optimizationofsequencepatternpredictionforenzymeannotation AT dannheimhenning breps20optimizationofsequencepatternpredictionforenzymeannotation AT schomburgdietmar breps20optimizationofsequencepatternpredictionforenzymeannotation |
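
Illustrative note: the description mentions two ideas that a short sketch may make more concrete, namely extending patterns at semi-conserved positions with highly similar amino acids, and predicting function from consensus EC numbers. The Python below is only a minimal sketch under stated assumptions and is not the BrEPS implementation. The similarity groups in `SIMILAR_GROUPS`, the list-of-strings pattern representation, the function names, and the reading of "consensus EC number" as the shared leading EC levels are all hypothetical choices made for illustration; the record does not specify any of them.

```python
import re

# Hypothetical similarity groups. The record does not say which amino
# acids BrEPS treats as "highly similar"; these are illustrative only.
SIMILAR_GROUPS = ["ILVM", "FYW", "KR", "DE", "ST", "NQ"]


def extend_position(allowed: str) -> str:
    """Widen one semi-conserved position with residues from matching groups."""
    original = set(allowed)
    extended = set(allowed)
    for group in SIMILAR_GROUPS:
        if original & set(group):
            extended |= set(group)
    return "".join(sorted(extended))


def extend_pattern(pattern: list[str]) -> list[str]:
    """Extend only semi-conserved positions (more than one allowed residue).

    Assumed representation: one string per position; a single letter is a
    conserved residue, several letters are alternatives, "." is a wildcard.
    """
    return [extend_position(p) if len(p) > 1 else p for p in pattern]


def to_regex(pattern: list[str]) -> re.Pattern:
    """Compile the position list into a regular expression."""
    parts = [p if len(p) == 1 else "[" + p + "]" for p in pattern]
    return re.compile("".join(parts))


def consensus_ec(ec_numbers: list[str]) -> str:
    """Keep the leading EC levels shared by all inputs; pad the rest with '-'."""
    shared = []
    for levels in zip(*(ec.split(".") for ec in ec_numbers)):
        if len(set(levels)) == 1 and levels[0] != "-":
            shared.append(levels[0])
        else:
            break
    return ".".join(shared + ["-"] * (4 - len(shared)))


pattern = ["G", "IL", ".", "S", "KD"]
extended = extend_pattern(pattern)
print(extended)
print(to_regex(pattern).fullmatch("GVASR") is not None)
print(to_regex(extended).fullmatch("GVASR") is not None)
print(consensus_ec(["1.1.1.1", "1.1.1.2"]))
```

In this sketch the conserved position `G` and the wildcard are left untouched, so the extended pattern accepts more sequences while the fixed positions still have to match, which is the trade-off the description attributes to extended patterns.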