Increasing the accuracy of single sequence prediction methods using a deep semi-supervised learning framework
MOTIVATION: Over the past 50 years, our ability to model protein sequences with evolutionary information has progressed in leaps and bounds. However, even with the latest deep learning methods, the modelling of a critically important class of proteins, single orphan sequences, remains unsolved. RESU...
Main authors: | Moffat, Lewis; Jones, David T |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2021 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8570780/ https://www.ncbi.nlm.nih.gov/pubmed/34213528 http://dx.doi.org/10.1093/bioinformatics/btab491 |
_version_ | 1784594890480418816 |
---|---|
author | Moffat, Lewis; Jones, David T |
collection | PubMed |
description | MOTIVATION: Over the past 50 years, our ability to model protein sequences with evolutionary information has progressed in leaps and bounds. However, even with the latest deep learning methods, the modelling of a critically important class of proteins, single orphan sequences, remains unsolved. RESULTS: By taking a bioinformatics approach to semi-supervised machine learning, we develop Profile Augmentation of Single Sequences (PASS), a simple but powerful framework for building accurate single-sequence methods. To demonstrate the effectiveness of PASS we apply it to the mature field of secondary structure prediction. In doing so we develop S4PRED, the successor to the open-source PSIPRED-Single method, which achieves an unprecedented Q3 score of 75.3% on the standard CB513 test. PASS provides a blueprint for the development of a new generation of predictive methods, advancing our ability to model individual protein sequences. AVAILABILITY AND IMPLEMENTATION: The S4PRED model is available as open-source software on the PSIPRED GitHub repository (https://github.com/psipred/s4pred), along with documentation. It will also be provided as a part of the PSIPRED web service (http://bioinf.cs.ucl.ac.uk/psipred/). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-8570780 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8570780 2021-11-08 Increasing the accuracy of single sequence prediction methods using a deep semi-supervised learning framework. Moffat, Lewis; Jones, David T. Bioinformatics. Original Papers. Oxford University Press 2021-07-02 /pmc/articles/PMC8570780/ /pubmed/34213528 http://dx.doi.org/10.1093/bioinformatics/btab491 Text en © The Author(s) 2021. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Increasing the accuracy of single sequence prediction methods using a deep semi-supervised learning framework |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8570780/ https://www.ncbi.nlm.nih.gov/pubmed/34213528 http://dx.doi.org/10.1093/bioinformatics/btab491 |
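
The description above reports a Q3 score of 75.3% on CB513. Q3 is the percentage of residues whose predicted three-state secondary structure label (helix H, strand E, coil C) matches the observed label. The following is a minimal sketch of that calculation only; the function name `q3_score` and the toy label strings are hypothetical illustrations, not part of S4PRED or the record.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Return the percentage of positions where the 3-state labels agree.

    Both arguments are strings over the alphabet H (helix), E (strand),
    C (coil), one character per residue. Hypothetical helper, not S4PRED code.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("label strings must be non-empty")
    # Count residues whose predicted state equals the observed state.
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Toy example with made-up labels: one mismatch in ten residues.
observed = "CCHHHHEEEC"
predicted = "CCHHHCEEEC"
print(f"Q3 = {q3_score(predicted, observed):.1f}%")  # prints "Q3 = 90.0%"
```

A benchmark figure such as the one quoted in the description is typically computed over all residues of a test set; whether it is pooled per residue or averaged per protein is not stated in this record.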