Increasing the accuracy of single sequence prediction methods using a deep semi-supervised learning framework

Bibliographic details
Main authors: Moffat, Lewis; Jones, David T
Format: Online, Article, Text
Language: English
Published: Oxford University Press, 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8570780/
https://www.ncbi.nlm.nih.gov/pubmed/34213528
http://dx.doi.org/10.1093/bioinformatics/btab491
author Moffat, Lewis
Jones, David T
collection PubMed
description MOTIVATION: Over the past 50 years, our ability to model protein sequences with evolutionary information has progressed in leaps and bounds. However, even with the latest deep learning methods, the modelling of a critically important class of proteins, single orphan sequences, remains unsolved.
RESULTS: By taking a bioinformatics approach to semi-supervised machine learning, we develop Profile Augmentation of Single Sequences (PASS), a simple but powerful framework for building accurate single-sequence methods. To demonstrate the effectiveness of PASS, we apply it to the mature field of secondary structure prediction. In doing so, we develop S4PRED, the successor to the open-source PSIPRED-Single method, which achieves an unprecedented Q3 score of 75.3% on the standard CB513 test set. PASS provides a blueprint for the development of a new generation of predictive methods, advancing our ability to model individual protein sequences.
AVAILABILITY AND IMPLEMENTATION: The S4PRED model is available as open-source software on the PSIPRED GitHub repository (https://github.com/psipred/s4pred), along with documentation. It will also be provided as part of the PSIPRED web service (http://bioinf.cs.ucl.ac.uk/psipred/).
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
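The Q3 score cited in the abstract is the percentage of residues whose three-state secondary structure (helix H, strand E, coil C) is predicted correctly. The sketch below illustrates that calculation only; it is not code from S4PRED, and the function names `q3_score` and `q3_over_set` are hypothetical. It assumes labels have already been reduced to the three states (the reduction from DSSP's eight states is not shown) and that a test-set score pools every residue rather than averaging per chain, which may differ from how the CB513 figure was computed.

```python
from collections.abc import Iterable

STATES = frozenset("HEC")  # 3-state alphabet: helix, strand, coil


def _count_matches(predicted: str, observed: str) -> int:
    """Validate one chain and return how many residues have the correct state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    if not set(predicted) <= STATES or not set(observed) <= STATES:
        raise ValueError("labels must be H, E or C")
    return sum(p == o for p, o in zip(predicted, observed))


def q3_score(predicted: str, observed: str) -> float:
    """Q3 for a single chain: percentage of residues predicted correctly."""
    if not observed:
        raise ValueError("cannot score an empty chain")
    return 100.0 * _count_matches(predicted, observed) / len(observed)


def q3_over_set(pairs: Iterable[tuple[str, str]]) -> float:
    """Q3 pooled over every residue of a test set (assumed pooling, not per-chain mean)."""
    correct = total = 0
    for predicted, observed in pairs:
        correct += _count_matches(predicted, observed)
        total += len(observed)
    if total == 0:
        raise ValueError("cannot score an empty test set")
    return 100.0 * correct / total


print(q3_score("HHHCCEEC", "HHHCCEEE"))  # 87.5 (7 of 8 residues match)
```

Pooling residues weights long chains more heavily than a per-chain average would, so the two conventions can give different numbers on the same benchmark.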
format Online
Article
Text
id pubmed-8570780
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8570780; 2021-11-08; Bioinformatics; Original Papers; Oxford University Press, 2021-07-02; /pmc/articles/PMC8570780/; /pubmed/34213528; http://dx.doi.org/10.1093/bioinformatics/btab491; Text, en; © The Author(s) 2021. Published by Oxford University Press.
License: https://creativecommons.org/licenses/by/4.0/. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Increasing the accuracy of single sequence prediction methods using a deep semi-supervised learning framework
topic Original Papers