
A General Framework to Learn Tertiary Structure for Protein Sequence Characterization

During the past five years, deep-learning algorithms have enabled ground-breaking progress towards the prediction of tertiary structure from a protein sequence. Very recently, we developed SAdLSA, a new computational algorithm for protein sequence comparison via deep-learning of protein structural a...


Bibliographic Details
Main authors:	Gao, Mu; Skolnick, Jeffrey
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2021
Subjects:	Bioinformatics
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8301223/ https://www.ncbi.nlm.nih.gov/pubmed/34308415 http://dx.doi.org/10.3389/fbinf.2021.689960

author	Gao, Mu Skolnick, Jeffrey
collection	PubMed
description	During the past five years, deep-learning algorithms have enabled ground-breaking progress towards the prediction of tertiary structure from a protein sequence. Very recently, we developed SAdLSA, a new computational algorithm for protein sequence comparison via deep-learning of protein structural alignments. SAdLSA shows significant improvement over established sequence alignment methods. In this contribution, we show that SAdLSA provides a general machine-learning framework for structurally characterizing protein sequences. By aligning a protein sequence against itself, SAdLSA generates a fold distogram for the input sequence, including challenging cases whose structural folds were not present in the training set. About 70% of the predicted distograms are statistically significant. Although at present the accuracy of the intra-sequence distogram predicted by SAdLSA self-alignment is not as good as that of deep-learning algorithms specifically trained for distogram prediction, it is remarkable that the prediction of single protein structures is encoded by an algorithm that learns ensembles of pairwise structural comparisons, without being explicitly trained to recognize individual structural folds. As such, SAdLSA can not only predict protein folds for individual sequences, but also detect subtle, yet significant, structural relationships between multiple protein sequences using the same deep-learning neural network. The former reduces to a special case in this general framework for protein sequence annotation.
format	Online Article Text
id	pubmed-8301223
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-8301223 2022-04-01 A General Framework to Learn Tertiary Structure for Protein Sequence Characterization Gao, Mu Skolnick, Jeffrey Front Bioinform Bioinformatics Frontiers Media S.A. 2021-05-21 /pmc/articles/PMC8301223/ /pubmed/34308415 http://dx.doi.org/10.3389/fbinf.2021.689960 Text en Copyright © 2021 Gao and Skolnick.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	A General Framework to Learn Tertiary Structure for Protein Sequence Characterization
topic	Bioinformatics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8301223/ https://www.ncbi.nlm.nih.gov/pubmed/34308415 http://dx.doi.org/10.3389/fbinf.2021.689960
