Entropy and Variability: A Second Opinion by Deep Learning

Background: Analysis of the distribution of amino acid types observed at equivalent positions in multiple sequence alignments has found applications in human genetics, protein engineering, drug design, protein structure prediction, and many other fields. These analyses tend to revolve around measures of the distribution of the twenty amino acid types found at evolutionarily equivalent positions: the columns of multiple sequence alignments. Commonly used measures include variability, average hydrophobicity, and Shannon entropy. One of these techniques, entropy–variability analysis, reduces, as its name suggests, the distribution of residue types observed in a column to two numbers: the Shannon entropy and the variability, defined as the number of residue types observed.

Results: We applied an unsupervised deep-learning feature-extraction method to analyse the multiple sequence alignments of all human proteins. An auto-encoder neural architecture was trained on 27,835 multiple sequence alignments for human proteins to obtain the two features that best describe the seven million variability patterns. The two features learned without supervision strongly resemble entropy and variability, indicating that these are the projections that retain the most information when the dimensionality of the information hidden in the columns of multiple sequence alignments is reduced.
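As a rough illustration of the two measures named in the Background, and not code from the paper, the following Python sketch reduces one alignment column to its Shannon entropy and its variability (the number of residue types observed). How the authors treated gaps and non-standard residue codes is not stated in this record, so skipping them here is an assumption.

from collections import Counter
from math import log2

# The twenty standard amino acid types mentioned in the abstract.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

def entropy_and_variability(column):
    """Reduce one alignment column (a string of one-letter residue codes)
    to (Shannon entropy in bits, variability).

    Gaps ('-') and non-standard codes are skipped; this is an assumption,
    not a detail taken from the paper.
    """
    counts = Counter(c for c in column.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return 0.0, 0
    # Shannon entropy of the observed residue-type distribution.
    entropy = -sum((n / total) * log2(n / total) for n in counts.values())
    # Variability: the number of distinct residue types observed.
    return entropy, len(counts)

# Example: a column with five A, two V, two L and one I.
print(entropy_and_variability("AAAAAVVLLI"))  # -> (approximately 1.76, 4)

The Results describe an auto-encoder whose two-unit bottleneck yields the two learned features that are compared with entropy and variability. The record gives no architectural or training details, so the PyTorch sketch below is only a hypothetical minimal version under assumed choices (20-dimensional column profiles as input, one small hidden layer, mean-squared-error reconstruction loss); it shows the shape of the idea, not the authors' model.

import torch
from torch import nn

class ColumnAutoEncoder(nn.Module):
    """Compress a 20-dimensional residue-type profile to two latent features."""
    def __init__(self, n_types: int = 20, n_latent: int = 2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_types, 8), nn.ReLU(), nn.Linear(8, n_latent))
        self.decoder = nn.Sequential(nn.Linear(n_latent, 8), nn.ReLU(), nn.Linear(8, n_types))

    def forward(self, x):
        z = self.encoder(x)  # the two learned features per column
        return self.decoder(z), z

# Toy training data: random column profiles (each row sums to 1) standing in
# for the millions of real alignment-column distributions.
profiles = torch.rand(1024, 20)
profiles = profiles / profiles.sum(dim=1, keepdim=True)

model = ColumnAutoEncoder()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    reconstruction, _ = model(profiles)
    loss = nn.functional.mse_loss(reconstruction, profiles)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()

_, features = model(profiles)
print(features.shape)  # torch.Size([1024, 2]): candidates to compare with entropy and variability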

Bibliographic Details
Main Authors: Rademaker, Daniel T., Xue, Li C., ‘t Hoen, Peter A. C., Vriend, Gert
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9775329/
https://www.ncbi.nlm.nih.gov/pubmed/36551168
http://dx.doi.org/10.3390/biom12121740
collection PubMed
id pubmed-9775329
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Biomolecules
published 2022-11-23
license © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).