Unsupervised Deep Learning Can Identify Protein Functional Groups from Unaligned Sequences
Interpreting protein function from sequence data is a fundamental goal of bioinformatics. However, our current understanding of protein diversity is bottlenecked by the fact that most proteins have only been functionally validated in model organisms, limiting our understanding of how function varies...
Main authors: | David, Kyle T; Halanych, Kenneth M |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10231473/ https://www.ncbi.nlm.nih.gov/pubmed/37217837 http://dx.doi.org/10.1093/gbe/evad084 |
_version_ | 1785051763995312128 |
---|---|
author | David, Kyle T Halanych, Kenneth M |
author_facet | David, Kyle T Halanych, Kenneth M |
author_sort | David, Kyle T |
collection | PubMed |
description | Interpreting protein function from sequence data is a fundamental goal of bioinformatics. However, our current understanding of protein diversity is bottlenecked by the fact that most proteins have only been functionally validated in model organisms, limiting our understanding of how function varies with gene sequence diversity. Thus, accuracy of inferences in clades without model representatives is questionable. Unsupervised learning may help to ameliorate this bias by identifying highly complex patterns and structure from large data sets without external labels. Here, we present DeepSeqProt, an unsupervised deep learning program for exploring large protein sequence data sets. DeepSeqProt is a clustering tool capable of distinguishing between broad classes of proteins while learning local and global structure of functional space. DeepSeqProt is capable of learning salient biological features from unaligned, unannotated sequences. DeepSeqProt is more likely to capture complete protein families and statistically significant shared ontologies within proteomes than other clustering methods. We hope this framework will prove of use to researchers and provide a preliminary step in further developing unsupervised deep learning in molecular biology. |
format | Online Article Text |
id | pubmed-10231473 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-10231473 2023-06-01 Unsupervised Deep Learning Can Identify Protein Functional Groups from Unaligned Sequences David, Kyle T Halanych, Kenneth M Genome Biol Evol Article Interpreting protein function from sequence data is a fundamental goal of bioinformatics. However, our current understanding of protein diversity is bottlenecked by the fact that most proteins have only been functionally validated in model organisms, limiting our understanding of how function varies with gene sequence diversity. Thus, accuracy of inferences in clades without model representatives is questionable. Unsupervised learning may help to ameliorate this bias by identifying highly complex patterns and structure from large data sets without external labels. Here, we present DeepSeqProt, an unsupervised deep learning program for exploring large protein sequence data sets. DeepSeqProt is a clustering tool capable of distinguishing between broad classes of proteins while learning local and global structure of functional space. DeepSeqProt is capable of learning salient biological features from unaligned, unannotated sequences. DeepSeqProt is more likely to capture complete protein families and statistically significant shared ontologies within proteomes than other clustering methods. We hope this framework will prove of use to researchers and provide a preliminary step in further developing unsupervised deep learning in molecular biology. Oxford University Press 2023-05-22 /pmc/articles/PMC10231473/ /pubmed/37217837 http://dx.doi.org/10.1093/gbe/evad084 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Article David, Kyle T Halanych, Kenneth M Unsupervised Deep Learning Can Identify Protein Functional Groups from Unaligned Sequences |
title | Unsupervised Deep Learning Can Identify Protein Functional Groups from Unaligned Sequences |
title_full | Unsupervised Deep Learning Can Identify Protein Functional Groups from Unaligned Sequences |
title_fullStr | Unsupervised Deep Learning Can Identify Protein Functional Groups from Unaligned Sequences |
title_full_unstemmed | Unsupervised Deep Learning Can Identify Protein Functional Groups from Unaligned Sequences |
title_short | Unsupervised Deep Learning Can Identify Protein Functional Groups from Unaligned Sequences |
title_sort | unsupervised deep learning can identify protein functional groups from unaligned sequences |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10231473/ https://www.ncbi.nlm.nih.gov/pubmed/37217837 http://dx.doi.org/10.1093/gbe/evad084 |
work_keys_str_mv | AT davidkylet unsuperviseddeeplearningcanidentifyproteinfunctionalgroupsfromunalignedsequences AT halanychkennethm unsuperviseddeeplearningcanidentifyproteinfunctionalgroupsfromunalignedsequences |