LCD-Composer: an intuitive, composition-centric method enabling the identification and detailed functional mapping of low-complexity domains

Low complexity domains (LCDs) in proteins are regions predominantly composed of a small subset of the possible amino acids. LCDs are involved in a variety of normal and pathological processes across all domains of life. Existing methods define LCDs using information-theoretical complexity thresholds...


Bibliographic Details
Main authors: Cascarina, Sean M, King, David C, Osborne Nishimura, Erin, Ross, Eric D
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8153834/
https://www.ncbi.nlm.nih.gov/pubmed/34056598
http://dx.doi.org/10.1093/nargab/lqab048
_version_ 1783698883133571072
author Cascarina, Sean M
King, David C
Osborne Nishimura, Erin
Ross, Eric D
author_facet Cascarina, Sean M
King, David C
Osborne Nishimura, Erin
Ross, Eric D
author_sort Cascarina, Sean M
collection PubMed
description Low complexity domains (LCDs) in proteins are regions predominantly composed of a small subset of the possible amino acids. LCDs are involved in a variety of normal and pathological processes across all domains of life. Existing methods define LCDs using information-theoretical complexity thresholds, sequence alignment with repetitive regions, or statistical overrepresentation of amino acids relative to whole-proteome frequencies. While these methods have proven valuable, they are all indirectly quantifying amino acid composition, which is the fundamental and biologically-relevant feature related to protein sequence complexity. Here, we present a new computational tool, LCD-Composer, that directly identifies LCDs based on amino acid composition and linear amino acid dispersion. Using LCD-Composer's default parameters, we identified simple LCDs across all organisms available through UniProt and provide the resulting data in an accessible form as a resource. Furthermore, we describe large-scale differences between organisms from different domains of life and explore organisms with extreme LCD content for different LCD classes. Finally, we illustrate the versatility and specificity achievable with LCD-Composer by identifying diverse classes of LCDs using both simple and multifaceted composition criteria. We demonstrate that the ability to dissect LCDs based on these multifaceted criteria enhances the functional mapping and classification of LCDs.
format Online
Article
Text
id pubmed-8153834
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8153834 2021-05-28 LCD-Composer: an intuitive, composition-centric method enabling the identification and detailed functional mapping of low-complexity domains Cascarina, Sean M King, David C Osborne Nishimura, Erin Ross, Eric D NAR Genom Bioinform Methods Article Low complexity domains (LCDs) in proteins are regions predominantly composed of a small subset of the possible amino acids. LCDs are involved in a variety of normal and pathological processes across all domains of life. Existing methods define LCDs using information-theoretical complexity thresholds, sequence alignment with repetitive regions, or statistical overrepresentation of amino acids relative to whole-proteome frequencies. While these methods have proven valuable, they are all indirectly quantifying amino acid composition, which is the fundamental and biologically-relevant feature related to protein sequence complexity. Here, we present a new computational tool, LCD-Composer, that directly identifies LCDs based on amino acid composition and linear amino acid dispersion. Using LCD-Composer's default parameters, we identified simple LCDs across all organisms available through UniProt and provide the resulting data in an accessible form as a resource. Furthermore, we describe large-scale differences between organisms from different domains of life and explore organisms with extreme LCD content for different LCD classes. Finally, we illustrate the versatility and specificity achievable with LCD-Composer by identifying diverse classes of LCDs using both simple and multifaceted composition criteria. We demonstrate that the ability to dissect LCDs based on these multifaceted criteria enhances the functional mapping and classification of LCDs. Oxford University Press 2021-05-26 /pmc/articles/PMC8153834/ /pubmed/34056598 http://dx.doi.org/10.1093/nargab/lqab048 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methods Article
Cascarina, Sean M
King, David C
Osborne Nishimura, Erin
Ross, Eric D
LCD-Composer: an intuitive, composition-centric method enabling the identification and detailed functional mapping of low-complexity domains
title LCD-Composer: an intuitive, composition-centric method enabling the identification and detailed functional mapping of low-complexity domains
title_full LCD-Composer: an intuitive, composition-centric method enabling the identification and detailed functional mapping of low-complexity domains
title_fullStr LCD-Composer: an intuitive, composition-centric method enabling the identification and detailed functional mapping of low-complexity domains
title_full_unstemmed LCD-Composer: an intuitive, composition-centric method enabling the identification and detailed functional mapping of low-complexity domains
title_short LCD-Composer: an intuitive, composition-centric method enabling the identification and detailed functional mapping of low-complexity domains
title_sort lcd-composer: an intuitive, composition-centric method enabling the identification and detailed functional mapping of low-complexity domains
topic Methods Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8153834/
https://www.ncbi.nlm.nih.gov/pubmed/34056598
http://dx.doi.org/10.1093/nargab/lqab048
work_keys_str_mv AT cascarinaseanm lcdcomposeranintuitivecompositioncentricmethodenablingtheidentificationanddetailedfunctionalmappingoflowcomplexitydomains
AT kingdavidc lcdcomposeranintuitivecompositioncentricmethodenablingtheidentificationanddetailedfunctionalmappingoflowcomplexitydomains
AT osbornenishimuraerin lcdcomposeranintuitivecompositioncentricmethodenablingtheidentificationanddetailedfunctionalmappingoflowcomplexitydomains
AT rossericd lcdcomposeranintuitivecompositioncentricmethodenablingtheidentificationanddetailedfunctionalmappingoflowcomplexitydomains
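
Illustrative note: the description above says that LCDs can be identified directly from amino acid composition. The sketch below shows one simple way such a composition scan could look. It is not the LCD-Composer implementation: the function name `find_composition_windows`, the window size, and the threshold are hypothetical choices made for illustration, and the linear dispersion criterion mentioned in the description is deliberately left out.

```python
def find_composition_windows(sequence, residues, window=20, threshold=0.4):
    """Return merged (start, end) half-open spans whose sliding windows
    contain at least `threshold` fraction of the residues of interest.

    Illustrative only: `window` and `threshold` are assumed defaults,
    and no dispersion criterion is applied.
    """
    targets = set(residues)
    spans = []
    n = len(sequence)
    if window <= 0 or n < window:
        return spans

    # Count target residues in the first window, then update incrementally.
    count = sum(1 for aa in sequence[:window] if aa in targets)
    for start in range(n - window + 1):
        if start > 0:
            # Drop the residue that left the window, add the one that entered.
            count -= sequence[start - 1] in targets
            count += sequence[start + window - 1] in targets
        if count / window >= threshold:
            end = start + window
            if spans and start <= spans[-1][1]:
                # Overlapping or adjacent passing windows are merged.
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans


if __name__ == "__main__":
    # A made-up sequence with a 20-residue glutamine run in the middle.
    toy = "ACDEFGHIKL" + "Q" * 20 + "ACDEFGHIKL"
    print(find_composition_windows(toy, "Q", window=10, threshold=0.8))
```

Passing several residues (for example `"QN"`) scores their combined fraction, which is one simple reading of a multi-residue composition criterion; the published tool should be consulted for its actual parameters and behavior.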