Insights from analyses of low complexity regions with canonical methods for protein sequence comparison

Low complexity regions are fragments of protein sequences composed of only a few types of amino acids. These regions frequently occur in proteins and can play an important role in their functions. However, scientists are mainly focused on regions characterized by high diversity of amino acid composi...

Bibliographic Details
Main Authors:	Jarnot, Patryk; Ziemska-Legiecka, Joanna; Grynberg, Marcin; Gruca, Aleksandra
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2022
Subjects:	Problem Solving Protocol
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9487646/ https://www.ncbi.nlm.nih.gov/pubmed/35914952 http://dx.doi.org/10.1093/bib/bbac299

_version_	1784792497262690304
author	Jarnot, Patryk; Ziemska-Legiecka, Joanna; Grynberg, Marcin; Gruca, Aleksandra
author_facet	Jarnot, Patryk; Ziemska-Legiecka, Joanna; Grynberg, Marcin; Gruca, Aleksandra
author_sort	Jarnot, Patryk
collection	PubMed
description	Low complexity regions are fragments of protein sequences composed of only a few types of amino acids. These regions frequently occur in proteins and can play an important role in their functions. However, scientists are mainly focused on regions characterized by high diversity of amino acid composition. Similarity between regions of protein sequences frequently reflects functional similarity between them. In this article, we discuss the strengths and weaknesses of the similarity analysis of low complexity regions using BLAST, HHblits and CD-HIT. These methods are considered to be the gold standard in protein similarity analysis and were designed for comparison of high complexity regions. However, we lack specialized methods that could be used to compare low complexity regions. Therefore, we investigated the existing methods in order to understand how they can be applied to compare such regions. Our results are supported by an exploratory study and a discussion of the amino acid composition and biological roles of selected examples. We show that existing methods need improvements to efficiently search for similar low complexity regions. We suggest features that have to be re-designed specifically for comparing low complexity regions: scoring matrix, multiple sequence alignment, e-value, local alignment and clustering based on a set of representative sequences. Results of this analysis can either be used to improve existing methods or to create new methods for the similarity analysis of low complexity regions.
format	Online Article Text
id	pubmed-9487646
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-9487646 2022-09-21 Insights from analyses of low complexity regions with canonical methods for protein sequence comparison Jarnot, Patryk; Ziemska-Legiecka, Joanna; Grynberg, Marcin; Gruca, Aleksandra Brief Bioinform Problem Solving Protocol Oxford University Press 2022-08-01 /pmc/articles/PMC9487646/ /pubmed/35914952 http://dx.doi.org/10.1093/bib/bbac299 Text en © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Insights from analyses of low complexity regions with canonical methods for protein sequence comparison
topic	Problem Solving Protocol
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9487646/ https://www.ncbi.nlm.nih.gov/pubmed/35914952 http://dx.doi.org/10.1093/bib/bbac299
