
A sequence-based evolutionary distance method for Phylogenetic analysis of highly divergent proteins


Bibliographic Details
Main Authors: Cao, Wei; Wu, Lu-Yun; Xia, Xia-Yu; Chen, Xiang; Wang, Zhi-Xin; Pan, Xian-Ming
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10662474/
https://www.ncbi.nlm.nih.gov/pubmed/37985846
http://dx.doi.org/10.1038/s41598-023-47496-9
Description
Summary: Because of the limited effectiveness of prevailing phylogenetic methods when applied to highly divergent protein sequences, the phylogenetic analysis problem remains challenging. Here, we propose a sequence-based evolutionary distance algorithm termed sequence distance (SD), which innovatively incorporates site-to-site correlation within protein sequences into the distance estimation. In protein superfamilies, SD can effectively distinguish evolutionary relationships both within and between protein families, producing phylogenetic trees that closely align with those based on structural information, even with sequence identity less than 20%. SD is highly correlated with the similarity of the protein structure, and can calculate evolutionary distances for thousands of protein pairs within seconds using a single CPU, which is significantly faster than most protein structure prediction methods that demand high computational resources and long run times. The development of SD will significantly advance phylogenetics, providing researchers with a more accurate and reliable tool for exploring evolutionary relationships.