ClipKIT: A multiple sequence alignment trimming software for accurate phylogenomic inference

Highly divergent sites in multiple sequence alignments (MSAs), which can stem from erroneous inference of homology and saturation of substitutions, are thought to negatively impact phylogenetic inference. Thus, several different trimming strategies have been developed for identifying and removing th...

Bibliographic Details
Main Authors: Steenwyk, Jacob L., Buida, Thomas J., Li, Yuanning, Shen, Xing-Xing, Rokas, Antonis
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7735675/
https://www.ncbi.nlm.nih.gov/pubmed/33264284
http://dx.doi.org/10.1371/journal.pbio.3001007
_version_ 1783622682468679680
author Steenwyk, Jacob L.
Buida, Thomas J.
Li, Yuanning
Shen, Xing-Xing
Rokas, Antonis
author_facet Steenwyk, Jacob L.
Buida, Thomas J.
Li, Yuanning
Shen, Xing-Xing
Rokas, Antonis
author_sort Steenwyk, Jacob L.
collection PubMed
description Highly divergent sites in multiple sequence alignments (MSAs), which can stem from erroneous inference of homology and saturation of substitutions, are thought to negatively impact phylogenetic inference. Thus, several different trimming strategies have been developed for identifying and removing these sites prior to phylogenetic inference. However, a recent study reported that doing so can worsen inference, underscoring the need for alternative alignment trimming strategies. Here, we introduce ClipKIT, an alignment trimming software that, rather than identifying and removing putatively phylogenetically uninformative sites, instead aims to identify and retain parsimony-informative sites, which are known to be phylogenetically informative. To test the efficacy of ClipKIT, we examined the accuracy and support of phylogenies inferred from 14 different alignment trimming strategies, including those implemented in ClipKIT, across nearly 140,000 alignments from a broad sampling of evolutionary histories. Phylogenies inferred from ClipKIT-trimmed alignments are accurate, robust, and time saving. Furthermore, ClipKIT consistently outperformed other trimming methods across diverse datasets, suggesting that strategies based on identifying and retaining parsimony-informative sites provide a robust framework for alignment trimming.
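The description above contrasts removing divergent sites with retaining parsimony-informative ones. As a rough illustration of the second idea only, the sketch below applies the standard textbook definition (a column is parsimony-informative when at least two character states each occur in at least two sequences). It is not ClipKIT's implementation: the function names, the dictionary-based alignment, and the choice to ignore only `-` and `?` are assumptions made for this example.

```python
from collections import Counter

# Assumption for this sketch: only gap and missing-data symbols are ignored.
GAP_CHARS = {"-", "?"}


def is_parsimony_informative(column):
    """Return True if at least two states each occur at least twice."""
    counts = Counter(ch.upper() for ch in column if ch not in GAP_CHARS)
    states_seen_twice = sum(1 for n in counts.values() if n >= 2)
    return states_seen_twice >= 2


def keep_parsimony_informative(alignment):
    """Keep only parsimony-informative columns of an aligned dict of sequences.

    `alignment` maps a taxon name to its aligned sequence; all sequences
    are assumed to have the same length.
    """
    names = list(alignment)
    columns = zip(*(alignment[name] for name in names))
    kept = [i for i, col in enumerate(columns) if is_parsimony_informative(col)]
    return {name: "".join(alignment[name][i] for i in kept) for name in names}


# Hypothetical toy alignment, not taken from the paper.
example = {
    "taxon_a": "ACGTA",
    "taxon_b": "ACGTC",
    "taxon_c": "ATGAA",
    "taxon_d": "ATG-C",
}
trimmed = keep_parsimony_informative(example)
```

In this toy input only the second and fifth columns have two states that each appear twice, so each trimmed sequence is two characters long. The trimming strategies actually evaluated in the paper are not described in this record and may differ from this sketch.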
format Online
Article
Text
id pubmed-7735675
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-77356752020-12-22 ClipKIT: A multiple sequence alignment trimming software for accurate phylogenomic inference Steenwyk, Jacob L. Buida, Thomas J. Li, Yuanning Shen, Xing-Xing Rokas, Antonis PLoS Biol Methods and Resources Highly divergent sites in multiple sequence alignments (MSAs), which can stem from erroneous inference of homology and saturation of substitutions, are thought to negatively impact phylogenetic inference. Thus, several different trimming strategies have been developed for identifying and removing these sites prior to phylogenetic inference. However, a recent study reported that doing so can worsen inference, underscoring the need for alternative alignment trimming strategies. Here, we introduce ClipKIT, an alignment trimming software that, rather than identifying and removing putatively phylogenetically uninformative sites, instead aims to identify and retain parsimony-informative sites, which are known to be phylogenetically informative. To test the efficacy of ClipKIT, we examined the accuracy and support of phylogenies inferred from 14 different alignment trimming strategies, including those implemented in ClipKIT, across nearly 140,000 alignments from a broad sampling of evolutionary histories. Phylogenies inferred from ClipKIT-trimmed alignments are accurate, robust, and time saving. Furthermore, ClipKIT consistently outperformed other trimming methods across diverse datasets, suggesting that strategies based on identifying and retaining parsimony-informative sites provide a robust framework for alignment trimming. 
Public Library of Science 2020-12-02 /pmc/articles/PMC7735675/ /pubmed/33264284 http://dx.doi.org/10.1371/journal.pbio.3001007 Text en © 2020 Steenwyk et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Methods and Resources
Steenwyk, Jacob L.
Buida, Thomas J.
Li, Yuanning
Shen, Xing-Xing
Rokas, Antonis
ClipKIT: A multiple sequence alignment trimming software for accurate phylogenomic inference
title ClipKIT: A multiple sequence alignment trimming software for accurate phylogenomic inference
topic Methods and Resources
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7735675/
https://www.ncbi.nlm.nih.gov/pubmed/33264284
http://dx.doi.org/10.1371/journal.pbio.3001007