Current Methods for Automated Filtering of Multiple Sequence Alignments Frequently Worsen Single-Gene Phylogenetic Inference

Bibliographic Details
Main authors: Tan, Ge, Muffato, Matthieu, Ledergerber, Christian, Herrero, Javier, Goldman, Nick, Gil, Manuel, Dessimoz, Christophe
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4538881/
https://www.ncbi.nlm.nih.gov/pubmed/26031838
http://dx.doi.org/10.1093/sysbio/syv033
author Tan, Ge
Muffato, Matthieu
Ledergerber, Christian
Herrero, Javier
Goldman, Nick
Gil, Manuel
Dessimoz, Christophe
collection PubMed
description Phylogenetic inference is generally performed on the basis of multiple sequence alignments (MSA). Because errors in an alignment can lead to errors in tree estimation, there is a strong interest in identifying and removing unreliable parts of the alignment. In recent years several automated filtering approaches have been proposed, but despite their popularity, a systematic and comprehensive comparison of different alignment filtering methods on real data has been lacking. Here, we extend and apply recently introduced phylogenetic tests of alignment accuracy on a large number of gene families and contrast the performance of unfiltered versus filtered alignments in the context of single-gene phylogeny reconstruction. Based on multiple genome-wide empirical and simulated data sets, we show that the trees obtained from filtered MSAs are on average worse than those obtained from unfiltered MSAs. Furthermore, alignment filtering often leads to an increase in the proportion of well-supported branches that are actually wrong. We confirm that our findings hold for a wide range of parameters and methods. Although our results suggest that light filtering (up to 20% of alignment positions) has little impact on tree accuracy and may save some computation time, contrary to widespread practice, we do not generally recommend the use of current alignment filtering methods for phylogenetic inference. By providing a way to rigorously and systematically measure the impact of filtering on alignments, the methodology set forth here will guide the development of better filtering algorithms.
format Online
Article
Text
id pubmed-4538881
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-45388812015-08-18 Current Methods for Automated Filtering of Multiple Sequence Alignments Frequently Worsen Single-Gene Phylogenetic Inference Tan, Ge Muffato, Matthieu Ledergerber, Christian Herrero, Javier Goldman, Nick Gil, Manuel Dessimoz, Christophe Syst Biol Regular Articles Phylogenetic inference is generally performed on the basis of multiple sequence alignments (MSA). Because errors in an alignment can lead to errors in tree estimation, there is a strong interest in identifying and removing unreliable parts of the alignment. In recent years several automated filtering approaches have been proposed, but despite their popularity, a systematic and comprehensive comparison of different alignment filtering methods on real data has been lacking. Here, we extend and apply recently introduced phylogenetic tests of alignment accuracy on a large number of gene families and contrast the performance of unfiltered versus filtered alignments in the context of single-gene phylogeny reconstruction. Based on multiple genome-wide empirical and simulated data sets, we show that the trees obtained from filtered MSAs are on average worse than those obtained from unfiltered MSAs. Furthermore, alignment filtering often leads to an increase in the proportion of well-supported branches that are actually wrong. We confirm that our findings hold for a wide range of parameters and methods. Although our results suggest that light filtering (up to 20% of alignment positions) has little impact on tree accuracy and may save some computation time, contrary to widespread practice, we do not generally recommend the use of current alignment filtering methods for phylogenetic inference. By providing a way to rigorously and systematically measure the impact of filtering on alignments, the methodology set forth here will guide the development of better filtering algorithms. 
Oxford University Press 2015-09 2015-06-01 /pmc/articles/PMC4538881/ /pubmed/26031838 http://dx.doi.org/10.1093/sysbio/syv033 Text en © The Author(s) 2015. Published by Oxford University Press on behalf of the Society of Systematic Biologists. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Current Methods for Automated Filtering of Multiple Sequence Alignments Frequently Worsen Single-Gene Phylogenetic Inference
topic Regular Articles