Parametric and non-parametric masking of randomness in sequence alignments can be improved and leads to better resolved trees

Bibliographic Details
Main authors: Kück, Patrick, Meusemann, Karen, Dambach, Johannes, Thormann, Birthe, von Reumont, Björn M, Wägele, Johann W, Misof, Bernhard
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2867768/
https://www.ncbi.nlm.nih.gov/pubmed/20356385
http://dx.doi.org/10.1186/1742-9994-7-10
collection PubMed
description BACKGROUND: Methods of alignment masking, which refers to the technique of excluding alignment blocks prior to tree reconstructions, have been successful in improving the signal-to-noise ratio in sequence alignments. However, the lack of formally well-defined methods to identify randomness in sequence alignments has prevented a routine application of alignment masking. In this study, we compared the effects on tree reconstructions of the most commonly used profiling method (GBLOCKS), which uses a predefined set of rules in combination with alignment masking, with a new profiling approach (ALISCORE) based on Monte Carlo resampling within a sliding window, using different data sets and alignment methods. While the GBLOCKS approach excludes variable sections above a certain threshold whose choice is left arbitrary, the ALISCORE algorithm is free of a priori rating of parameter space and therefore more objective. RESULTS: ALISCORE was successfully extended to amino acids using a proportional model and empirical substitution matrices to score randomness in multiple sequence alignments. A complex bootstrap resampling leads to an even distribution of scores of randomly similar sequences to assess randomness of the observed sequence similarity. When performance was tested on real data, both masking methods, GBLOCKS and ALISCORE, helped to improve tree resolution. The sliding window approach was less sensitive to different alignments of identical data sets and performed equally well on all data sets. In addition, ALISCORE is capable of dealing with different substitution patterns and heterogeneous base composition. ALISCORE and the most relaxed GBLOCKS gap parameter setting performed best on all data sets. Correspondingly, Neighbor-Net analyses showed the greatest decrease in conflict. CONCLUSIONS: Alignment masking improves signal-to-noise ratio in multiple sequence alignments prior to phylogenetic reconstruction. Given the robust performance of alignment profiling, alignment masking should routinely be used to improve tree reconstructions. Parametric methods of alignment profiling can be easily extended to more complex likelihood-based models of sequence evolution, which opens the possibility of further improvements.
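The general idea of scoring randomness with Monte Carlo resampling inside a sliding window, as summarized in the abstract, can be illustrated with a minimal Python sketch. This is not ALISCORE's implementation. The +1/-1 match/mismatch scoring, shuffling each window as the resampling scheme, the window size of 6, the 100 replicates, the 95% quantile cut-off, and the rule that a column is flagged when any window covering it looks random are all assumptions chosen for illustration. The function names window_score, is_random_window and mask_profile are hypothetical.

```python
import random


def window_score(a, b):
    """Hypothetical scoring (assumption): +1 for identical non-gap characters, -1 otherwise."""
    return sum(1 if x == y and x != "-" else -1 for x, y in zip(a, b))


def is_random_window(a, b, n_reps=100, quantile=0.95, seed=0):
    """Return True if the observed score of an aligned window pair is not better
    than the chosen quantile of scores from shuffled copies (assumed null model)."""
    rng = random.Random(seed)
    observed = window_score(a, b)
    null_scores = []
    for _ in range(n_reps):
        # Shuffling keeps each window's character composition but destroys
        # positional similarity between the two sequences.
        shuffled_a = rng.sample(a, len(a))
        shuffled_b = rng.sample(b, len(b))
        null_scores.append(window_score(shuffled_a, shuffled_b))
    null_scores.sort()
    cutoff = null_scores[int(quantile * (n_reps - 1))]
    return observed <= cutoff


def mask_profile(seq1, seq2, window=6, **kwargs):
    """Slide a window along two aligned sequences and flag (True) every column
    covered by at least one window judged random (hypothetical flagging rule).
    Alignments shorter than the window yield no windows, so nothing is flagged."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    flagged = [False] * len(seq1)
    for start in range(len(seq1) - window + 1):
        end = start + window
        if is_random_window(seq1[start:end], seq2[start:end], **kwargs):
            for i in range(start, end):
                flagged[i] = True
    return flagged


if __name__ == "__main__":
    s1 = "ACGTACGTAC" + "AAAAAA"  # conserved block, then an unrelated block
    s2 = "ACGTACGTAC" + "CCCCCC"
    profile = mask_profile(s1, s2)
    print("".join("x" if f else "." for f in profile))
```

In this toy pair the leading conserved columns stay unmasked and the trailing unrelated block is flagged. Columns near the boundary depend on the random replicates, which is why a real profiler needs a carefully chosen null model rather than the plain shuffle assumed here.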
format Text
id pubmed-2867768
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2867768 2010-05-12 Front Zool Methodology BioMed Central 2010-03-31 Copyright ©2010 Kück et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Parametric and non-parametric masking of randomness in sequence alignments can be improved and leads to better resolved trees
topic Methodology