Parametric and non-parametric masking of randomness in sequence alignments can be improved and leads to better resolved trees
BACKGROUND: Alignment masking methods, which exclude alignment blocks prior to tree reconstruction, have been successful in improving the signal-to-noise ratio in sequence alignments. However, the lack of formally well-defined methods to identify randomness in sequen...
Main authors: | Kück, Patrick; Meusemann, Karen; Dambach, Johannes; Thormann, Birthe; von Reumont, Björn M; Wägele, Johann W; Misof, Bernhard |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2010 |
Subjects: | Methodology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2867768/ https://www.ncbi.nlm.nih.gov/pubmed/20356385 http://dx.doi.org/10.1186/1742-9994-7-10 |
_version_ | 1782180990608736256 |
---|---|
author | Kück, Patrick Meusemann, Karen Dambach, Johannes Thormann, Birthe von Reumont, Björn M Wägele, Johann W Misof, Bernhard |
author_sort | Kück, Patrick |
collection | PubMed |
description | BACKGROUND: Alignment masking methods, which exclude alignment blocks prior to tree reconstruction, have been successful in improving the signal-to-noise ratio in sequence alignments. However, the lack of formally well-defined methods to identify randomness in sequence alignments has prevented a routine application of alignment masking. In this study, we compared the effects on tree reconstructions of the most commonly used profiling method (GBLOCKS), which uses a predefined set of rules in combination with alignment masking, with a new profiling approach (ALISCORE) based on Monte Carlo resampling within a sliding window, using different data sets and alignment methods. While the GBLOCKS approach excludes variable sections above a certain threshold, the choice of which is left arbitrary, the ALISCORE algorithm is free of a priori rating of parameter space and therefore more objective. RESULTS: ALISCORE was successfully extended to amino acids using a proportional model and empirical substitution matrices to score randomness in multiple sequence alignments. A complex bootstrap resampling leads to an even distribution of scores of randomly similar sequences to assess randomness of the observed sequence similarity. When tested on real data, both masking methods, GBLOCKS and ALISCORE, helped to improve tree resolution. The sliding window approach was less sensitive to different alignments of identical data sets and performed equally well on all data sets. In addition, ALISCORE is capable of dealing with different substitution patterns and heterogeneous base composition. ALISCORE and the most relaxed GBLOCKS gap parameter setting performed best on all data sets. Correspondingly, Neighbor-Net analyses showed the greatest decrease in conflict. CONCLUSIONS: Alignment masking improves the signal-to-noise ratio in multiple sequence alignments prior to phylogenetic reconstruction. Given the robust performance of alignment profiling, alignment masking should routinely be used to improve tree reconstructions. Parametric methods of alignment profiling can be easily extended to more complex likelihood-based models of sequence evolution, which opens the possibility of further improvements. |
format | Text |
id | pubmed-2867768 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2867768 2010-05-12 Parametric and non-parametric masking of randomness in sequence alignments can be improved and leads to better resolved trees Kück, Patrick Meusemann, Karen Dambach, Johannes Thormann, Birthe von Reumont, Björn M Wägele, Johann W Misof, Bernhard Front Zool Methodology BioMed Central 2010-03-31 /pmc/articles/PMC2867768/ /pubmed/20356385 http://dx.doi.org/10.1186/1742-9994-7-10 Text en Copyright ©2010 Kück et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methodology Kück, Patrick Meusemann, Karen Dambach, Johannes Thormann, Birthe von Reumont, Björn M Wägele, Johann W Misof, Bernhard Parametric and non-parametric masking of randomness in sequence alignments can be improved and leads to better resolved trees |
title | Parametric and non-parametric masking of randomness in sequence alignments can be improved and leads to better resolved trees |
topic | Methodology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2867768/ https://www.ncbi.nlm.nih.gov/pubmed/20356385 http://dx.doi.org/10.1186/1742-9994-7-10 |
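
The abstract in this record describes profiling an alignment by Monte Carlo resampling within a sliding window and then masking the sections that look random. The following is a minimal sketch of that general idea for a single pair of aligned sequences. It is not the ALISCORE implementation: the +1/-1 match score, the window size, the number of trials, the 95% quantile cutoff, and all function names are assumptions chosen for illustration only.

```python
import random


def window_score(a, b):
    """Score two equal-length windows: +1 per match, -1 per mismatch (assumed scheme)."""
    return sum(1 if x == y else -1 for x, y in zip(a, b))


def random_cutoff(seq_a, seq_b, window, trials, rng, quantile=0.95):
    """Estimate the score that random windows reach by chance.

    Random windows are drawn from the pooled character composition of
    both sequences; the cutoff is the given quantile of their scores.
    """
    pool = list(seq_a + seq_b)
    scores = sorted(
        window_score(rng.choices(pool, k=window), rng.choices(pool, k=window))
        for _ in range(trials)
    )
    return scores[int(quantile * (trials - 1))]


def mask_columns(seq_a, seq_b, window=6, trials=200, seed=0):
    """Return the sorted column indices to keep.

    A column is kept if at least one sliding window covering it scores
    strictly above the Monte Carlo cutoff; all other columns are masked.
    """
    rng = random.Random(seed)
    cutoff = random_cutoff(seq_a, seq_b, window, trials, rng)
    keep = set()
    for start in range(len(seq_a) - window + 1):
        score = window_score(seq_a[start:start + window],
                             seq_b[start:start + window])
        if score > cutoff:
            keep.update(range(start, start + window))
    return sorted(keep)


if __name__ == "__main__":
    # First 12 columns identical, last 12 columns mismatched at every position.
    a = "ACGTACGTACGT" + "ACACACACACAC"
    b = "ACGTACGTACGT" + "GTGTGTGTGTGT"
    print(mask_columns(a, b))
```

In this toy input the conserved first half is retained and columns deep inside the fully mismatched second half are masked. Columns near the boundary may be kept or dropped depending on the estimated cutoff, which shows why the window size matters in any sliding-window profile.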