
SWAMP: Sliding Window Alignment Masker for PAML


Bibliographic Details
Main Authors: Harrison, Peter W, Jordan, Gregory E, Montgomery, Stephen H
Format: Online Article Text
Language: English
Published: Libertas Academica 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4251194/
https://www.ncbi.nlm.nih.gov/pubmed/25525323
http://dx.doi.org/10.4137/EBO.S18193
_version_ 1782347013594021888
author Harrison, Peter W
Jordan, Gregory E
Montgomery, Stephen H
author_facet Harrison, Peter W
Jordan, Gregory E
Montgomery, Stephen H
author_sort Harrison, Peter W
collection PubMed
description With the greater availability of genetic data, large genome-wide scans for positive selection increasingly incorporate data from a range of sources. These data sets may be derived from different sequencing methods, each of which has potential sources of error. Sequencing errors, compounded by alignment errors, greatly increase the number of false positives in tests for adaptive evolution. Genome-wide analyses often fail to fully address these issues or to provide sufficient detail on postalignment masking/filtering. Here, we introduce a Sliding Window Alignment Masker for Phylogenetic Analysis by Maximum Likelihood (SWAMP) that scans multiple-sequence alignments for short regions enriched with unreasonably high rates of nonsynonymous substitutions caused, for example, by sequence or alignment errors. SWAMP excludes these regions from downstream evolutionary analyses, thereby increasing the reliability of those analyses. It can effectively mask short stretches of erroneous sequence, which are particularly prevalent in low-coverage genomes and may not be detected by existing methods based on filtering by sitewise conservation or alignment confidence. SWAMP offers a flexible masking approach: the user can apply different masking regimens to specific branches or sequences in the phylogeny, allowing the stringency of masking to vary according to branch length, expected divergence levels, or assembly quality. We exemplify SWAMP's effectiveness on a data set of 6,379 protein-coding genes from primate species, including data of variable quality. Full reporting of the software parameters will further improve the reproducibility of genome-wide analyses, as well as reduce false-positive rates.
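As an illustration of the sliding-window idea described in the abstract, the following Python sketch masks codons that fall inside windows containing too many inferred nonsynonymous substitutions. It is not the published SWAMP implementation. The function name `mask_windows`, the parameters `window`, `threshold`, and `mask`, and the choice to mask the whole window are assumptions made for this example. The per-codon substitution flags are assumed to come from an earlier step, such as an ancestral reconstruction along one branch.

```python
from typing import List, Sequence


def mask_windows(
    codons: Sequence[str],
    nonsyn: Sequence[bool],
    window: int = 15,
    threshold: int = 5,
    mask: str = "NNN",
) -> List[str]:
    """Return a copy of `codons` with dense-substitution windows masked.

    codons    -- aligned codons of one sequence (hypothetical input format)
    nonsyn    -- True where a nonsynonymous substitution was inferred
    window    -- window length in codons (assumed parameter)
    threshold -- minimum flagged codons in a window to trigger masking
    mask      -- replacement string for masked codons
    """
    if len(codons) != len(nonsyn):
        raise ValueError("codons and nonsyn must have the same length")
    if window < 1 or threshold < 1:
        raise ValueError("window and threshold must be positive")

    n = len(codons)
    to_mask = [False] * n
    # Slide the window one codon at a time; a sequence shorter than the
    # window is treated as a single (short) window.
    last_start = max(n - window, 0)
    for start in range(last_start + 1):
        end = min(start + window, n)
        if sum(nonsyn[start:end]) >= threshold:
            # Assumption: mask every codon in an offending window.
            for i in range(start, end):
                to_mask[i] = True

    return [mask if flagged else codon for codon, flagged in zip(codons, to_mask)]
```

Varying `window` and `threshold` per sequence or branch would be one way to mimic the branch-specific masking stringency the abstract mentions. This sketch recomputes the count for each window for clarity rather than speed.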
format Online
Article
Text
id pubmed-4251194
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Libertas Academica
record_format MEDLINE/PubMed
spelling pubmed-4251194 2014-12-18 SWAMP: Sliding Window Alignment Masker for PAML Harrison, Peter W Jordan, Gregory E Montgomery, Stephen H Evol Bioinform Online Technical Advance
Libertas Academica 2014-12-01 /pmc/articles/PMC4251194/ /pubmed/25525323 http://dx.doi.org/10.4137/EBO.S18193 Text en © 2014 the author(s), publisher and licensee Libertas Academica Ltd. This is an open-access article distributed under the terms of the Creative Commons CC-BY-NC 3.0 License.
spellingShingle Technical Advance
Harrison, Peter W
Jordan, Gregory E
Montgomery, Stephen H
SWAMP: Sliding Window Alignment Masker for PAML
title SWAMP: Sliding Window Alignment Masker for PAML
title_full SWAMP: Sliding Window Alignment Masker for PAML
title_fullStr SWAMP: Sliding Window Alignment Masker for PAML
title_full_unstemmed SWAMP: Sliding Window Alignment Masker for PAML
title_short SWAMP: Sliding Window Alignment Masker for PAML
title_sort swamp: sliding window alignment masker for paml
topic Technical Advance
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4251194/
https://www.ncbi.nlm.nih.gov/pubmed/25525323
http://dx.doi.org/10.4137/EBO.S18193
work_keys_str_mv AT harrisonpeterw swampslidingwindowalignmentmaskerforpaml
AT jordangregorye swampslidingwindowalignmentmaskerforpaml
AT montgomerystephenh swampslidingwindowalignmentmaskerforpaml