AnchorWave: Sensitive alignment of genomes with high sequence diversity, extensive structural polymorphism, and whole-genome duplication
Millions of species are currently being sequenced, and their genomes are being compared. Many of them have more complex genomes than model systems and raise novel challenges for genome alignment. Widely used local alignment strategies often produce limited or incongruous results when applied to geno...
Main authors: | Song, Baoxing; Marco-Sola, Santiago; Moreto, Miquel; Johnson, Lynn; Buckler, Edward S.; Stitzer, Michelle C. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | National Academy of Sciences 2021 |
Subjects: | Biological Sciences |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8740769/ https://www.ncbi.nlm.nih.gov/pubmed/34934012 http://dx.doi.org/10.1073/pnas.2113075119 |
_version_ | 1784629370772520960 |
---|---|
author | Song, Baoxing; Marco-Sola, Santiago; Moreto, Miquel; Johnson, Lynn; Buckler, Edward S.; Stitzer, Michelle C. |
author_facet | Song, Baoxing; Marco-Sola, Santiago; Moreto, Miquel; Johnson, Lynn; Buckler, Edward S.; Stitzer, Michelle C. |
author_sort | Song, Baoxing |
collection | PubMed |
description | Millions of species are currently being sequenced, and their genomes are being compared. Many of them have more complex genomes than model systems and raise novel challenges for genome alignment. Widely used local alignment strategies often produce limited or incongruous results when applied to genomes with dispersed repeats, long indels, and highly diverse sequences. Moreover, alignment using many-to-many or reciprocal best hit approaches conflicts with well-studied patterns between species with different rounds of whole-genome duplication. Here, we introduce Anchored Wavefront alignment (AnchorWave), which performs whole-genome duplication–informed collinear anchor identification between genomes and performs base pair–resolved global alignment for collinear blocks using a two-piece affine gap cost strategy. This strategy enables AnchorWave to precisely identify multikilobase indels generated by transposable element (TE) presence/absence variants (PAVs). When aligning two maize genomes, AnchorWave successfully recalled 87% of previously reported TE PAVs. By contrast, other genome alignment tools showed low power for TE PAV recall. AnchorWave precisely aligns up to three times more of the genome as position matches or indels than the closest competitive approach when comparing diverse genomes. Moreover, AnchorWave recalls transcription factor–binding sites at a rate of 1.05- to 74.85-fold higher than other tools with significantly lower false-positive alignments. AnchorWave complements available genome alignment tools by showing obvious improvement when applied to genomes with dispersed repeats, active TEs, high sequence diversity, and whole-genome duplication variation. |
format | Online Article Text |
id | pubmed-8740769 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | National Academy of Sciences |
record_format | MEDLINE/PubMed |
spelling | pubmed-8740769 2022-06-21 AnchorWave: Sensitive alignment of genomes with high sequence diversity, extensive structural polymorphism, and whole-genome duplication Song, Baoxing Marco-Sola, Santiago Moreto, Miquel Johnson, Lynn Buckler, Edward S. Stitzer, Michelle C. Proc Natl Acad Sci U S A Biological Sciences Millions of species are currently being sequenced, and their genomes are being compared. Many of them have more complex genomes than model systems and raise novel challenges for genome alignment. Widely used local alignment strategies often produce limited or incongruous results when applied to genomes with dispersed repeats, long indels, and highly diverse sequences. Moreover, alignment using many-to-many or reciprocal best hit approaches conflicts with well-studied patterns between species with different rounds of whole-genome duplication. Here, we introduce Anchored Wavefront alignment (AnchorWave), which performs whole-genome duplication–informed collinear anchor identification between genomes and performs base pair–resolved global alignment for collinear blocks using a two-piece affine gap cost strategy. This strategy enables AnchorWave to precisely identify multikilobase indels generated by transposable element (TE) presence/absence variants (PAVs). When aligning two maize genomes, AnchorWave successfully recalled 87% of previously reported TE PAVs. By contrast, other genome alignment tools showed low power for TE PAV recall. AnchorWave precisely aligns up to three times more of the genome as position matches or indels than the closest competitive approach when comparing diverse genomes. Moreover, AnchorWave recalls transcription factor–binding sites at a rate of 1.05- to 74.85-fold higher than other tools with significantly lower false-positive alignments. AnchorWave complements available genome alignment tools by showing obvious improvement when applied to genomes with dispersed repeats, active TEs, high sequence diversity, and whole-genome duplication variation. National Academy of Sciences 2021-12-21 2022-01-04 /pmc/articles/PMC8740769/ /pubmed/34934012 http://dx.doi.org/10.1073/pnas.2113075119 Text en Copyright © 2021 the Author(s). Published by PNAS. https://creativecommons.org/licenses/by-nc-nd/4.0/ This article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Biological Sciences Song, Baoxing Marco-Sola, Santiago Moreto, Miquel Johnson, Lynn Buckler, Edward S. Stitzer, Michelle C. AnchorWave: Sensitive alignment of genomes with high sequence diversity, extensive structural polymorphism, and whole-genome duplication |
title | AnchorWave: Sensitive alignment of genomes with high sequence diversity, extensive structural polymorphism, and whole-genome duplication |
title_full | AnchorWave: Sensitive alignment of genomes with high sequence diversity, extensive structural polymorphism, and whole-genome duplication |
title_fullStr | AnchorWave: Sensitive alignment of genomes with high sequence diversity, extensive structural polymorphism, and whole-genome duplication |
title_full_unstemmed | AnchorWave: Sensitive alignment of genomes with high sequence diversity, extensive structural polymorphism, and whole-genome duplication |
title_short | AnchorWave: Sensitive alignment of genomes with high sequence diversity, extensive structural polymorphism, and whole-genome duplication |
title_sort | anchorwave: sensitive alignment of genomes with high sequence diversity, extensive structural polymorphism, and whole-genome duplication |
topic | Biological Sciences |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8740769/ https://www.ncbi.nlm.nih.gov/pubmed/34934012 http://dx.doi.org/10.1073/pnas.2113075119 |