AnchorWave: Sensitive alignment of genomes with high sequence diversity, extensive structural polymorphism, and whole-genome duplication

Bibliographic details
Main authors: Song, Baoxing; Marco-Sola, Santiago; Moreto, Miquel; Johnson, Lynn; Buckler, Edward S.; Stitzer, Michelle C.
Format: Online article, text
Language: English
Published: National Academy of Sciences, 2021
Subjects: Biological Sciences
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8740769/
https://www.ncbi.nlm.nih.gov/pubmed/34934012
http://dx.doi.org/10.1073/pnas.2113075119
Collection: PubMed
Abstract: Millions of species are currently being sequenced, and their genomes are being compared. Many of them have more complex genomes than model systems and raise novel challenges for genome alignment. Widely used local alignment strategies often produce limited or incongruous results when applied to genomes with dispersed repeats, long indels, and highly diverse sequences. Moreover, alignment using many-to-many or reciprocal best hit approaches conflicts with well-studied patterns between species with different rounds of whole-genome duplication. Here, we introduce Anchored Wavefront alignment (AnchorWave), which performs whole-genome duplication–informed collinear anchor identification between genomes and performs base pair–resolved global alignment for collinear blocks using a two-piece affine gap cost strategy. This strategy enables AnchorWave to precisely identify multikilobase indels generated by transposable element (TE) presence/absence variants (PAVs). When aligning two maize genomes, AnchorWave successfully recalled 87% of previously reported TE PAVs. By contrast, other genome alignment tools showed low power for TE PAV recall. AnchorWave precisely aligns up to three times more of the genome as position matches or indels than the closest competitive approach when comparing diverse genomes. Moreover, AnchorWave recalls transcription factor–binding sites at a rate of 1.05- to 74.85-fold higher than other tools with significantly lower false-positive alignments. AnchorWave complements available genome alignment tools by showing obvious improvement when applied to genomes with dispersed repeats, active TEs, high sequence diversity, and whole-genome duplication variation.
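The abstract's central step, base pair–resolved global alignment of a collinear block under a two-piece affine gap cost, can be illustrated with a small dynamic program. This is a hedged sketch, not AnchorWave's implementation: AnchorWave is described as a wavefront-based aligner, whereas the code below uses a plain quadratic Gotoh-style recurrence and returns only the optimal cost. The helper name `two_piece_affine_cost` and the penalty values (mismatch `x`, first gap piece `o1`/`e1`, second gap piece `o2`/`e2`) are hypothetical and illustrative, not AnchorWave defaults. A gap of length L costs min(o1 + e1·L, o2 + e2·L), so short gaps are scored by the steep first piece and long gaps by the flatter second piece.

```python
import math


def two_piece_affine_cost(a: str, b: str, x: int = 4,
                          o1: int = 6, e1: int = 2,
                          o2: int = 24, e2: int = 1) -> int:
    """Minimum global alignment cost of a vs b (hypothetical helper).

    Matches cost 0, mismatches cost x, and a gap of length L costs
    min(o1 + e1*L, o2 + e2*L). Penalty defaults are illustrative only.
    """
    n, m = len(a), len(b)
    inf = math.inf
    # H: best cost for prefixes a[:i], b[:j] ending in any state.
    # I1/I2: ending in an insertion (gap in a), scored with piece 1 or 2.
    # D1/D2: ending in a deletion (gap in b), scored with piece 1 or 2.
    H = [[inf] * (m + 1) for _ in range(n + 1)]
    I1 = [[inf] * (m + 1) for _ in range(n + 1)]
    I2 = [[inf] * (m + 1) for _ in range(n + 1)]
    D1 = [[inf] * (m + 1) for _ in range(n + 1)]
    D2 = [[inf] * (m + 1) for _ in range(n + 1)]
    H[0][0] = 0
    for j in range(1, m + 1):  # leading insertion of b[:j]
        I1[0][j] = o1 + e1 * j
        I2[0][j] = o2 + e2 * j
        H[0][j] = min(I1[0][j], I2[0][j])
    for i in range(1, n + 1):  # leading deletion of a[:i]
        D1[i][0] = o1 + e1 * i
        D2[i][0] = o2 + e2 * i
        H[i][0] = min(D1[i][0], D2[i][0])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either open a new gap from H or extend a gap of the same piece.
            I1[i][j] = min(H[i][j - 1] + o1 + e1, I1[i][j - 1] + e1)
            I2[i][j] = min(H[i][j - 1] + o2 + e2, I2[i][j - 1] + e2)
            D1[i][j] = min(H[i - 1][j] + o1 + e1, D1[i - 1][j] + e1)
            D2[i][j] = min(H[i - 1][j] + o2 + e2, D2[i - 1][j] + e2)
            # Diagonal move: match (cost 0) or mismatch (cost x).
            sub = 0 if a[i - 1] == b[j - 1] else x
            H[i][j] = min(H[i - 1][j - 1] + sub,
                          I1[i][j], I2[i][j], D1[i][j], D2[i][j])
    return H[n][m]


if __name__ == "__main__":
    a = "ACGT" * 5
    b = a[:10] + "T" * 30 + a[10:]  # b carries a 30-bp insertion
    print(two_piece_affine_cost(a, b))             # two-piece: 54
    print(two_piece_affine_cost(a, b, o2=6, e2=2))  # one affine piece: 66
```

With these illustrative penalties the two pieces cross at L = 18 (6 + 2L = 24 + L), so the 30-bp insertion costs 54 rather than the 66 charged by a single affine piece, and the difference grows by one unit per additional gap base. This is consistent with the abstract's point that the two-piece cost helps identify multikilobase indels. The full-matrix version uses O(nm) time and memory, which is fine for toy inputs but not for genome-scale blocks.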
Record ID: pubmed-8740769
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Proc Natl Acad Sci U S A (Biological Sciences)
Publication dates: 2021-12-21, 2022-01-04
Copyright © 2021 the Author(s). Published by PNAS. This article is distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/).