
Recovering individual haplotypes and a contiguous genome assembly from pooled long-read sequencing of the diamondback moth (Lepidoptera: Plutellidae)

Bibliographic Details
Main Authors: Whiteford, Samuel; van’t Hof, Arjen E; Krishna, Ritesh; Marubbi, Thea; Widdison, Stephanie; Saccheri, Ilik J; Guest, Marcus; Morrison, Neil I; Darby, Alistair C
Format: Online, Article, Text
Language: English
Published: Oxford University Press, 2022
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9526047/
https://www.ncbi.nlm.nih.gov/pubmed/35980174
http://dx.doi.org/10.1093/g3journal/jkac210

Description: The assembly of divergent haplotypes using noisy long-read data presents a challenge to the reconstruction of haploid genome assemblies, due to overlapping distributions of technical sequencing error, intralocus genetic variation, and interlocus similarity within these data. Here, we present a comparative analysis of assembly algorithms representing overlap-layout-consensus, repeat graph, and de Bruijn graph methods. We examine how postprocessing strategies attempting to reduce redundant heterozygosity interact with the choice of initial assembly algorithm and ultimately produce a series of chromosome-level assemblies for an agricultural pest, the diamondback moth, Plutella xylostella (L.). We compare evaluation methods and show that BUSCO analyses may overestimate haplotig removal processing in long-read draft genomes, in comparison to a k-mer method. We discuss the trade-offs inherent in assembly algorithm and curation choices and suggest that “best practice” is research question dependent. We demonstrate a link between allelic divergence and allele-derived contig redundancy in final genome assemblies and document the patterns of coding and noncoding diversity between redundant sequences. We also document a link between an excess of nonsynonymous polymorphism and haplotigs that are unresolved by assembly or postassembly algorithms. Finally, we discuss how this phenomenon may have relevance for the usage of noisy long-read genome assemblies in comparative genomics.

Journal: G3 (Bethesda)
Article type: Investigation
Publication date: 2022-08-18
Rights: © The Author(s) 2022. Published by Oxford University Press on behalf of Genetics Society of America. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.