Rapid Low-Cost Assembly of the Drosophila melanogaster Reference Genome Using Low-Coverage, Long-Read Sequencing

Accurate and comprehensive characterization of genetic variation is essential for deciphering the genetic basis of diseases and other phenotypes. A vast amount of genetic variation stems from large-scale sequence changes arising from the duplication, deletion, inversion, and translocation of sequenc...

Bibliographic Details
Main Authors: Solares, Edwin A., Chakraborty, Mahul, Miller, Danny E., Kalsow, Shannon, Hall, Kate, Perera, Anoja G., Emerson, J. J., Hawley, R. Scott
Format: Online Article Text
Language: English
Published: Genetics Society of America 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6169397/
https://www.ncbi.nlm.nih.gov/pubmed/30018084
http://dx.doi.org/10.1534/g3.118.200162
_version_ 1783360510867013632
author Solares, Edwin A.
Chakraborty, Mahul
Miller, Danny E.
Kalsow, Shannon
Hall, Kate
Perera, Anoja G.
Emerson, J. J.
Hawley, R. Scott
author_sort Solares, Edwin A.
collection PubMed
description Accurate and comprehensive characterization of genetic variation is essential for deciphering the genetic basis of diseases and other phenotypes. A vast amount of genetic variation stems from large-scale sequence changes arising from the duplication, deletion, inversion, and translocation of sequences. In the past 10 years, high-throughput short reads have greatly expanded our ability to assay sequence variation due to single nucleotide polymorphisms. However, a recent de novo assembly of a second Drosophila melanogaster reference genome has revealed that short read genotyping methods miss hundreds of structural variants, including those affecting phenotypes. While genomes assembled using high-coverage long reads can achieve high levels of contiguity and completeness, concerns about cost, errors, and low yield have limited widespread adoption of such sequencing approaches. Here we resequenced the reference strain of D. melanogaster (ISO1) on a single Oxford Nanopore MinION flow cell run for 24 hr. Using only reads longer than 1 kb or with at least 30x coverage, we assembled a highly contiguous de novo genome. The addition of inexpensive paired reads and subsequent scaffolding using an optical map technology achieved an assembly with completeness and contiguity comparable to the D. melanogaster reference assembly. Comparison of our assembly to the reference assembly of ISO1 uncovered a number of structural variants (SVs), including novel LTR transposable element insertions and duplications affecting genes with developmental, behavioral, and metabolic functions. Collectively, these SVs provide a snapshot of the dynamics of genome evolution. Furthermore, our assembly and comparison to the D. melanogaster reference genome demonstrates that high-quality de novo assembly of reference genomes and comprehensive variant discovery using such assemblies are now possible by a single lab for under $1,000 (USD).
format Online
Article
Text
id pubmed-6169397
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Genetics Society of America
record_format MEDLINE/PubMed
spelling pubmed-61693972018-10-04 Rapid Low-Cost Assembly of the Drosophila melanogaster Reference Genome Using Low-Coverage, Long-Read Sequencing Solares, Edwin A. Chakraborty, Mahul Miller, Danny E. Kalsow, Shannon Hall, Kate Perera, Anoja G. Emerson, J. J. Hawley, R. Scott G3 (Bethesda) Investigations Accurate and comprehensive characterization of genetic variation is essential for deciphering the genetic basis of diseases and other phenotypes. A vast amount of genetic variation stems from large-scale sequence changes arising from the duplication, deletion, inversion, and translocation of sequences. In the past 10 years, high-throughput short reads have greatly expanded our ability to assay sequence variation due to single nucleotide polymorphisms. However, a recent de novo assembly of a second Drosophila melanogaster reference genome has revealed that short read genotyping methods miss hundreds of structural variants, including those affecting phenotypes. While genomes assembled using high-coverage long reads can achieve high levels of contiguity and completeness, concerns about cost, errors, and low yield have limited widespread adoption of such sequencing approaches. Here we resequenced the reference strain of D. melanogaster (ISO1) on a single Oxford Nanopore MinION flow cell run for 24 hr. Using only reads longer than 1 kb or with at least 30x coverage, we assembled a highly contiguous de novo genome. The addition of inexpensive paired reads and subsequent scaffolding using an optical map technology achieved an assembly with completeness and contiguity comparable to the D. melanogaster reference assembly. Comparison of our assembly to the reference assembly of ISO1 uncovered a number of structural variants (SVs), including novel LTR transposable element insertions and duplications affecting genes with developmental, behavioral, and metabolic functions. Collectively, these SVs provide a snapshot of the dynamics of genome evolution. 
Furthermore, our assembly and comparison to the D. melanogaster reference genome demonstrates that high-quality de novo assembly of reference genomes and comprehensive variant discovery using such assemblies are now possible by a single lab for under $1,000 (USD). Genetics Society of America 2018-07-19 /pmc/articles/PMC6169397/ /pubmed/30018084 http://dx.doi.org/10.1534/g3.118.200162 Text en Copyright © 2018 Solares et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Investigations
Solares, Edwin A.
Chakraborty, Mahul
Miller, Danny E.
Kalsow, Shannon
Hall, Kate
Perera, Anoja G.
Emerson, J. J.
Hawley, R. Scott
Rapid Low-Cost Assembly of the Drosophila melanogaster Reference Genome Using Low-Coverage, Long-Read Sequencing
title Rapid Low-Cost Assembly of the Drosophila melanogaster Reference Genome Using Low-Coverage, Long-Read Sequencing
title_sort rapid low-cost assembly of the drosophila melanogaster reference genome using low-coverage, long-read sequencing
topic Investigations
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6169397/
https://www.ncbi.nlm.nih.gov/pubmed/30018084
http://dx.doi.org/10.1534/g3.118.200162