Rapid Low-Cost Assembly of the Drosophila melanogaster Reference Genome Using Low-Coverage, Long-Read Sequencing
Main Authors: | Solares, Edwin A.; Chakraborty, Mahul; Miller, Danny E.; Kalsow, Shannon; Hall, Kate; Perera, Anoja G.; Emerson, J. J.; Hawley, R. Scott |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Genetics Society of America 2018 |
Subjects: | Investigations |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6169397/ https://www.ncbi.nlm.nih.gov/pubmed/30018084 http://dx.doi.org/10.1534/g3.118.200162 |
_version_ | 1783360510867013632 |
---|---|
author | Solares, Edwin A.; Chakraborty, Mahul; Miller, Danny E.; Kalsow, Shannon; Hall, Kate; Perera, Anoja G.; Emerson, J. J.; Hawley, R. Scott |
author_facet | Solares, Edwin A.; Chakraborty, Mahul; Miller, Danny E.; Kalsow, Shannon; Hall, Kate; Perera, Anoja G.; Emerson, J. J.; Hawley, R. Scott |
author_sort | Solares, Edwin A. |
collection | PubMed |
description | Accurate and comprehensive characterization of genetic variation is essential for deciphering the genetic basis of diseases and other phenotypes. A vast amount of genetic variation stems from large-scale sequence changes arising from the duplication, deletion, inversion, and translocation of sequences. In the past 10 years, high-throughput short reads have greatly expanded our ability to assay sequence variation due to single nucleotide polymorphisms. However, a recent de novo assembly of a second Drosophila melanogaster reference genome has revealed that short read genotyping methods miss hundreds of structural variants, including those affecting phenotypes. While genomes assembled using high-coverage long reads can achieve high levels of contiguity and completeness, concerns about cost, errors, and low yield have limited widespread adoption of such sequencing approaches. Here we resequenced the reference strain of D. melanogaster (ISO1) on a single Oxford Nanopore MinION flow cell run for 24 hr. Using only reads longer than 1 kb or with at least 30x coverage, we assembled a highly contiguous de novo genome. The addition of inexpensive paired reads and subsequent scaffolding using an optical map technology achieved an assembly with completeness and contiguity comparable to the D. melanogaster reference assembly. Comparison of our assembly to the reference assembly of ISO1 uncovered a number of structural variants (SVs), including novel LTR transposable element insertions and duplications affecting genes with developmental, behavioral, and metabolic functions. Collectively, these SVs provide a snapshot of the dynamics of genome evolution. Furthermore, our assembly and comparison to the D. melanogaster reference genome demonstrate that high-quality de novo assembly of reference genomes and comprehensive variant discovery using such assemblies are now possible by a single lab for under $1,000 (USD). |
format | Online Article Text |
id | pubmed-6169397 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Genetics Society of America |
record_format | MEDLINE/PubMed |
spelling | pubmed-6169397 2018-10-04 Rapid Low-Cost Assembly of the Drosophila melanogaster Reference Genome Using Low-Coverage, Long-Read Sequencing. Solares, Edwin A.; Chakraborty, Mahul; Miller, Danny E.; Kalsow, Shannon; Hall, Kate; Perera, Anoja G.; Emerson, J. J.; Hawley, R. Scott. G3 (Bethesda), Investigations. Genetics Society of America 2018-07-19 /pmc/articles/PMC6169397/ /pubmed/30018084 http://dx.doi.org/10.1534/g3.118.200162 Text en Copyright © 2018 Solares et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Rapid Low-Cost Assembly of the Drosophila melanogaster Reference Genome Using Low-Coverage, Long-Read Sequencing |
topic | Investigations |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6169397/ https://www.ncbi.nlm.nih.gov/pubmed/30018084 http://dx.doi.org/10.1534/g3.118.200162 |