Single-Molecule Sequencing of the Drosophila serrata Genome

Bibliographic Details
Main authors: Allen, Scott L.; Delaney, Emily K.; Kopp, Artyom; Chenoweth, Stephen F.
Format: Online, Article, Text
Language: English
Published: Genetics Society of America 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5345708/
https://www.ncbi.nlm.nih.gov/pubmed/28143951
http://dx.doi.org/10.1534/g3.116.037598
author Allen, Scott L.
Delaney, Emily K.
Kopp, Artyom
Chenoweth, Stephen F.
collection PubMed
description Long-read sequencing technology promises to greatly enhance de novo assembly of genomes for nonmodel species. Although the error rates of long reads have been a stumbling block, sequencing at high coverage permits the self-correction of many errors. Here, we sequence and de novo assemble the genome of Drosophila serrata, a species from the montium subgroup that has been well studied for latitudinal clines, sexual selection, and gene expression but that lacks a reference genome. Using 11 PacBio single-molecule real-time (SMRT) cells, we generated 12 Gbp of raw sequence data comprising ∼65× whole-genome coverage. Read lengths averaged 8940 bp (read N50 12,200 bp), and the longest read was 53 kbp. We self-corrected reads using the PBDagCon algorithm and assembled the genome using the MHAP algorithm within the PBcR assembler. Total assembly length was 198 Mbp, with an N50 just under 1 Mbp. Contigs displayed a high degree of chromosome arm-level conservation with the D. melanogaster genome, and many could be sensibly placed on the D. serrata physical map. We also provide an initial annotation for this genome using in silico gene predictions supported by RNA-seq data.
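The abstract reports N50 values for both the reads (12,200) and the assembly (just under 1 Mbp), along with a fold-coverage figure. As a hedged illustration of what those statistics conventionally mean, here is a minimal Python sketch of the usual N50 definition and of mean coverage as total sequenced bases divided by genome size. The function names `n50` and `mean_coverage` and the toy read lengths are hypothetical; this is not the authors' code, and it does not reproduce their data or the genome-size estimate behind their coverage figure.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    Sort lengths from longest to shortest and accumulate them; N50 is the
    length of the sequence at which the running total first reaches at
    least half of all bases.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total == 0:
        raise ValueError("need at least one non-empty sequence")
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding issues
            return length
    raise AssertionError("unreachable: loop returns once total > 0")


def mean_coverage(total_bases: int, genome_size: int) -> float:
    """Expected sequencing depth: total sequenced bases / genome size."""
    return total_bases / genome_size


if __name__ == "__main__":
    toy_reads = [2_000, 3_000, 5_000, 8_000, 12_000]  # hypothetical read lengths
    print(n50(toy_reads))                      # 8000
    print(mean_coverage(sum(toy_reads), 10_000))  # 3.0 on a hypothetical 10 kbp genome
```

With the toy lengths, the two longest reads (12,000 + 8,000 = 20,000 bp) already cover at least half of the 30,000 bp total, so the N50 is 8,000 bp. The same function applies unchanged to contig lengths.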
format Online
Article
Text
id pubmed-5345708
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Genetics Society of America
record_format MEDLINE/PubMed
spelling pubmed-5345708 2017-03-21
Single-Molecule Sequencing of the Drosophila serrata Genome
Allen, Scott L.
Delaney, Emily K.
Kopp, Artyom
Chenoweth, Stephen F.
G3 (Bethesda) Investigations
Genetics Society of America 2017-01-30
/pmc/articles/PMC5345708/ /pubmed/28143951 http://dx.doi.org/10.1534/g3.116.037598
Text en
Copyright © 2017 Allen et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Single-Molecule Sequencing of the Drosophila serrata Genome
topic Investigations