
An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing

The 22-gigabase genome of loblolly pine (Pinus taeda) is one of the largest ever sequenced. The draft assembly published in 2014 was built entirely from short Illumina reads, with lengths ranging from 100 to 250 base pairs (bp). The assembly was quite fragmented, containing over 11 million contigs w...

Bibliographic Details
Main authors: Zimin, Aleksey V., Stevens, Kristian A., Crepeau, Marc W., Puiu, Daniela, Wegrzyn, Jill L., Yorke, James A., Langley, Charles H., Neale, David B., Salzberg, Steven L.
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5437942/
https://www.ncbi.nlm.nih.gov/pubmed/28369353
http://dx.doi.org/10.1093/gigascience/giw016
author Zimin, Aleksey V.
Stevens, Kristian A.
Crepeau, Marc W.
Puiu, Daniela
Wegrzyn, Jill L.
Yorke, James A.
Langley, Charles H.
Neale, David B.
Salzberg, Steven L.
collection PubMed
description The 22-gigabase genome of loblolly pine (Pinus taeda) is one of the largest ever sequenced. The draft assembly published in 2014 was built entirely from short Illumina reads, with lengths ranging from 100 to 250 base pairs (bp). The assembly was quite fragmented, containing over 11 million contigs whose weighted average (N50) size was 8206 bp. To improve this result, we generated approximately 12-fold coverage in long reads using the Single Molecule Real Time sequencing technology developed at Pacific Biosciences. We assembled the long and short reads together using the MaSuRCA mega-reads assembly algorithm, which produced a substantially better assembly, P. taeda version 2.0. The new assembly has an N50 contig size of 25 361, more than three times as large as achieved in the original assembly, and an N50 scaffold size of 107 821, 61% larger than the previous assembly.
format Online
Article
Text
id pubmed-5437942
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-54379422017-06-14 An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing Zimin, Aleksey V. Stevens, Kristian A. Crepeau, Marc W. Puiu, Daniela Wegrzyn, Jill L. Yorke, James A. Langley, Charles H. Neale, David B. Salzberg, Steven L. Gigascience Data Note The 22-gigabase genome of loblolly pine (Pinus taeda) is one of the largest ever sequenced. The draft assembly published in 2014 was built entirely from short Illumina reads, with lengths ranging from 100 to 250 base pairs (bp). The assembly was quite fragmented, containing over 11 million contigs whose weighted average (N50) size was 8206 bp. To improve this result, we generated approximately 12-fold coverage in long reads using the Single Molecule Real Time sequencing technology developed at Pacific Biosciences. We assembled the long and short reads together using the MaSuRCA mega-reads assembly algorithm, which produced a substantially better assembly, P. taeda version 2.0. The new assembly has an N50 contig size of 25 361, more than three times as large as achieved in the original assembly, and an N50 scaffold size of 107 821, 61% larger than the previous assembly. Oxford University Press 2017-02-15 /pmc/articles/PMC5437942/ /pubmed/28369353 http://dx.doi.org/10.1093/gigascience/giw016 Text en © The Author 2017. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing
topic Data Note
work_keys_str_mv AT ziminalekseyv animprovedassemblyoftheloblollypinemegagenomeusinglongreadsinglemoleculesequencing
AT stevenskristiana animprovedassemblyoftheloblollypinemegagenomeusinglongreadsinglemoleculesequencing
AT crepeaumarcw animprovedassemblyoftheloblollypinemegagenomeusinglongreadsinglemoleculesequencing
AT puiudaniela animprovedassemblyoftheloblollypinemegagenomeusinglongreadsinglemoleculesequencing
AT wegrzynjilll animprovedassemblyoftheloblollypinemegagenomeusinglongreadsinglemoleculesequencing
AT yorkejamesa animprovedassemblyoftheloblollypinemegagenomeusinglongreadsinglemoleculesequencing
AT langleycharlesh animprovedassemblyoftheloblollypinemegagenomeusinglongreadsinglemoleculesequencing
AT nealedavidb animprovedassemblyoftheloblollypinemegagenomeusinglongreadsinglemoleculesequencing
AT salzbergstevenl animprovedassemblyoftheloblollypinemegagenomeusinglongreadsinglemoleculesequencing
AT ziminalekseyv improvedassemblyoftheloblollypinemegagenomeusinglongreadsinglemoleculesequencing
AT stevenskristiana improvedassemblyoftheloblollypinemegagenomeusinglongreadsinglemoleculesequencing
AT crepeaumarcw improvedassemblyoftheloblollypinemegagenomeusinglongreadsinglemoleculesequencing
AT puiudaniela improvedassemblyoftheloblollypinemegagenomeusinglongreadsinglemoleculesequencing
AT wegrzynjilll improvedassemblyoftheloblollypinemegagenomeusinglongreadsinglemoleculesequencing
AT yorkejamesa improvedassemblyoftheloblollypinemegagenomeusinglongreadsinglemoleculesequencing
AT langleycharlesh improvedassemblyoftheloblollypinemegagenomeusinglongreadsinglemoleculesequencing
AT nealedavidb improvedassemblyoftheloblollypinemegagenomeusinglongreadsinglemoleculesequencing
AT salzbergstevenl improvedassemblyoftheloblollypinemegagenomeusinglongreadsinglemoleculesequencing
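The abstract summarizes assembly contiguity with the N50 statistic, which it describes as a weighted average contig size. As a minimal sketch of how that number is commonly computed, the function below sorts the contig lengths from longest to shortest and returns the length at which the running total first reaches half of the summed length. The function name `n50` and the toy lengths are illustrative assumptions, not taken from the paper, and this sketch measures against the total of the lengths supplied; an assembly paper may instead measure against an estimated genome size. The last lines check the abstract's "more than three times as large" claim from the two contig N50 values it reports.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total bases supplied.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly: six contigs totalling 30 bases.
toy_contigs = [2, 3, 4, 5, 6, 10]
toy_n50 = n50(toy_contigs)

# Contig N50 values quoted in the abstract (in bp).
old_contig_n50 = 8206
new_contig_n50 = 25361
contig_ratio = new_contig_n50 / old_contig_n50

print(f"toy N50: {toy_n50}")
print(f"contig N50 improvement: {contig_ratio:.2f}x")
```

For the toy input, the two longest contigs (10 and 6) already cover 16 of the 30 bases, so the N50 is 6. The ratio of the two reported contig N50 values is about 3.09, consistent with the abstract's wording.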