Sequencing and Assembly of the 22-Gb Loblolly Pine Genome

Conifers are the predominant gymnosperm. The size and complexity of their genomes have presented formidable technical challenges for whole-genome shotgun sequencing and assembly. We employed novel strategies that allowed us to determine the loblolly pine (Pinus taeda) reference genome sequence, the l...

Bibliographic Details
Main authors: Zimin, Aleksey, Stevens, Kristian A., Crepeau, Marc W., Holtz-Morris, Ann, Koriabine, Maxim, Marçais, Guillaume, Puiu, Daniela, Roberts, Michael, Wegrzyn, Jill L., de Jong, Pieter J., Neale, David B., Salzberg, Steven L., Yorke, James A., Langley, Charles H.
Format: Online Article Text
Language: English
Published: Genetics Society of America 2014
Subjects: Investigations
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3948813/
https://www.ncbi.nlm.nih.gov/pubmed/24653210
http://dx.doi.org/10.1534/genetics.113.159715
collection PubMed
description Conifers are the predominant gymnosperm. The size and complexity of their genomes have presented formidable technical challenges for whole-genome shotgun sequencing and assembly. We employed novel strategies that allowed us to determine the loblolly pine (Pinus taeda) reference genome sequence, the largest genome assembled to date. Most of the sequence data were derived from whole-genome shotgun sequencing of a single megagametophyte, the haploid tissue of a single pine seed. Although that constrained the quantity of available DNA, the resulting haploid sequence data were well suited for assembly. The haploid sequence was augmented with multiple linking long-fragment mate pair libraries from the parental diploid DNA. For the longest fragments, we used novel fosmid DiTag libraries. Sequences from the linking libraries that did not match the megagametophyte were identified and removed. Assembly of the sequence data was aided by condensing the enormous number of paired-end reads into a much smaller set of longer “super-reads,” rendering subsequent assembly with an overlap-based assembly algorithm computationally feasible. To further improve the contiguity and biological utility of the genome sequence, additional scaffolding methods utilizing independent genome and transcriptome assemblies were implemented. The combination of these strategies resulted in a draft genome sequence of 20.15 billion bases, with an N50 scaffold size of 66.9 kbp.
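The description reports an N50 scaffold size of 66.9 kbp. As a reading aid, the following is a minimal sketch of how an N50 value is conventionally computed from a list of scaffold lengths: sort the lengths from longest to shortest and report the length at which the running total first reaches half of the total assembly size. The function name `n50` and the toy lengths are illustrative assumptions; they are not taken from this record or from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy scaffold lengths (not real loblolly pine data).
toy_scaffolds = [80, 70, 50, 40, 30, 20]
print(n50(toy_scaffolds))  # prints 70: 80 + 70 = 150 >= 290 / 2
```

Under this definition, a scaffold N50 of 66.9 kbp means that at least half of the assembled bases lie in scaffolds of 66.9 kbp or longer.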
format Online
Article
Text
id pubmed-3948813
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Genetics Society of America
record_format MEDLINE/PubMed
spelling pubmed-3948813 2015-03-01 Genetics Investigations Genetics Society of America 2014-03 /pmc/articles/PMC3948813/ /pubmed/24653210 http://dx.doi.org/10.1534/genetics.113.159715 Text en Copyright © 2014 by the Genetics Society of America. Available freely online through the author-supported open access option.
title Sequencing and Assembly of the 22-Gb Loblolly Pine Genome
topic Investigations