Sequencing and Assembly of the 22-Gb Loblolly Pine Genome
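Main authors: | Zimin, Aleksey; Stevens, Kristian A.; Crepeau, Marc W.; Holtz-Morris, Ann; Koriabine, Maxim; Marçais, Guillaume; Puiu, Daniela; Roberts, Michael; Wegrzyn, Jill L.; de Jong, Pieter J.; Neale, David B.; Salzberg, Steven L.; Yorke, James A.; Langley, Charles H. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Genetics Society of America, 2014 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3948813/ https://www.ncbi.nlm.nih.gov/pubmed/24653210 http://dx.doi.org/10.1534/genetics.113.159715 |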
author | Zimin, Aleksey; Stevens, Kristian A.; Crepeau, Marc W.; Holtz-Morris, Ann; Koriabine, Maxim; Marçais, Guillaume; Puiu, Daniela; Roberts, Michael; Wegrzyn, Jill L.; de Jong, Pieter J.; Neale, David B.; Salzberg, Steven L.; Yorke, James A.; Langley, Charles H. |
---|---|
collection | PubMed |
description | Conifers are the predominant gymnosperm. The size and complexity of their genomes have presented formidable technical challenges for whole-genome shotgun sequencing and assembly. We employed novel strategies that allowed us to determine the loblolly pine (Pinus taeda) reference genome sequence, the largest genome assembled to date. Most of the sequence data were derived from whole-genome shotgun sequencing of a single megagametophyte, the haploid tissue of a single pine seed. Although that constrained the quantity of available DNA, the resulting haploid sequence data were well-suited for assembly. The haploid sequence was augmented with multiple linking long-fragment mate pair libraries from the parental diploid DNA. For the longest fragments, we used novel fosmid DiTag libraries. Sequences from the linking libraries that did not match the megagametophyte were identified and removed. Assembly of the sequence data was aided by condensing the enormous number of paired-end reads into a much smaller set of longer “super-reads,” rendering subsequent assembly with an overlap-based assembly algorithm computationally feasible. To further improve the contiguity and biological utility of the genome sequence, additional scaffolding methods utilizing independent genome and transcriptome assemblies were implemented. The combination of these strategies resulted in a draft genome sequence of 20.15 billion bases, with an N50 scaffold size of 66.9 kbp. |
format | Online Article Text |
id | pubmed-3948813 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Genetics Society of America |
record_format | MEDLINE/PubMed |
spelling | pubmed-3948813 2015-03-01 Genetics Investigations Genetics Society of America 2014-03 /pmc/articles/PMC3948813/ /pubmed/24653210 http://dx.doi.org/10.1534/genetics.113.159715 Text en Copyright © 2014 by the Genetics Society of America Available freely online through the author-supported open access option. |
title | Sequencing and Assembly of the 22-Gb Loblolly Pine Genome |
topic | Investigations |
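The abstract summarizes contiguity with an N50 scaffold size of 66.9 kbp. Under the standard definition, N50 is the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. The minimal Python sketch below computes that statistic. The function name `n50`, the optional `genome_size` parameter, and the toy lengths are illustrative assumptions; they are not the authors' tooling or data from the loblolly pine assembly.

```python
from typing import Iterable, Optional


def n50(lengths: Iterable[int], genome_size: Optional[int] = None) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    Lengths are sorted in descending order. N50 is the length of the sequence
    at which the running total first reaches half of the total assembled length.
    If genome_size is given (for example an estimated genome size), the threshold
    becomes half of that value instead, which yields the related NG50 statistic.
    """
    ordered = sorted((n for n in lengths if n > 0), reverse=True)
    if not ordered:
        raise ValueError("no positive lengths supplied")
    total = genome_size if genome_size is not None else sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare 2 * running with total so odd totals need no float division.
        if 2 * running >= total:
            return length
    # Only reachable for NG50, when the assembly spans under half of genome_size.
    raise ValueError("assembly spans less than half of genome_size; NG50 undefined")


if __name__ == "__main__":
    # Hypothetical toy scaffold lengths in bp, not real assembly data.
    scaffolds = [80, 70, 50, 40, 30, 20, 10]
    print(n50(scaffolds))                   # → 70
    print(n50(scaffolds, genome_size=400))  # → 50
```

Measuring against the assembly's own total (N50) and measuring against an estimated genome size (NG50) can give different values when the draft is shorter than the genome. The 20.15-Gb draft described here is shorter than the 22-Gb genome size in the title, which is why the sketch keeps the two thresholds as separate options.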