
The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences

BACKGROUND: In today's age of genomic discovery, no attempt has been made to comprehensively sequence a gymnosperm genome. The largest genus in the coniferous family Pinaceae is Pinus, whose 110-120 species have extremely large genomes (c. 20-40 Gb, 2N = 24). The size and complexity of these ge...


Bibliographic Details
Main Authors: Kovach, Allen; Wegrzyn, Jill L; Parra, Genis; Holt, Carson; Bruening, George E; Loopstra, Carol A; Hartigan, James; Yandell, Mark; Langley, Charles H; Korf, Ian; Neale, David B
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2996948/
https://www.ncbi.nlm.nih.gov/pubmed/20609256
http://dx.doi.org/10.1186/1471-2164-11-420
author Kovach, Allen
Wegrzyn, Jill L
Parra, Genis
Holt, Carson
Bruening, George E
Loopstra, Carol A
Hartigan, James
Yandell, Mark
Langley, Charles H
Korf, Ian
Neale, David B
collection PubMed
description BACKGROUND: In today's age of genomic discovery, no attempt has been made to comprehensively sequence a gymnosperm genome. The largest genus in the coniferous family Pinaceae is Pinus, whose 110-120 species have extremely large genomes (c. 20-40 Gb, 2N = 24). The size and complexity of these genomes have prompted much speculation as to the feasibility of completing a conifer genome sequence. Conifer genomes are reputed to be highly repetitive, but there is little information available on the nature and identity of repetitive units in gymnosperms. The pines have extensive genetic resources, with approximately 329000 ESTs from eleven species and genetic maps in eight species, including a dense genetic map of the twelve linkage groups in Pinus taeda. RESULTS: We present here the Sanger sequence and annotation of ten P. taeda BAC clones and Genome Analyzer II whole genome shotgun (WGS) sequences representing 7.5% of the genome. Computational annotation of ten BACs predicts three putative protein-coding genes and at least fifteen likely pseudogenes in nearly one megabase of sequence. We found three conifer-specific LTR retroelements in the BACs, and tentatively identified at least 15 others based on evidence from the distantly related angiosperms. Alignment of WGS sequences to the BACs indicates that 80% of BAC sequences have similar copies (≥ 75% nucleotide identity) elsewhere in the genome, but only 23% have identical copies (99% identity). The three most common repetitive elements in the genome were identified and, when combined, represent less than 5% of the genome. CONCLUSIONS: This study indicates that the majority of repeats in the P. taeda genome are 'novel' and will therefore require additional BAC or genomic sequencing for accurate characterization. The pine genome contains a very large number of diverged and probably defunct repetitive elements. This study also provides new evidence that sequencing a pine genome using a WGS approach is a feasible goal.
format Text
id pubmed-2996948
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2996948 2010-12-07 BMC Genomics Research Article BioMed Central 2010-07-07 /pmc/articles/PMC2996948/ /pubmed/20609256 http://dx.doi.org/10.1186/1471-2164-11-420 Text en Copyright ©2010 Kovach et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2996948/
https://www.ncbi.nlm.nih.gov/pubmed/20609256
http://dx.doi.org/10.1186/1471-2164-11-420