The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences
BACKGROUND: In today's age of genomic discovery, no attempt has been made to comprehensively sequence a gymnosperm genome. The largest genus in the coniferous family Pinaceae is Pinus, whose 110-120 species have extremely large genomes (c. 20-40 Gb, 2N = 24). The size and complexity of these ge...
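Main authors: | Kovach, Allen; Wegrzyn, Jill L; Parra, Genis; Holt, Carson; Bruening, George E; Loopstra, Carol A; Hartigan, James; Yandell, Mark; Langley, Charles H; Korf, Ian; Neale, David B |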
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2010 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2996948/ https://www.ncbi.nlm.nih.gov/pubmed/20609256 http://dx.doi.org/10.1186/1471-2164-11-420 |
_version_ | 1782193241430425600 |
---|---|
author | Kovach, Allen Wegrzyn, Jill L Parra, Genis Holt, Carson Bruening, George E Loopstra, Carol A Hartigan, James Yandell, Mark Langley, Charles H Korf, Ian Neale, David B |
author_sort | Kovach, Allen |
collection | PubMed |
description | BACKGROUND: In today's age of genomic discovery, no attempt has been made to comprehensively sequence a gymnosperm genome. The largest genus in the coniferous family Pinaceae is Pinus, whose 110-120 species have extremely large genomes (c. 20-40 Gb, 2N = 24). The size and complexity of these genomes have prompted much speculation as to the feasibility of completing a conifer genome sequence. Conifer genomes are reputed to be highly repetitive, but there is little information available on the nature and identity of repetitive units in gymnosperms. The pines have extensive genetic resources, with approximately 329000 ESTs from eleven species and genetic maps in eight species, including a dense genetic map of the twelve linkage groups in Pinus taeda. RESULTS: We present here the Sanger sequence and annotation of ten P. taeda BAC clones and Genome Analyzer II whole genome shotgun (WGS) sequences representing 7.5% of the genome. Computational annotation of ten BACs predicts three putative protein-coding genes and at least fifteen likely pseudogenes in nearly one megabase of sequence. We found three conifer-specific LTR retroelements in the BACs, and tentatively identified at least 15 others based on evidence from the distantly related angiosperms. Alignment of WGS sequences to the BACs indicates that 80% of BAC sequences have similar copies (≥ 75% nucleotide identity) elsewhere in the genome, but only 23% have identical copies (99% identity). The three most common repetitive elements in the genome were identified and, when combined, represent less than 5% of the genome. CONCLUSIONS: This study indicates that the majority of repeats in the P. taeda genome are 'novel' and will therefore require additional BAC or genomic sequencing for accurate characterization. The pine genome contains a very large number of diverged and probably defunct repetitive elements. This study also provides new evidence that sequencing a pine genome using a WGS approach is a feasible goal. |
format | Text |
id | pubmed-2996948 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2996948 2010-12-07 The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences Kovach, Allen Wegrzyn, Jill L Parra, Genis Holt, Carson Bruening, George E Loopstra, Carol A Hartigan, James Yandell, Mark Langley, Charles H Korf, Ian Neale, David B BMC Genomics Research Article BACKGROUND: In today's age of genomic discovery, no attempt has been made to comprehensively sequence a gymnosperm genome. The largest genus in the coniferous family Pinaceae is Pinus, whose 110-120 species have extremely large genomes (c. 20-40 Gb, 2N = 24). The size and complexity of these genomes have prompted much speculation as to the feasibility of completing a conifer genome sequence. Conifer genomes are reputed to be highly repetitive, but there is little information available on the nature and identity of repetitive units in gymnosperms. The pines have extensive genetic resources, with approximately 329000 ESTs from eleven species and genetic maps in eight species, including a dense genetic map of the twelve linkage groups in Pinus taeda. RESULTS: We present here the Sanger sequence and annotation of ten P. taeda BAC clones and Genome Analyzer II whole genome shotgun (WGS) sequences representing 7.5% of the genome. Computational annotation of ten BACs predicts three putative protein-coding genes and at least fifteen likely pseudogenes in nearly one megabase of sequence. We found three conifer-specific LTR retroelements in the BACs, and tentatively identified at least 15 others based on evidence from the distantly related angiosperms. Alignment of WGS sequences to the BACs indicates that 80% of BAC sequences have similar copies (≥ 75% nucleotide identity) elsewhere in the genome, but only 23% have identical copies (99% identity). The three most common repetitive elements in the genome were identified and, when combined, represent less than 5% of the genome. CONCLUSIONS: This study indicates that the majority of repeats in the P. taeda genome are 'novel' and will therefore require additional BAC or genomic sequencing for accurate characterization. The pine genome contains a very large number of diverged and probably defunct repetitive elements. This study also provides new evidence that sequencing a pine genome using a WGS approach is a feasible goal. BioMed Central 2010-07-07 /pmc/articles/PMC2996948/ /pubmed/20609256 http://dx.doi.org/10.1186/1471-2164-11-420 Text en Copyright ©2010 Kovach et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Kovach, Allen Wegrzyn, Jill L Parra, Genis Holt, Carson Bruening, George E Loopstra, Carol A Hartigan, James Yandell, Mark Langley, Charles H Korf, Ian Neale, David B The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences |
title | The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences |
title_sort | pinus taeda genome is characterized by diverse and highly diverged repetitive sequences |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2996948/ https://www.ncbi.nlm.nih.gov/pubmed/20609256 http://dx.doi.org/10.1186/1471-2164-11-420 |