Long-read assembly of a Great Dane genome highlights the contribution of GC-rich sequence and mobile elements to canine genomes
Technological advances have allowed improvements in genome reference sequence assemblies. Here, we combined long- and short-read sequence resources to assemble the genome of a female Great Dane dog. This assembly has improved continuity compared to the existing Boxer-derived (CanFam3.1) reference ge...
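Main authors: | Halo, Julia V.; Pendleton, Amanda L.; Shen, Feichen; Doucet, Aurélien J.; Derrien, Thomas; Hitte, Christophe; Kirby, Laura E.; Myers, Bridget; Sliwerska, Elzbieta; Emery, Sarah; Moran, John V.; Boyko, Adam R.; Kidd, Jeffrey M. |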
---|---|
Format: | Online Article Text |
Language: | English |
Published: | National Academy of Sciences, 2021 |
Subjects: | Biological Sciences |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7980453/ https://www.ncbi.nlm.nih.gov/pubmed/33836575 http://dx.doi.org/10.1073/pnas.2016274118 |
_version_ | 1783667439769223168 |
---|---|
author | Halo, Julia V. Pendleton, Amanda L. Shen, Feichen Doucet, Aurélien J. Derrien, Thomas Hitte, Christophe Kirby, Laura E. Myers, Bridget Sliwerska, Elzbieta Emery, Sarah Moran, John V. Boyko, Adam R. Kidd, Jeffrey M. |
author_sort | Halo, Julia V. |
collection | PubMed |
description | Technological advances have allowed improvements in genome reference sequence assemblies. Here, we combined long- and short-read sequence resources to assemble the genome of a female Great Dane dog. This assembly has improved continuity compared to the existing Boxer-derived (CanFam3.1) reference genome. Annotation of the Great Dane assembly identified 22,182 protein-coding gene models and 7,049 long noncoding RNAs, including 49 protein-coding genes not present in the CanFam3.1 reference. The Great Dane assembly spans the majority of sequence gaps in the CanFam3.1 reference and illustrates that 2,151 gaps overlap the transcription start site of a predicted protein-coding gene. Moreover, a subset of the resolved gaps, which have an 80.95% median GC content, localize to transcription start sites and recombination hotspots more often than expected by chance, suggesting the stable canine recombinational landscape has shaped genome architecture. Alignment of the Great Dane and CanFam3.1 assemblies identified 16,834 deletions and 15,621 insertions, as well as 2,665 deletions and 3,493 insertions located on secondary contigs. These structural variants are dominated by retrotransposon insertion/deletion polymorphisms and include 16,221 dimorphic canine short interspersed elements (SINECs) and 1,121 dimorphic long interspersed element-1 sequences (LINE-1_Cfs). Analysis of sequences flanking the 3′ end of LINE-1_Cfs (i.e., LINE-1_Cf 3′-transductions) suggests multiple retrotransposition-competent LINE-1_Cfs segregate among dog populations. Consistent with this conclusion, we demonstrate that a canine LINE-1_Cf element with intact open reading frames can retrotranspose its own RNA and that of a SINEC_Cf consensus sequence in cultured human cells, implicating ongoing retrotransposon activity as a driver of canine genetic variation. |
format | Online Article Text |
id | pubmed-7980453 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | National Academy of Sciences |
record_format | MEDLINE/PubMed |
spelling | pubmed-7980453 2021-03-26 Long-read assembly of a Great Dane genome highlights the contribution of GC-rich sequence and mobile elements to canine genomes Halo, Julia V. Pendleton, Amanda L. Shen, Feichen Doucet, Aurélien J. Derrien, Thomas Hitte, Christophe Kirby, Laura E. Myers, Bridget Sliwerska, Elzbieta Emery, Sarah Moran, John V. Boyko, Adam R. Kidd, Jeffrey M. Proc Natl Acad Sci U S A Biological Sciences Technological advances have allowed improvements in genome reference sequence assemblies. Here, we combined long- and short-read sequence resources to assemble the genome of a female Great Dane dog. This assembly has improved continuity compared to the existing Boxer-derived (CanFam3.1) reference genome. Annotation of the Great Dane assembly identified 22,182 protein-coding gene models and 7,049 long noncoding RNAs, including 49 protein-coding genes not present in the CanFam3.1 reference. The Great Dane assembly spans the majority of sequence gaps in the CanFam3.1 reference and illustrates that 2,151 gaps overlap the transcription start site of a predicted protein-coding gene. Moreover, a subset of the resolved gaps, which have an 80.95% median GC content, localize to transcription start sites and recombination hotspots more often than expected by chance, suggesting the stable canine recombinational landscape has shaped genome architecture. Alignment of the Great Dane and CanFam3.1 assemblies identified 16,834 deletions and 15,621 insertions, as well as 2,665 deletions and 3,493 insertions located on secondary contigs. These structural variants are dominated by retrotransposon insertion/deletion polymorphisms and include 16,221 dimorphic canine short interspersed elements (SINECs) and 1,121 dimorphic long interspersed element-1 sequences (LINE-1_Cfs). Analysis of sequences flanking the 3′ end of LINE-1_Cfs (i.e., LINE-1_Cf 3′-transductions) suggests multiple retrotransposition-competent LINE-1_Cfs segregate among dog populations. Consistent with this conclusion, we demonstrate that a canine LINE-1_Cf element with intact open reading frames can retrotranspose its own RNA and that of a SINEC_Cf consensus sequence in cultured human cells, implicating ongoing retrotransposon activity as a driver of canine genetic variation. National Academy of Sciences 2021-03-16 2021-03-08 /pmc/articles/PMC7980453/ /pubmed/33836575 http://dx.doi.org/10.1073/pnas.2016274118 Text en Copyright © 2021 the Author(s). Published by PNAS. https://creativecommons.org/licenses/by-nc-nd/4.0/ This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Biological Sciences Halo, Julia V. Pendleton, Amanda L. Shen, Feichen Doucet, Aurélien J. Derrien, Thomas Hitte, Christophe Kirby, Laura E. Myers, Bridget Sliwerska, Elzbieta Emery, Sarah Moran, John V. Boyko, Adam R. Kidd, Jeffrey M. Long-read assembly of a Great Dane genome highlights the contribution of GC-rich sequence and mobile elements to canine genomes |
title | Long-read assembly of a Great Dane genome highlights the contribution of GC-rich sequence and mobile elements to canine genomes |
title_sort | long-read assembly of a great dane genome highlights the contribution of gc-rich sequence and mobile elements to canine genomes |
topic | Biological Sciences |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7980453/ https://www.ncbi.nlm.nih.gov/pubmed/33836575 http://dx.doi.org/10.1073/pnas.2016274118 |