
Long-read assembly of a Great Dane genome highlights the contribution of GC-rich sequence and mobile elements to canine genomes


Bibliographic Details
Main authors: Halo, Julia V., Pendleton, Amanda L., Shen, Feichen, Doucet, Aurélien J., Derrien, Thomas, Hitte, Christophe, Kirby, Laura E., Myers, Bridget, Sliwerska, Elzbieta, Emery, Sarah, Moran, John V., Boyko, Adam R., Kidd, Jeffrey M.
Format: Online Article Text
Language: English
Published: National Academy of Sciences 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7980453/
https://www.ncbi.nlm.nih.gov/pubmed/33836575
http://dx.doi.org/10.1073/pnas.2016274118
author Halo, Julia V.
Pendleton, Amanda L.
Shen, Feichen
Doucet, Aurélien J.
Derrien, Thomas
Hitte, Christophe
Kirby, Laura E.
Myers, Bridget
Sliwerska, Elzbieta
Emery, Sarah
Moran, John V.
Boyko, Adam R.
Kidd, Jeffrey M.
collection PubMed
description Technological advances have allowed improvements in genome reference sequence assemblies. Here, we combined long- and short-read sequence resources to assemble the genome of a female Great Dane dog. This assembly has improved continuity compared to the existing Boxer-derived (CanFam3.1) reference genome. Annotation of the Great Dane assembly identified 22,182 protein-coding gene models and 7,049 long noncoding RNAs, including 49 protein-coding genes not present in the CanFam3.1 reference. The Great Dane assembly spans the majority of sequence gaps in the CanFam3.1 reference and illustrates that 2,151 gaps overlap the transcription start site of a predicted protein-coding gene. Moreover, a subset of the resolved gaps, which have an 80.95% median GC content, localize to transcription start sites and recombination hotspots more often than expected by chance, suggesting the stable canine recombinational landscape has shaped genome architecture. Alignment of the Great Dane and CanFam3.1 assemblies identified 16,834 deletions and 15,621 insertions, as well as 2,665 deletions and 3,493 insertions located on secondary contigs. These structural variants are dominated by retrotransposon insertion/deletion polymorphisms and include 16,221 dimorphic canine short interspersed elements (SINECs) and 1,121 dimorphic long interspersed element-1 sequences (LINE-1_Cfs). Analysis of sequences flanking the 3′ end of LINE-1_Cfs (i.e., LINE-1_Cf 3′-transductions) suggests multiple retrotransposition-competent LINE-1_Cfs segregate among dog populations. Consistent with this conclusion, we demonstrate that a canine LINE-1_Cf element with intact open reading frames can retrotranspose its own RNA and that of a SINEC_Cf consensus sequence in cultured human cells, implicating ongoing retrotransposon activity as a driver of canine genetic variation.
format Online
Article
Text
id pubmed-7980453
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher National Academy of Sciences
record_format MEDLINE/PubMed
spelling pubmed-7980453 2021-03-26. Proc Natl Acad Sci U S A, Biological Sciences. National Academy of Sciences, 2021-03-16, 2021-03-08. Copyright © 2021 the Author(s). Published by PNAS. This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND) (https://creativecommons.org/licenses/by-nc-nd/4.0/).
title Long-read assembly of a Great Dane genome highlights the contribution of GC-rich sequence and mobile elements to canine genomes
topic Biological Sciences