Sequence properties of certain GC rich avian genes, their origins and absence from genome assemblies: case studies

Bibliographic Details
Main Authors: Beauclair, Linda, Ramé, Christelle, Arensburger, Peter, Piégu, Benoît, Guillou, Florian, Dupont, Joëlle, Bigot, Yves
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6792250/
https://www.ncbi.nlm.nih.gov/pubmed/31610792
http://dx.doi.org/10.1186/s12864-019-6131-1
author Beauclair, Linda
Ramé, Christelle
Arensburger, Peter
Piégu, Benoît
Guillou, Florian
Dupont, Joëlle
Bigot, Yves
collection PubMed
description BACKGROUND: More and more eukaryotic genomes are sequenced and assembled, most of them presented as a complete model in which missing chromosomal regions are filled by Ns and where a few chromosomes may be lacking. Avian genomes often contain sequences with high GC content, which has been hypothesized to be at the origin of many missing sequences in these genomes. We investigated features of these missing sequences to discover why some may not have been integrated into genomic libraries and/or sequenced. RESULTS: The sequences of five red jungle fowl cDNA models with high GC content were used as queries to search publicly available datasets of Illumina and PacBio sequencing reads. These were used to reconstruct the leptin, TNFα, MRPL52, PCP2 and PET100 genes, all of which are absent from the red jungle fowl genome model. These gene sequences displayed elevated GC contents, had intron sizes that were sometimes larger than non-avian orthologues, and had non-coding regions that contained numerous tandem and inverted repeat sequences with motifs able to assemble into stable G-quadruplexes and intrastrand dyadic structures. Our results suggest that Illumina technology was unable to sequence the non-coding regions of these genes. On the other hand, PacBio technology was able to sequence these regions, but with dramatically lower efficiency than would typically be expected. CONCLUSIONS: High GC content was not the principal reason why numerous GC-rich regions of avian genomes are missing from genome assembly models. Instead, it is the presence of tandem repeats containing motifs capable of assembling into very stable secondary structures that is likely responsible.
format Online
Article
Text
id pubmed-6792250
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6792250 2019-10-21 Sequence properties of certain GC rich avian genes, their origins and absence from genome assemblies: case studies Beauclair, Linda Ramé, Christelle Arensburger, Peter Piégu, Benoît Guillou, Florian Dupont, Joëlle Bigot, Yves BMC Genomics Research Article
BioMed Central 2019-10-14 /pmc/articles/PMC6792250/ /pubmed/31610792 http://dx.doi.org/10.1186/s12864-019-6131-1 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Sequence properties of certain GC rich avian genes, their origins and absence from genome assemblies: case studies
topic Research Article