Discovery of Novel Sequences in 1,000 Swedish Genomes

Novel sequences (NSs), which are not present in the human reference genome, are abundant and remain largely unexplored. Here, we use de novo assembly to study NSs in 1,000 Swedish individuals first sequenced as part of the SweGen project, revealing a total of 46 Mb in 61,044 distinct contigs of sequences no...

Bibliographic Details
Main Authors: Eisfeldt, Jesper; Mårtensson, Gustaf; Ameur, Adam; Nilsson, Daniel; Lindstrand, Anna
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects: Discoveries
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6984370/
https://www.ncbi.nlm.nih.gov/pubmed/31560401
http://dx.doi.org/10.1093/molbev/msz176
author Eisfeldt, Jesper
Mårtensson, Gustaf
Ameur, Adam
Nilsson, Daniel
Lindstrand, Anna
collection PubMed
description Novel sequences (NSs), which are not present in the human reference genome, are abundant and remain largely unexplored. Here, we use de novo assembly to study NSs in 1,000 Swedish individuals first sequenced as part of the SweGen project, revealing a total of 46 Mb in 61,044 distinct contigs of sequence not present in GRCh38. The contigs were aligned to recently published catalogs of Icelandic and Pan-African NSs, as well as to the chimpanzee genome, revealing a great diversity of shared sequences. Analyzing the positions of NSs across the chimpanzee genome, we find that 2,807 NSs align confidently within 143 chimpanzee orthologs of human genes. Aligning the whole-genome sequencing data to the chimpanzee genome, we discover ancestral NSs common throughout the Swedish population. The NSs were searched for repeats and repeat elements, revealing a majority of repetitive sequence (56%) and enrichment of simple repeats (28%) and satellites (15%). Lastly, we align the unmappable reads of a subset of the 1000 Genomes Project data to our collection of NSs, as well as to the previously published Pan-African NSs, revealing that both the Swedish and Pan-African NSs are widespread and that the Swedish NSs are largely a subset of the Pan-African NSs. Overall, these results highlight the importance of creating a more diverse reference genome and illustrate that a significant portion of the NSs may be of ancestral origin.
format Online
Article
Text
id pubmed-6984370
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6984370 2020-01-30 Discovery of Novel Sequences in 1,000 Swedish Genomes Eisfeldt, Jesper Mårtensson, Gustaf Ameur, Adam Nilsson, Daniel Lindstrand, Anna Mol Biol Evol Discoveries Oxford University Press 2020-01 2019-09-24 /pmc/articles/PMC6984370/ /pubmed/31560401 http://dx.doi.org/10.1093/molbev/msz176 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Discovery of Novel Sequences in 1,000 Swedish Genomes
topic Discoveries