Discovery of Novel Sequences in 1,000 Swedish Genomes
Novel sequences (NSs), not present in the human reference genome, are abundant and remain largely unexplored. Here, we utilize de novo assembly to study NS in 1,000 Swedish individuals first sequenced as part of the SweGen project revealing a total of 46 Mb in 61,044 distinct contigs of sequences no...
Main Authors: | Eisfeldt, Jesper; Mårtensson, Gustaf; Ameur, Adam; Nilsson, Daniel; Lindstrand, Anna |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | Discoveries |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6984370/ https://www.ncbi.nlm.nih.gov/pubmed/31560401 http://dx.doi.org/10.1093/molbev/msz176 |
_version_ | 1783491642590756864 |
---|---|
author | Eisfeldt, Jesper; Mårtensson, Gustaf; Ameur, Adam; Nilsson, Daniel; Lindstrand, Anna |
author_facet | Eisfeldt, Jesper; Mårtensson, Gustaf; Ameur, Adam; Nilsson, Daniel; Lindstrand, Anna |
author_sort | Eisfeldt, Jesper |
collection | PubMed |
description | Novel sequences (NSs), not present in the human reference genome, are abundant and remain largely unexplored. Here, we utilize de novo assembly to study NS in 1,000 Swedish individuals first sequenced as part of the SweGen project, revealing a total of 46 Mb in 61,044 distinct contigs of sequences not present in GRCh38. The contigs were aligned to recently published catalogs of Icelandic and Pan-African NSs, as well as the chimpanzee genome, revealing a great diversity of shared sequences. Analyzing the positioning of NS across the chimpanzee genome, we find that 2,807 NS align confidently within 143 chimpanzee orthologs of human genes. Aligning the whole genome sequencing data to the chimpanzee genome, we discover ancestral NS common throughout the Swedish population. The NSs were searched for repeats and repeat elements, revealing a majority of repetitive sequence (56%), and enrichment of simple repeats (28%) and satellites (15%). Lastly, we align the unmappable reads of a subset of the thousand genomes data to our collection of NS, as well as the previously published Pan-African NS, revealing that both the Swedish and Pan-African NS are widespread, and that the Swedish NSs are largely a subset of the Pan-African NS. Overall, these results highlight the importance of creating a more diverse reference genome and illustrate that significant amounts of the NS may be of ancestral origin. |
format | Online Article Text |
id | pubmed-6984370 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-69843702020-01-30 Discovery of Novel Sequences in 1,000 Swedish Genomes Eisfeldt, Jesper Mårtensson, Gustaf Ameur, Adam Nilsson, Daniel Lindstrand, Anna Mol Biol Evol Discoveries Novel sequences (NSs), not present in the human reference genome, are abundant and remain largely unexplored. Here, we utilize de novo assembly to study NS in 1,000 Swedish individuals first sequenced as part of the SweGen project revealing a total of 46 Mb in 61,044 distinct contigs of sequences not present in GRCh38. The contigs were aligned to recently published catalogs of Icelandic and Pan-African NSs, as well as the chimpanzee genome, revealing a great diversity of shared sequences. Analyzing the positioning of NS across the chimpanzee genome, we find that 2,807 NS align confidently within 143 chimpanzee orthologs of human genes. Aligning the whole genome sequencing data to the chimpanzee genome, we discover ancestral NS common throughout the Swedish population. The NSs were searched for repeats and repeat elements: revealing a majority of repetitive sequence (56%), and enrichment of simple repeats (28%) and satellites (15%). Lastly, we align the unmappable reads of a subset of the thousand genomes data to our collection of NS, as well as the previously published Pan-African NS: revealing that both the Swedish and Pan-African NS are widespread, and that the Swedish NSs are largely a subset of the Pan-African NS. Overall, these results highlight the importance of creating a more diverse reference genome and illustrate that significant amounts of the NS may be of ancestral origin. Oxford University Press 2020-01 2019-09-24 /pmc/articles/PMC6984370/ /pubmed/31560401 http://dx.doi.org/10.1093/molbev/msz176 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. 
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Discoveries Eisfeldt, Jesper Mårtensson, Gustaf Ameur, Adam Nilsson, Daniel Lindstrand, Anna Discovery of Novel Sequences in 1,000 Swedish Genomes |
title | Discovery of Novel Sequences in 1,000 Swedish Genomes |
title_sort | discovery of novel sequences in 1,000 swedish genomes |
topic | Discoveries |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6984370/ https://www.ncbi.nlm.nih.gov/pubmed/31560401 http://dx.doi.org/10.1093/molbev/msz176 |