De Novo Assembly of Two Swedish Genomes Reveals Missing Segments from the Human GRCh38 Reference and Improves Variant Calling of Population-Scale Sequencing Data
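Main authors: | Ameur, Adam; Che, Huiwen; Martin, Marcel; Bunikis, Ignas; Dahlberg, Johan; Höijer, Ida; Häggqvist, Susana; Vezzi, Francesco; Nordlund, Jessica; Olason, Pall; Feuk, Lars; Gyllensten, Ulf |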
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6210158/ https://www.ncbi.nlm.nih.gov/pubmed/30304863 http://dx.doi.org/10.3390/genes9100486 |
author | Ameur, Adam; Che, Huiwen; Martin, Marcel; Bunikis, Ignas; Dahlberg, Johan; Höijer, Ida; Häggqvist, Susana; Vezzi, Francesco; Nordlund, Jessica; Olason, Pall; Feuk, Lars; Gyllensten, Ulf |
collection | PubMed |
description | The current human reference sequence (GRCh38) is a foundation for large-scale sequencing projects. However, recent studies have suggested that GRCh38 may be incomplete and give a suboptimal representation of specific population groups. Here, we performed a de novo assembly of two Swedish genomes that revealed over 10 Mb of sequences absent from the human GRCh38 reference in each individual. Around 6 Mb of these novel sequences (NS) are shared with a Chinese personal genome. The NS are highly repetitive, have an elevated GC content, and are primarily located in centromeric or telomeric regions. Up to 1 Mb of NS can be assigned to chromosome Y, and large segments are also missing from GRCh38 at chromosomes 14, 17, and 21. Inclusion of NS into the GRCh38 reference radically improves the alignment and variant calling from short-read whole-genome sequencing data at several genomic loci. A re-analysis of a Swedish population-scale sequencing project yields >75,000 putative novel single nucleotide variants (SNVs) and removes >10,000 false positive SNV calls per individual, some of which are located in protein-coding regions. Our results highlight that the GRCh38 reference is not yet complete and demonstrate that personal genome assemblies from local populations can improve the analysis of short-read whole-genome sequencing data. |
format | Online Article Text |
id | pubmed-6210158 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-6210158 2018-11-02; Genes (Basel); Article; MDPI 2018-10-09; /pmc/articles/PMC6210158/ /pubmed/30304863 http://dx.doi.org/10.3390/genes9100486; Text en; © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
title | De Novo Assembly of Two Swedish Genomes Reveals Missing Segments from the Human GRCh38 Reference and Improves Variant Calling of Population-Scale Sequencing Data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6210158/ https://www.ncbi.nlm.nih.gov/pubmed/30304863 http://dx.doi.org/10.3390/genes9100486 |