hg19KIndel: ethnicity normalized human reference genome

BACKGROUND: The most widely used human genome reference assembly, hg19, harbors minor alleles at 2.18 million positions, as revealed by the 1000 Genomes Phase 3 dataset. Although this is less than 2% of the 89 million variants reported, it has been shown that the minor alleles can result in 30% false positi...

Bibliographic Details
Main authors: Shukla, Harsh G., Bawa, Pushpinder Singh, Srinivasan, Subhashini
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6555027/
https://www.ncbi.nlm.nih.gov/pubmed/31170919
http://dx.doi.org/10.1186/s12864-019-5854-3
collection PubMed
description BACKGROUND: The most widely used human genome reference assembly, hg19, harbors minor alleles at 2.18 million positions, as revealed by the 1000 Genomes Phase 3 dataset. Although this is less than 2% of the 89 million variants reported, it has been shown that the minor alleles can result in 30% false positives in individual genomes, misleading and burdening downstream interpretation. More alarming, a significant percentage of variants that are homozygous recessive for these minor alleles, with potential disease implications, are masked from reporting. RESULTS: We have demonstrated that the false positives (FP) and false negatives (FN) can be corrected simply by replacing the nucleotides at the minor allele positions in hg19 with the corresponding major alleles. Here, we have replaced 2.18 million minor alleles, comprising single nucleotide polymorphisms (SNPs), insertions and deletions (INDELs), and multiple nucleotide polymorphisms (MNPs), in hg19 with the corresponding major alleles to create an ethnically normalized reference genome called hg19KIndel. In doing so, hg19KIndel has both corrected sequencing errors acknowledged to be present in hg19 and improved read alignment near the minor alleles in hg19. CONCLUSION: We have created and made available a new version of the human reference genome called hg19KIndel. Variant calling using hg19KIndel significantly reduces false positive calls, which in turn reduces the burden on downstream analysis and validation. It also improves false negative variant calling: variants that were missed because of the minor alleles in hg19 are now called using hg19KIndel. Using hg19KIndel also yields a better mapping percentage than the currently available human reference genome. The hg19KIndel reference genome and its auxiliary datasets are available at 10.5281/zenodo.2638113
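The replacement step described in the abstract can be illustrated with a small sketch. This is not the authors' pipeline; it only shows the general idea of swapping the allele present in a reference sequence for the major allele at listed positions. The function name `apply_major_alleles`, the tuple layout, and the toy sequence are hypothetical, and the sketch assumes VCF-style, left-anchored alleles, 1-based positions, and non-overlapping variants.

```python
from typing import List, Tuple

# Hypothetical record layout: (1-based position, allele currently in the
# reference, major allele that should replace it).
Variant = Tuple[int, str, str]


def apply_major_alleles(reference: str, variants: List[Variant]) -> str:
    """Return a copy of `reference` with each listed allele replaced.

    Assumes the variants do not overlap each other.
    """
    seq = reference
    # Work from the rightmost variant to the leftmost so that length changes
    # caused by insertions or deletions never shift the coordinates of
    # variants that are still waiting to be applied.
    for pos, ref_allele, major_allele in sorted(
        variants, key=lambda v: v[0], reverse=True
    ):
        start = pos - 1  # convert the 1-based position to a 0-based index
        end = start + len(ref_allele)
        # Guard against coordinate or allele mismatches before editing.
        if seq[start:end] != ref_allele:
            raise ValueError(
                f"reference has {seq[start:end]!r} at position {pos}, "
                f"expected {ref_allele!r}"
            )
        seq = seq[:start] + major_allele + seq[end:]
    return seq


if __name__ == "__main__":
    toy_reference = "ACGTACGTAC"
    toy_variants = [
        (2, "C", "T"),    # SNP
        (5, "A", "AGG"),  # insertion of "GG" after position 5
        (8, "TA", "T"),   # deletion of the base at position 9
    ]
    print(apply_major_alleles(toy_reference, toy_variants))
```

Applying the edits from right to left is a design choice that keeps the original coordinates valid throughout; a genome-scale implementation would more likely stream a FASTA file and a variant file per chromosome than hold whole sequences as Python strings.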
id pubmed-6555027
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6555027 2019-06-10 BMC Genomics Research Article BioMed Central 2019-06-06 Text en © The Author(s). 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Research Article