
hg19K: addressing a significant lacuna in hg19‐based variant calling

BACKGROUND: The hg19 assembly of the human genome is the most heavily annotated and most commonly used reference to make variant calls for individual genomes. Based on the phase 3 report of the 1000 genomes project (1000G), it is now well known that many positions in the hg19 genome represent minor...


Bibliographic Details
Main authors: Karthikeyan, Savita; Bawa, Pushpinder S.; Srinivasan, Subhashini
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2016
Subjects: Original Articles
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5241214/
https://www.ncbi.nlm.nih.gov/pubmed/28116326
http://dx.doi.org/10.1002/mgg3.251
author Karthikeyan, Savita
Bawa, Pushpinder S.
Srinivasan, Subhashini
collection PubMed
description BACKGROUND: The hg19 assembly of the human genome is the most heavily annotated and most commonly used reference to make variant calls for individual genomes. Based on the phase 3 report of the 1000 genomes project (1000G), it is now well known that many positions in the hg19 genome represent minor alleles. Since commonly used variant call methods are developed under the assumption that the hg19 reference harbors major alleles at all the ~3 billion positions, these methods mask the calls whenever an individual is homozygous for the minor allele at the respective positions. Hence, it is important to address the extent and impact of these minor alleles in hg19 from the point of view of individual genomes. METHOD: We have created a reference genome, hg19K, in which all the positions in the hg19 reference harboring a minor allele were replaced by the major alleles from the phase 3 report of the 1000 genomes project. The genomes of five individuals, downloaded from the public repository, were analyzed using both hg19 and hg19K and compared. RESULTS: Out of the 81 million SNPs in the phase 3 report from the 1000 genomes project, at 1.9 million positions the major allele was found to differ from the allele in hg19, many with an allele frequency of >0.9. We observed that ~30% of the SNVs found in individual genomes are confined to the 1.9 million positions. Also, ~8% of SNVs are unique to the hg19K‐based approach, and these are also confined to the 1.9 million positions. CONCLUSION: We report that the presence of minor alleles in hg19 alone results in ~8% false negatives and ~30% false positives during variant calls. Also, among the variant calls unique to hg19K‐based methods, which hg19‐based prediction misses in individuals homozygous for the minor alleles in hg19, some are deleterious missense mutations at sites conserved across diverse species.
format Online
Article
Text
id pubmed-5241214
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-5241214 2017-01-23 hg19K: addressing a significant lacuna in hg19‐based variant calling Karthikeyan, Savita Bawa, Pushpinder S. Srinivasan, Subhashini Mol Genet Genomic Med Original Articles John Wiley and Sons Inc. 2016-11-13 /pmc/articles/PMC5241214/ /pubmed/28116326 http://dx.doi.org/10.1002/mgg3.251 Text en © 2016 The Authors. Molecular Genetics & Genomic Medicine published by Wiley Periodicals, Inc. This is an open access article under the terms of the Creative Commons Attribution (http://creativecommons.org/licenses/by/4.0/) License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title hg19K: addressing a significant lacuna in hg19‐based variant calling
topic Original Articles
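The METHOD in the description above (replacing every hg19 position that carries a minor allele with the major allele reported by 1000G) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Snp` record, the function name `major_allele_reference`, the 0.5 frequency threshold, and the toy data are all assumptions made for illustration, and only single-base substitutions are handled.

```python
from typing import Iterable, NamedTuple, Tuple


class Snp(NamedTuple):
    """Hypothetical minimal SNP record (not a real library type)."""
    pos: int        # 1-based position on the sequence
    ref: str        # allele present in the reference
    alt: str        # alternate allele
    alt_af: float   # population frequency of the alternate allele


def major_allele_reference(
    ref_seq: str, snps: Iterable[Snp], threshold: float = 0.5
) -> Tuple[str, int]:
    """Return a copy of ref_seq in which the reference base is replaced by the
    alternate allele wherever that allele is the major one (alt_af > threshold),
    together with the number of substitutions made.

    The 0.5 threshold is an assumption of this sketch.
    """
    seq = list(ref_seq)
    swapped = 0
    for snp in snps:
        # Skip anything that is not a simple single-base substitution.
        if len(snp.ref) != 1 or len(snp.alt) != 1:
            continue
        idx = snp.pos - 1
        # Guard against coordinate or assembly mismatches.
        if seq[idx].upper() != snp.ref.upper():
            raise ValueError(f"reference mismatch at position {snp.pos}")
        if snp.alt_af > threshold:
            seq[idx] = snp.alt
            swapped += 1
    return "".join(seq), swapped


# Toy data, invented for illustration only.
toy_snps = [
    Snp(2, "C", "T", 0.93),  # reference holds the minor allele: replaced
    Snp(4, "T", "G", 0.10),  # reference holds the major allele: kept
    Snp(6, "C", "A", 0.51),  # alternate is marginally the major allele: replaced
]
new_seq, n_swapped = major_allele_reference("ACGTAC", toy_snps)
print(new_seq, n_swapped)
```

Calling variants against the substituted sequence would then report an individual who is homozygous for the original hg19 allele at position 2 or 6 as a variant carrier, which is the class of calls the abstract describes as masked by hg19.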