Cargando…

hg19K: addressing a significant lacuna in hg19‐based variant calling

BACKGROUND: The hg19 assembly of the human genome is the most heavily annotated and most commonly used reference to make variant calls for individual genomes. Based on the phase 3 report of the 1000 genomes project (1000G), it is now well known that many positions in the hg19 genome represent minor...

Descripción completa

Detalles Bibliográficos
Autores principales: Karthikeyan, Savita, Bawa, Pushpinder S., Srinivasan, Subhashini
Formato: Online Artículo Texto
Lenguaje:English
Publicado: John Wiley and Sons Inc. 2016
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5241214/
https://www.ncbi.nlm.nih.gov/pubmed/28116326
http://dx.doi.org/10.1002/mgg3.251
Descripción
Sumario:BACKGROUND: The hg19 assembly of the human genome is the most heavily annotated and most commonly used reference to make variant calls for individual genomes. Based on the phase 3 report of the 1000 genomes project (1000G), it is now well known that many positions in the hg19 genome represent minor alleles. Since commonly used variant call methods are developed under the assumption that hg19 reference harbors major alleles at all the ~3 billion positions, these methods mask the calls whenever an individual is homozygous to the minor allele at the respective positions. Hence, it is important to address the extent and impact of these minor alleles in hg19 from the point of view of individual genomes. METHOD: We have created a reference genome, hg19K, in which all the positions in hg19 reference harboring minor allele were replaced by those from the phase 3 report of the 1000 genomes project. The genomes of five individuals, downloaded from the public repository, were analyzed using both hg19 and hg19K and compared. RESULTS: Out of the 81 million SNPs in phase 3 report from the 1000 genomes project, 1.9 million positions were found to be major alleles compared to hg19 with many having an allele frequency of >0.9. We observed that ~30% of the SNVs found in individual genomes are confined to the 1.9 million positions. Also, there are ~8% unique SNVs predicted using hg19K‐based approach, which are also confined to the 1.9 million positions. CONCLUSION: We report that the presence of minor alleles in hg19 alone results in ~8% false negatives and ~30% false positives during variant calls. Also, among the variant calls unique to hg19K‐based methods, which are missed in individuals homozygous to the minor alleles in hg19‐based prediction, some are deleterious missense mutations at sites conserved across diverse species.