hg19K: addressing a significant lacuna in hg19‐based variant calling
BACKGROUND: The hg19 assembly of the human genome is the most heavily annotated and most commonly used reference to make variant calls for individual genomes. Based on the phase 3 report of the 1000 genomes project (1000G), it is now well known that many positions in the hg19 genome represent minor...
Main authors: | Karthikeyan, Savita; Bawa, Pushpinder S.; Srinivasan, Subhashini |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | John Wiley and Sons Inc. 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5241214/ https://www.ncbi.nlm.nih.gov/pubmed/28116326 http://dx.doi.org/10.1002/mgg3.251 |
_version_ | 1782496149254438912 |
---|---|
author | Karthikeyan, Savita; Bawa, Pushpinder S.; Srinivasan, Subhashini |
author_facet | Karthikeyan, Savita; Bawa, Pushpinder S.; Srinivasan, Subhashini |
author_sort | Karthikeyan, Savita |
collection | PubMed |
description | BACKGROUND: The hg19 assembly of the human genome is the most heavily annotated and most commonly used reference for making variant calls in individual genomes. Based on the phase 3 report of the 1000 genomes project (1000G), it is now well known that many positions in the hg19 genome represent minor alleles. Since commonly used variant‐calling methods are developed under the assumption that the hg19 reference harbors major alleles at all ~3 billion positions, these methods mask the calls whenever an individual is homozygous for the minor allele at the respective positions. Hence, it is important to address the extent and impact of these minor alleles in hg19 from the point of view of individual genomes. METHOD: We have created a reference genome, hg19K, in which all positions in the hg19 reference harboring a minor allele were replaced by the corresponding major alleles from the phase 3 report of the 1000 genomes project. The genomes of five individuals, downloaded from a public repository, were analyzed using both hg19 and hg19K and compared. RESULTS: Out of the 81 million SNPs in the phase 3 report from the 1000 genomes project, 1.9 million positions were found to carry a major allele that differs from the hg19 reference allele, with many having an allele frequency of >0.9. We observed that ~30% of the SNVs found in individual genomes are confined to these 1.9 million positions. Also, ~8% of the SNVs are unique to the hg19K‐based approach, and these are also confined to the 1.9 million positions. CONCLUSION: We report that the presence of minor alleles in hg19 alone results in ~8% false negatives and ~30% false positives during variant calling. Also, among the variant calls unique to hg19K‐based methods, which are missed by hg19‐based prediction in individuals homozygous for the minor alleles, some are deleterious missense mutations at sites conserved across diverse species. |
format | Online Article Text |
id | pubmed-5241214 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | John Wiley and Sons Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-5241214 2017-01-23 hg19K: addressing a significant lacuna in hg19‐based variant calling Karthikeyan, Savita Bawa, Pushpinder S. Srinivasan, Subhashini Mol Genet Genomic Med Original Articles John Wiley and Sons Inc. 2016-11-13 /pmc/articles/PMC5241214/ /pubmed/28116326 http://dx.doi.org/10.1002/mgg3.251 Text en © 2016 The Authors. Molecular Genetics & Genomic Medicine published by Wiley Periodicals, Inc. This is an open access article under the terms of the Creative Commons Attribution (http://creativecommons.org/licenses/by/4.0/) License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Articles Karthikeyan, Savita Bawa, Pushpinder S. Srinivasan, Subhashini hg19K: addressing a significant lacuna in hg19‐based variant calling |
title | hg19K: addressing a significant lacuna in hg19‐based variant calling |
title_full | hg19K: addressing a significant lacuna in hg19‐based variant calling |
title_fullStr | hg19K: addressing a significant lacuna in hg19‐based variant calling |
title_full_unstemmed | hg19K: addressing a significant lacuna in hg19‐based variant calling |
title_short | hg19K: addressing a significant lacuna in hg19‐based variant calling |
title_sort | hg19k: addressing a significant lacuna in hg19‐based variant calling |
topic | Original Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5241214/ https://www.ncbi.nlm.nih.gov/pubmed/28116326 http://dx.doi.org/10.1002/mgg3.251 |
work_keys_str_mv | AT karthikeyansavita hg19kaddressingasignificantlacunainhg19basedvariantcalling AT bawapushpinders hg19kaddressingasignificantlacunainhg19basedvariantcalling AT srinivasansubhashini hg19kaddressingasignificantlacunainhg19basedvariantcalling |
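
The substitution idea summarized in the abstract can be illustrated with a toy sketch. This is not the authors' pipeline: the names `Snp`, `major_allele_reference`, and `homozygous_calls` are hypothetical, the 0.5 frequency cutoff for "major" is an assumption, and the example is simplified to biallelic single-nucleotide sites with homozygous genotypes written as plain strings.

```python
from typing import Iterable, List, NamedTuple


class Snp(NamedTuple):
    pos: int         # 0-based index into the reference string (assumption)
    ref: str         # allele present in the original reference
    alt: str         # alternate allele reported for the population
    alt_freq: float  # population frequency of the alternate allele


def major_allele_reference(reference: str, snps: Iterable[Snp],
                           threshold: float = 0.5) -> str:
    """Return a copy of `reference` in which every site whose alternate
    allele is more frequent than `threshold` carries that allele instead."""
    bases = list(reference)
    for snp in snps:
        if bases[snp.pos] != snp.ref:
            raise ValueError(f"reference mismatch at position {snp.pos}")
        if snp.alt_freq > threshold:
            bases[snp.pos] = snp.alt
    return "".join(bases)


def homozygous_calls(reference: str, genotype: str) -> List[int]:
    """Positions where a homozygous individual differs from the reference."""
    return [i for i, (r, g) in enumerate(zip(reference, genotype)) if r != g]


# Toy data: position 1 holds a minor allele in the original reference.
toy_ref = "ACGTAC"
toy_snps = [Snp(1, "C", "T", 0.93), Snp(4, "A", "G", 0.12)]
toy_major_ref = major_allele_reference(toy_ref, toy_snps)

carries_major = "ATGTAC"  # homozygous for the major allele at position 1
carries_minor = "ACGTAC"  # homozygous for the minor allele at position 1

calls_major_old = homozygous_calls(toy_ref, carries_major)
calls_major_new = homozygous_calls(toy_major_ref, carries_major)
calls_minor_old = homozygous_calls(toy_ref, carries_minor)
calls_minor_new = homozygous_calls(toy_major_ref, carries_minor)
```

In this toy case the individual carrying the major allele is reported as a variant only against the original reference, while the individual homozygous for the minor allele yields no call against the original reference and one call against the substituted reference. This mirrors the masking effect the abstract describes, under the simplifications stated above.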