Direct observation of genomic heterogeneity through local haplotyping analysis
BACKGROUND: It has been an abiding belief among geneticists that multicellular organisms’ genomes can be analyzed under the assumption that a single individual has a uniform genome in all its cells. Despite some evidence to the contrary, this belief has been used as an axiomatic assumption in most g...
Main authors: | Gulukota, Kamalakar; Helseth Jr, Donald L; Khandekar, Janardan D |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4053652/ https://www.ncbi.nlm.nih.gov/pubmed/24888354 http://dx.doi.org/10.1186/1471-2164-15-418 |
Field | Value |
---|---|
author | Gulukota, Kamalakar Helseth Jr, Donald L Khandekar, Janardan D |
collection | PubMed |
description | BACKGROUND: It has been an abiding belief among geneticists that multicellular organisms’ genomes can be analyzed under the assumption that a single individual has a uniform genome in all its cells. Despite some evidence to the contrary, this belief has been used as an axiomatic assumption in most genome analysis software packages. In this paper we present observations in human whole genome data, human whole exome data and in mouse whole genome data to challenge this assumption. We show that heterogeneity is in fact ubiquitous and readily observable in ordinary Next Generation Sequencing (NGS) data. RESULTS: Starting with the assumption that a single NGS read (or read pair) must come from one haplotype, we built a procedure for directly observing haplotypes at a local level by examining 2 or 3 adjacent single nucleotide polymorphisms (SNPs) which are close enough on the genome to be spanned by individual reads. We applied this procedure to NGS data from three different sources: whole genome of a Central European trio from the 1000 Genomes Project, whole genome data from laboratory-bred strains of mouse, and whole exome data from a set of patients with head and neck tumors. Thousands of loci were found in each genome where reads spanning 2 or 3 SNPs displayed more than two haplotypes, indicating that the locus is heterogeneous. We show that such loci are ubiquitous in the genome and cannot be explained by segmental duplications. We explain them on the basis of cellular heterogeneity at the genomic level. Such heterogeneous loci were found in all normal and tumor genomes examined. CONCLUSIONS: Our results highlight the need for new methods to analyze genomic variation because existing ones do not systematically consider local haplotypes. Identification of cancer somatic mutations is complicated because of tumor heterogeneity. It is further complicated if, as we show, normal tissues are also heterogeneous. Methods for biomarker discovery must consider contextual haplotype information rather than just whether a variant “is present”. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-418) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4053652 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
journal | BMC Genomics |
onlineDate | 2014-06-02 |
rights | © Gulukota et al.; licensee BioMed Central Ltd. 2014. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Direct observation of genomic heterogeneity through local haplotyping analysis |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4053652/ https://www.ncbi.nlm.nih.gov/pubmed/24888354 http://dx.doi.org/10.1186/1471-2164-15-418 |
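A minimal sketch of the local haplotyping test summarized in the RESULTS part of the abstract, assuming reads have already been reduced to the bases they carry at known SNP positions. This read representation is a hypothetical simplification; the paper's own pipeline works from aligned NGS data and is not reproduced here. The function names `snp_windows`, `local_haplotypes` and `is_heterogeneous`, and the `min_support` read-count threshold, are illustrative assumptions rather than anything taken from the paper. The logic follows the abstract's reasoning: each read (or read pair) comes from one haplotype, and a uniform diploid genome carries at most two haplotypes at a locus. So more than two haplotypes across 2 or 3 adjacent SNPs spanned by single reads flags the locus as heterogeneous.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical read representation: SNP position -> base observed in that read.
Read = Dict[int, str]


def snp_windows(snp_positions: Sequence[int], k: int, max_span: int) -> List[Tuple[int, ...]]:
    """Return windows of k adjacent SNPs that a single read of length max_span can cover."""
    pos = sorted(snp_positions)
    windows = []
    for i in range(len(pos) - k + 1):
        window = tuple(pos[i:i + k])
        # A read of length L spans positions p1..pk only if pk - p1 < L.
        if window[-1] - window[0] < max_span:
            windows.append(window)
    return windows


def local_haplotypes(reads: List[Read], window: Tuple[int, ...], min_support: int = 2) -> Counter:
    """Count haplotypes seen in reads that cover every SNP in the window."""
    counts = Counter(
        tuple(read[p] for p in window)
        for read in reads
        if all(p in read for p in window)  # reads missing any SNP in the window are ignored
    )
    # Hypothetical filter: drop haplotypes supported by too few reads (e.g. sequencing errors).
    return Counter({hap: n for hap, n in counts.items() if n >= min_support})


def is_heterogeneous(reads: List[Read], window: Tuple[int, ...], min_support: int = 2) -> bool:
    """A uniform diploid genome allows at most two local haplotypes; more suggests heterogeneity."""
    return len(local_haplotypes(reads, window, min_support)) > 2


if __name__ == "__main__":
    reads = (
        [{100: "A", 130: "G"}] * 5
        + [{100: "C", 130: "T"}] * 4
        + [{100: "A", 130: "T"}] * 3
        + [{100: "C"}]  # covers only one SNP, so it cannot contribute a haplotype
    )
    print(snp_windows([100, 130, 900], k=2, max_span=150))  # [(100, 130)]
    print(is_heterogeneous(reads, (100, 130)))  # True: three well-supported haplotypes
```

The `min_support` filter is only one simple guard against sequencing errors inflating the haplotype count; the abstract does not say which filter the authors applied, and this sketch does not attempt the segmental-duplication check the abstract mentions.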