
Direct observation of genomic heterogeneity through local haplotyping analysis

BACKGROUND: It has been an abiding belief among geneticists that multicellular organisms’ genomes can be analyzed under the assumption that a single individual has a uniform genome in all its cells. Despite some evidence to the contrary, this belief has been used as an axiomatic assumption in most g...


Bibliographic Details
Main Authors: Gulukota, Kamalakar, Helseth Jr, Donald L, Khandekar, Janardan D
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4053652/
https://www.ncbi.nlm.nih.gov/pubmed/24888354
http://dx.doi.org/10.1186/1471-2164-15-418
_version_ 1782320411787132928
author Gulukota, Kamalakar
Helseth Jr, Donald L
Khandekar, Janardan D
author_facet Gulukota, Kamalakar
Helseth Jr, Donald L
Khandekar, Janardan D
author_sort Gulukota, Kamalakar
collection PubMed
description BACKGROUND: It has been an abiding belief among geneticists that multicellular organisms’ genomes can be analyzed under the assumption that a single individual has a uniform genome in all its cells. Despite some evidence to the contrary, this belief has been used as an axiomatic assumption in most genome analysis software packages. In this paper we present observations in human whole genome data, human whole exome data and in mouse whole genome data to challenge this assumption. We show that heterogeneity is in fact ubiquitous and readily observable in ordinary Next Generation Sequencing (NGS) data. RESULTS: Starting with the assumption that a single NGS read (or read pair) must come from one haplotype, we built a procedure for directly observing haplotypes at a local level by examining 2 or 3 adjacent single nucleotide polymorphisms (SNPs) which are close enough on the genome to be spanned by individual reads. We applied this procedure to NGS data from three different sources: whole genome of a Central European trio from the 1000 genomes project, whole genome data from laboratory-bred strains of mouse, and whole exome data from a set of patients of head and neck tumors. Thousands of loci were found in each genome where reads spanning 2 or 3 SNPs displayed more than two haplotypes, indicating that the locus is heterogeneous. We show that such loci are ubiquitous in the genome and cannot be explained by segmental duplications. We explain them on the basis of cellular heterogeneity at the genomic level. Such heterogeneous loci were found in all normal and tumor genomes examined. CONCLUSIONS: Our results highlight the need for new methods to analyze genomic variation because existing ones do not systematically consider local haplotypes. Identification of cancer somatic mutations is complicated because of tumor heterogeneity. It is further complicated if, as we show, normal tissues are also heterogeneous. Methods for biomarker discovery must consider contextual haplotype information rather than just whether a variant “is present”. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-418) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4053652
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-40536522014-06-17 Direct observation of genomic heterogeneity through local haplotyping analysis Gulukota, Kamalakar Helseth Jr, Donald L Khandekar, Janardan D BMC Genomics Research Article BACKGROUND: It has been an abiding belief among geneticists that multicellular organisms’ genomes can be analyzed under the assumption that a single individual has a uniform genome in all its cells. Despite some evidence to the contrary, this belief has been used as an axiomatic assumption in most genome analysis software packages. In this paper we present observations in human whole genome data, human whole exome data and in mouse whole genome data to challenge this assumption. We show that heterogeneity is in fact ubiquitous and readily observable in ordinary Next Generation Sequencing (NGS) data. RESULTS: Starting with the assumption that a single NGS read (or read pair) must come from one haplotype, we built a procedure for directly observing haplotypes at a local level by examining 2 or 3 adjacent single nucleotide polymorphisms (SNPs) which are close enough on the genome to be spanned by individual reads. We applied this procedure to NGS data from three different sources: whole genome of a Central European trio from the 1000 genomes project, whole genome data from laboratory-bred strains of mouse, and whole exome data from a set of patients of head and neck tumors. Thousands of loci were found in each genome where reads spanning 2 or 3 SNPs displayed more than two haplotypes, indicating that the locus is heterogeneous. We show that such loci are ubiquitous in the genome and cannot be explained by segmental duplications. We explain them on the basis of cellular heterogeneity at the genomic level. Such heterogeneous loci were found in all normal and tumor genomes examined. CONCLUSIONS: Our results highlight the need for new methods to analyze genomic variation because existing ones do not systematically consider local haplotypes. Identification of cancer somatic mutations is complicated because of tumor heterogeneity. It is further complicated if, as we show, normal tissues are also heterogeneous. Methods for biomarker discovery must consider contextual haplotype information rather than just whether a variant “is present”. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-418) contains supplementary material, which is available to authorized users. BioMed Central 2014-06-02 /pmc/articles/PMC4053652/ /pubmed/24888354 http://dx.doi.org/10.1186/1471-2164-15-418 Text en © Gulukota et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Gulukota, Kamalakar
Helseth Jr, Donald L
Khandekar, Janardan D
Direct observation of genomic heterogeneity through local haplotyping analysis
title Direct observation of genomic heterogeneity through local haplotyping analysis
title_full Direct observation of genomic heterogeneity through local haplotyping analysis
title_fullStr Direct observation of genomic heterogeneity through local haplotyping analysis
title_full_unstemmed Direct observation of genomic heterogeneity through local haplotyping analysis
title_short Direct observation of genomic heterogeneity through local haplotyping analysis
title_sort direct observation of genomic heterogeneity through local haplotyping analysis
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4053652/
https://www.ncbi.nlm.nih.gov/pubmed/24888354
http://dx.doi.org/10.1186/1471-2164-15-418
work_keys_str_mv AT gulukotakamalakar directobservationofgenomicheterogeneitythroughlocalhaplotypinganalysis
AT helsethjrdonaldl directobservationofgenomicheterogeneitythroughlocalhaplotypinganalysis
AT khandekarjanardand directobservationofgenomicheterogeneitythroughlocalhaplotypinganalysis