CODEX: a normalization and copy number variation detection method for whole exome sequencing

High-throughput sequencing of DNA coding regions has become a common way of assaying genomic variation in the study of human diseases. Copy number variation (CNV) is an important type of genomic variation, but detecting and characterizing CNV from exome sequencing is challenging due to the high leve...

Bibliographic Details
Main authors: Jiang, Yuchao, Oldridge, Derek A., Diskin, Sharon J., Zhang, Nancy R.
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4381046/
https://www.ncbi.nlm.nih.gov/pubmed/25618849
http://dx.doi.org/10.1093/nar/gku1363
collection PubMed
description High-throughput sequencing of DNA coding regions has become a common way of assaying genomic variation in the study of human diseases. Copy number variation (CNV) is an important type of genomic variation, but detecting and characterizing CNV from exome sequencing is challenging due to the high level of biases and artifacts. We propose CODEX, a normalization and CNV calling procedure for whole exome sequencing data. The Poisson latent factor model in CODEX includes terms that specifically remove biases due to GC content, exon capture and amplification efficiency, and latent systemic artifacts. CODEX also includes a Poisson likelihood-based recursive segmentation procedure that explicitly models the count-based exome sequencing data. CODEX is compared to existing methods on a population analysis of HapMap samples from the 1000 Genomes Project, and shown to be more accurate on three microarray-based validation data sets. We further evaluate performance on 222 neuroblastoma samples with matched normals and focus on a well-studied rare somatic CNV within the ATRX gene. We show that the cross-sample normalization procedure of CODEX removes more noise than normalizing the tumor against the matched normal and that the segmentation procedure performs well in detecting CNVs with nested structures.
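The abstract mentions a Poisson likelihood-based segmentation of count data. As a rough illustration only, the sketch below scores a candidate run of exons by a Poisson log-likelihood ratio, comparing a fitted relative copy number against the null value of 1. The function names (`poisson_lr`, `best_segment`), the exhaustive window scan (used here in place of the recursive procedure the abstract describes) and the example numbers are all assumptions for illustration; this is not the CODEX implementation or its API.

```python
import math


def poisson_lr(observed, expected):
    """Poisson log-likelihood ratio for one candidate segment.

    observed: read counts per exon in the segment.
    expected: normalized expected counts for the same exons
              (assumed to come from some upstream normalization step).
    The relative copy number is estimated as sum(observed) / sum(expected)
    and compared with the null value of 1.
    """
    y = sum(observed)
    lam = sum(expected)
    if lam <= 0:
        raise ValueError("expected coverage must be positive")
    if y == 0:
        # y * log(y / lam) tends to 0 as y -> 0, so only the lam term remains.
        return lam
    return y * math.log(y / lam) - (y - lam)


def best_segment(observed, expected, max_len=None):
    """Exhaustively scan windows and return (score, start, end) of the best one.

    end is exclusive. This brute-force scan is a simplification and is
    not the recursive segmentation mentioned in the abstract.
    """
    n = len(observed)
    max_len = max_len or n
    best = (0.0, None, None)
    for start in range(n):
        for end in range(start + 1, min(n, start + max_len) + 1):
            score = poisson_lr(observed[start:end], expected[start:end])
            if score > best[0]:
                best = (score, start, end)
    return best


# Made-up example: ten exons, with exons 3..5 at half the expected depth.
expected = [100] * 10
observed = [100, 100, 100, 50, 50, 50, 100, 100, 100, 100]
score, start, end = best_segment(observed, expected)
print(start, end, round(score, 2))
```

In this toy input the highest-scoring window covers exactly the three half-depth exons, because extending the window into normal-depth exons dilutes the estimated copy number ratio faster than it adds evidence.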
id pubmed-4381046
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4381046 2015-04-03 Nucleic Acids Res Methods Online Oxford University Press 2015-03-31 2015-01-23 /pmc/articles/PMC4381046/ /pubmed/25618849 http://dx.doi.org/10.1093/nar/gku1363 Text en © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Methods Online