Effective normalization for copy number variation detection from whole genome sequencing

BACKGROUND: Whole genome sequencing enables a high resolution view of the human genome and provides unique insights into genome structure at an unprecedented scale. There have been a number of tools to infer copy number variation in the genome. These tools, while validated, also include a number of...

Bibliographic Details
Main Authors:	Janevski, Angel; Varadan, Vinay; Kamalakaran, Sitharthan; Banerjee, Nilanjana; Dimitrova, Nevenka
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3481445/ https://www.ncbi.nlm.nih.gov/pubmed/23134596 http://dx.doi.org/10.1186/1471-2164-13-S6-S16

_version_	1782247740093235200
author	Janevski, Angel Varadan, Vinay Kamalakaran, Sitharthan Banerjee, Nilanjana Dimitrova, Nevenka
author_sort	Janevski, Angel
collection	PubMed
description	BACKGROUND: Whole genome sequencing enables a high-resolution view of the human genome and provides unique insights into genome structure at an unprecedented scale. A number of tools have been developed to infer copy number variation (CNV) in the genome. These tools, while validated, also include a number of parameters that are configurable to the genome data being analyzed. These algorithms allow for normalization to account for individual and population-specific effects on individual genome CNV estimates, but the impact of these changes on the estimated CNVs is not well characterized. We evaluate in detail the effect of normalization methodologies in two CNV algorithms, FREEC and CNV-seq, using whole genome sequencing data from 8 individuals spanning four populations. METHODS: We apply FREEC and CNV-seq to a sequencing data set consisting of 8 genomes. We use multiple configurations corresponding to different read-count normalization methodologies in FREEC, and statistically characterize the concordance of the CNV calls between FREEC configurations and the analogous output from CNV-seq. The normalization methodologies evaluated in FREEC are GC content, mappability, and control genome. We further stratify the concordance analysis within genic regions, non-genic regions, and a collection of validated variant regions. RESULTS: The GC content normalization methodology generates the highest number of altered copy number regions. Both mappability and control genome normalization reduce the total number and length of copy number regions. Mappability normalization yields Jaccard indices in the 0.07 to 0.3 range, whereas control genome normalization yields Jaccard index values around 0.4 when compared with normalization based on GC content. The most critical impact of using mappability as a normalization factor is a substantial reduction of deletion CNV calls. The output of another method based on control genome normalization, CNV-seq, resulted in comparable CNV call profiles and substantial agreement in variable gene and CNV region calls. CONCLUSIONS: The choice of read-count normalization methodology has a substantial effect on CNV calls, and the use of genomic mappability or an appropriately chosen control genome can optimize the output of CNV analysis.
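The abstract reports concordance between CNV call sets as Jaccard indices but does not state how the index was computed. The following is a minimal Python sketch under one common assumption: the index is the number of base pairs covered by both call sets divided by the number covered by either, on a single chromosome. The function names, the half-open interval convention, and the example coordinates are all hypothetical and are not taken from the paper, FREEC, or CNV-seq.

```python
from typing import List, Tuple

# Assumption: a CNV call is a half-open interval [start, end) on one chromosome.
Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_length(intervals: List[Interval]) -> int:
    """Total number of bases covered by already-merged intervals."""
    return sum(end - start for start, end in intervals)


def intersection_length(a: List[Interval], b: List[Interval]) -> int:
    """Bases covered by both merged, sorted lists (two-pointer sweep)."""
    i = j = 0
    overlap = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            overlap += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return overlap


def jaccard(calls_a: List[Interval], calls_b: List[Interval]) -> float:
    """Base-pair Jaccard index: |A intersect B| / |A union B|; 0.0 if both are empty."""
    a, b = merge(calls_a), merge(calls_b)
    inter = intersection_length(a, b)
    union = total_length(a) + total_length(b) - inter
    return inter / union if union else 0.0


# Hypothetical call sets from two normalization configurations.
gc_calls = [(100, 200), (300, 400)]
mappability_calls = [(150, 250)]
print(jaccard(gc_calls, mappability_calls))  # 50 shared bases / 250 total = 0.2
```

A region-count definition (the fraction of called regions shared between two sets) would be an equally plausible reading of the abstract and could give different values, so the sketch should be read as one possible interpretation only.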
format	Online Article Text
id	pubmed-3481445
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3481445 2012-11-02 BMC Genomics Research BioMed Central 2012-10-26 /pmc/articles/PMC3481445/ /pubmed/23134596 http://dx.doi.org/10.1186/1471-2164-13-S6-S16 Text en Copyright ©2012 Janevski et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Effective normalization for copy number variation detection from whole genome sequencing
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3481445/ https://www.ncbi.nlm.nih.gov/pubmed/23134596 http://dx.doi.org/10.1186/1471-2164-13-S6-S16
