
A Multi-Sample Based Method for Identifying Common CNVs in Normal Human Genomic Structure Using High-Resolution aCGH Data

BACKGROUND: It is difficult to identify copy number variations (CNV) in normal human genomic data due to noise and non-linear relationships between different genomic regions and signal intensity. A high-resolution array comparative genomic hybridization (aCGH) containing 42 million probes, which is...


Bibliographic Details
Main Authors:	Park, Chihyun, Ahn, Jaegyoon, Yoon, Youngmi, Park, Sanghyun
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2011
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3205051/ https://www.ncbi.nlm.nih.gov/pubmed/22073121 http://dx.doi.org/10.1371/journal.pone.0026975

author	Park, Chihyun Ahn, Jaegyoon Yoon, Youngmi Park, Sanghyun
author_sort	Park, Chihyun
collection	PubMed
description	BACKGROUND: It is difficult to identify copy number variations (CNV) in normal human genomic data due to noise and non-linear relationships between different genomic regions and signal intensity. A high-resolution array comparative genomic hybridization (aCGH) containing 42 million probes, which is very large compared to previous arrays, was recently published. Most existing CNV detection algorithms do not work well because of noise associated with the large amount of input data and because most of the current methods were not designed to analyze normal human samples. Normal human genome analysis often requires a joint approach across multiple samples. However, the majority of existing methods can only identify CNVs from a single sample. METHODOLOGY AND PRINCIPAL FINDINGS: We developed a multi-sample-based genomic variations detector (MGVD) that uses segmentation to identify common breakpoints across multiple samples and a k-means-based clustering strategy. Unlike previous methods, MGVD simultaneously considers multiple samples with different genomic intensities and identifies CNVs and CNV zones (CNVZs); CNVZ is a more precise measure of the location of a genomic variant than the CNV region (CNVR). CONCLUSIONS AND SIGNIFICANCE: We designed a specialized algorithm to detect common CNVs from extremely high-resolution multi-sample aCGH data. MGVD showed high sensitivity and a low false discovery rate for a simulated data set, and outperformed most current methods when real, high-resolution HapMap datasets were analyzed. MGVD also had the fastest runtime compared to the other algorithms evaluated when actual, high-resolution aCGH data were analyzed. The CNVZs identified by MGVD can be used in association studies for revealing relationships between phenotypes and genomic aberrations. Our algorithm was developed with standard C++ and is available in Linux and MS Windows format in the STL library. It is freely available at: http://embio.yonsei.ac.kr/~Park/mgvd.php.
format	Online Article Text
id	pubmed-3205051
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-3205051 2011-11-09 A Multi-Sample Based Method for Identifying Common CNVs in Normal Human Genomic Structure Using High-Resolution aCGH Data Park, Chihyun Ahn, Jaegyoon Yoon, Youngmi Park, Sanghyun PLoS One Research Article Public Library of Science 2011-10-31 /pmc/articles/PMC3205051/ /pubmed/22073121 http://dx.doi.org/10.1371/journal.pone.0026975 Text en Park et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	A Multi-Sample Based Method for Identifying Common CNVs in Normal Human Genomic Structure Using High-Resolution aCGH Data
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3205051/ https://www.ncbi.nlm.nih.gov/pubmed/22073121 http://dx.doi.org/10.1371/journal.pone.0026975
