Cargando…

A model-based circular binary segmentation algorithm for the analysis of array CGH data

BACKGROUND: Circular Binary Segmentation (CBS) is a permutation-based algorithm for array Comparative Genomic Hybridization (aCGH) data analysis. CBS accurately segments data by detecting change-points using a maximal-t test; but extensive computational burden is involved for evaluating the signific...

Descripción completa

Detalles Bibliográficos
Autores principales: Hsu, Fang-Han, Chen, Hung-I H, Tsai, Mong-Hsun, Lai, Liang-Chuan, Huang, Chi-Cheng, Tu, Shih-Hsin, Chuang, Eric Y, Chen, Yidong
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2011
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3224564/
https://www.ncbi.nlm.nih.gov/pubmed/21985277
http://dx.doi.org/10.1186/1756-0500-4-394
_version_ 1782217410285142016
author Hsu, Fang-Han
Chen, Hung-I H
Tsai, Mong-Hsun
Lai, Liang-Chuan
Huang, Chi-Cheng
Tu, Shih-Hsin
Chuang, Eric Y
Chen, Yidong
author_facet Hsu, Fang-Han
Chen, Hung-I H
Tsai, Mong-Hsun
Lai, Liang-Chuan
Huang, Chi-Cheng
Tu, Shih-Hsin
Chuang, Eric Y
Chen, Yidong
author_sort Hsu, Fang-Han
collection PubMed
description BACKGROUND: Circular Binary Segmentation (CBS) is a permutation-based algorithm for array Comparative Genomic Hybridization (aCGH) data analysis. CBS accurately segments data by detecting change-points using a maximal-t test; but extensive computational burden is involved for evaluating the significance of change-points using permutations. A recent implementation utilizing a hybrid method and early stopping rules (hybrid CBS) to improve the performance in speed was subsequently proposed. However, a time analysis revealed that a major portion of computation time of the hybrid CBS was still spent on permutation. In addition, what the hybrid method provides is an approximation of the significance upper bound or lower bound, not an approximation of the significance of change-points itself. RESULTS: We developed a novel model-based algorithm, extreme-value based CBS (eCBS), which limits permutations and provides robust results without loss of accuracy. Thousands of aCGH data under null hypothesis were simulated in advance based on a variety of non-normal assumptions, and the corresponding maximal-t distribution was modeled by the Generalized Extreme Value (GEV) distribution. The modeling results, which associate characteristics of aCGH data to the GEV parameters, constitute lookup tables (eXtreme model). Using the eXtreme model, the significance of change-points could be evaluated in a constant time complexity through a table lookup process. CONCLUSIONS: A novel algorithm, eCBS, was developed in this study. The current implementation of eCBS consistently outperforms the hybrid CBS 4× to 20× in computation time without loss of accuracy. Source codes, supplementary materials, supplementary figures, and supplementary tables can be found at http://ntumaps.cgm.ntu.edu.tw/eCBSsupplementary.
format Online
Article
Text
id pubmed-3224564
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-32245642011-11-30 A model-based circular binary segmentation algorithm for the analysis of array CGH data Hsu, Fang-Han Chen, Hung-I H Tsai, Mong-Hsun Lai, Liang-Chuan Huang, Chi-Cheng Tu, Shih-Hsin Chuang, Eric Y Chen, Yidong BMC Res Notes Research Article BACKGROUND: Circular Binary Segmentation (CBS) is a permutation-based algorithm for array Comparative Genomic Hybridization (aCGH) data analysis. CBS accurately segments data by detecting change-points using a maximal-t test; but extensive computational burden is involved for evaluating the significance of change-points using permutations. A recent implementation utilizing a hybrid method and early stopping rules (hybrid CBS) to improve the performance in speed was subsequently proposed. However, a time analysis revealed that a major portion of computation time of the hybrid CBS was still spent on permutation. In addition, what the hybrid method provides is an approximation of the significance upper bound or lower bound, not an approximation of the significance of change-points itself. RESULTS: We developed a novel model-based algorithm, extreme-value based CBS (eCBS), which limits permutations and provides robust results without loss of accuracy. Thousands of aCGH data under null hypothesis were simulated in advance based on a variety of non-normal assumptions, and the corresponding maximal-t distribution was modeled by the Generalized Extreme Value (GEV) distribution. The modeling results, which associate characteristics of aCGH data to the GEV parameters, constitute lookup tables (eXtreme model). Using the eXtreme model, the significance of change-points could be evaluated in a constant time complexity through a table lookup process. CONCLUSIONS: A novel algorithm, eCBS, was developed in this study. The current implementation of eCBS consistently outperforms the hybrid CBS 4× to 20× in computation time without loss of accuracy. Source codes, supplementary materials, supplementary figures, and supplementary tables can be found at http://ntumaps.cgm.ntu.edu.tw/eCBSsupplementary. BioMed Central 2011-10-10 /pmc/articles/PMC3224564/ /pubmed/21985277 http://dx.doi.org/10.1186/1756-0500-4-394 Text en Copyright ©2011 Hsu et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Hsu, Fang-Han
Chen, Hung-I H
Tsai, Mong-Hsun
Lai, Liang-Chuan
Huang, Chi-Cheng
Tu, Shih-Hsin
Chuang, Eric Y
Chen, Yidong
A model-based circular binary segmentation algorithm for the analysis of array CGH data
title A model-based circular binary segmentation algorithm for the analysis of array CGH data
title_full A model-based circular binary segmentation algorithm for the analysis of array CGH data
title_fullStr A model-based circular binary segmentation algorithm for the analysis of array CGH data
title_full_unstemmed A model-based circular binary segmentation algorithm for the analysis of array CGH data
title_short A model-based circular binary segmentation algorithm for the analysis of array CGH data
title_sort model-based circular binary segmentation algorithm for the analysis of array cgh data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3224564/
https://www.ncbi.nlm.nih.gov/pubmed/21985277
http://dx.doi.org/10.1186/1756-0500-4-394
work_keys_str_mv AT hsufanghan amodelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata
AT chenhungih amodelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata
AT tsaimonghsun amodelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata
AT lailiangchuan amodelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata
AT huangchicheng amodelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata
AT tushihhsin amodelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata
AT chuangericy amodelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata
AT chenyidong amodelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata
AT hsufanghan modelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata
AT chenhungih modelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata
AT tsaimonghsun modelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata
AT lailiangchuan modelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata
AT huangchicheng modelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata
AT tushihhsin modelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata
AT chuangericy modelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata
AT chenyidong modelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata