Cargando…
A model-based circular binary segmentation algorithm for the analysis of array CGH data
BACKGROUND: Circular Binary Segmentation (CBS) is a permutation-based algorithm for array Comparative Genomic Hybridization (aCGH) data analysis. CBS accurately segments data by detecting change-points using a maximal-t test; but extensive computational burden is involved for evaluating the signific...
Autores principales: | , , , , , , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2011
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3224564/ https://www.ncbi.nlm.nih.gov/pubmed/21985277 http://dx.doi.org/10.1186/1756-0500-4-394 |
_version_ | 1782217410285142016 |
---|---|
author | Hsu, Fang-Han Chen, Hung-I H Tsai, Mong-Hsun Lai, Liang-Chuan Huang, Chi-Cheng Tu, Shih-Hsin Chuang, Eric Y Chen, Yidong |
author_facet | Hsu, Fang-Han Chen, Hung-I H Tsai, Mong-Hsun Lai, Liang-Chuan Huang, Chi-Cheng Tu, Shih-Hsin Chuang, Eric Y Chen, Yidong |
author_sort | Hsu, Fang-Han |
collection | PubMed |
description | BACKGROUND: Circular Binary Segmentation (CBS) is a permutation-based algorithm for array Comparative Genomic Hybridization (aCGH) data analysis. CBS accurately segments data by detecting change-points using a maximal-t test; but extensive computational burden is involved for evaluating the significance of change-points using permutations. A recent implementation utilizing a hybrid method and early stopping rules (hybrid CBS) to improve the performance in speed was subsequently proposed. However, a time analysis revealed that a major portion of computation time of the hybrid CBS was still spent on permutation. In addition, what the hybrid method provides is an approximation of the significance upper bound or lower bound, not an approximation of the significance of change-points itself. RESULTS: We developed a novel model-based algorithm, extreme-value based CBS (eCBS), which limits permutations and provides robust results without loss of accuracy. Thousands of aCGH data under null hypothesis were simulated in advance based on a variety of non-normal assumptions, and the corresponding maximal-t distribution was modeled by the Generalized Extreme Value (GEV) distribution. The modeling results, which associate characteristics of aCGH data to the GEV parameters, constitute lookup tables (eXtreme model). Using the eXtreme model, the significance of change-points could be evaluated in a constant time complexity through a table lookup process. CONCLUSIONS: A novel algorithm, eCBS, was developed in this study. The current implementation of eCBS consistently outperforms the hybrid CBS 4× to 20× in computation time without loss of accuracy. Source codes, supplementary materials, supplementary figures, and supplementary tables can be found at http://ntumaps.cgm.ntu.edu.tw/eCBSsupplementary. |
format | Online Article Text |
id | pubmed-3224564 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-32245642011-11-30 A model-based circular binary segmentation algorithm for the analysis of array CGH data Hsu, Fang-Han Chen, Hung-I H Tsai, Mong-Hsun Lai, Liang-Chuan Huang, Chi-Cheng Tu, Shih-Hsin Chuang, Eric Y Chen, Yidong BMC Res Notes Research Article BACKGROUND: Circular Binary Segmentation (CBS) is a permutation-based algorithm for array Comparative Genomic Hybridization (aCGH) data analysis. CBS accurately segments data by detecting change-points using a maximal-t test; but extensive computational burden is involved for evaluating the significance of change-points using permutations. A recent implementation utilizing a hybrid method and early stopping rules (hybrid CBS) to improve the performance in speed was subsequently proposed. However, a time analysis revealed that a major portion of computation time of the hybrid CBS was still spent on permutation. In addition, what the hybrid method provides is an approximation of the significance upper bound or lower bound, not an approximation of the significance of change-points itself. RESULTS: We developed a novel model-based algorithm, extreme-value based CBS (eCBS), which limits permutations and provides robust results without loss of accuracy. Thousands of aCGH data under null hypothesis were simulated in advance based on a variety of non-normal assumptions, and the corresponding maximal-t distribution was modeled by the Generalized Extreme Value (GEV) distribution. The modeling results, which associate characteristics of aCGH data to the GEV parameters, constitute lookup tables (eXtreme model). Using the eXtreme model, the significance of change-points could be evaluated in a constant time complexity through a table lookup process. CONCLUSIONS: A novel algorithm, eCBS, was developed in this study. The current implementation of eCBS consistently outperforms the hybrid CBS 4× to 20× in computation time without loss of accuracy. Source codes, supplementary materials, supplementary figures, and supplementary tables can be found at http://ntumaps.cgm.ntu.edu.tw/eCBSsupplementary. BioMed Central 2011-10-10 /pmc/articles/PMC3224564/ /pubmed/21985277 http://dx.doi.org/10.1186/1756-0500-4-394 Text en Copyright ©2011 Hsu et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Hsu, Fang-Han Chen, Hung-I H Tsai, Mong-Hsun Lai, Liang-Chuan Huang, Chi-Cheng Tu, Shih-Hsin Chuang, Eric Y Chen, Yidong A model-based circular binary segmentation algorithm for the analysis of array CGH data |
title | A model-based circular binary segmentation algorithm for the analysis of array CGH data |
title_full | A model-based circular binary segmentation algorithm for the analysis of array CGH data |
title_fullStr | A model-based circular binary segmentation algorithm for the analysis of array CGH data |
title_full_unstemmed | A model-based circular binary segmentation algorithm for the analysis of array CGH data |
title_short | A model-based circular binary segmentation algorithm for the analysis of array CGH data |
title_sort | model-based circular binary segmentation algorithm for the analysis of array cgh data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3224564/ https://www.ncbi.nlm.nih.gov/pubmed/21985277 http://dx.doi.org/10.1186/1756-0500-4-394 |
work_keys_str_mv | AT hsufanghan amodelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata AT chenhungih amodelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata AT tsaimonghsun amodelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata AT lailiangchuan amodelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata AT huangchicheng amodelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata AT tushihhsin amodelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata AT chuangericy amodelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata AT chenyidong amodelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata AT hsufanghan modelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata AT chenhungih modelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata AT tsaimonghsun modelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata AT lailiangchuan modelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata AT huangchicheng modelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata AT tushihhsin modelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata AT chuangericy modelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata AT chenyidong modelbasedcircularbinarysegmentationalgorithmfortheanalysisofarraycghdata |