An all-statistics, high-speed algorithm for the analysis of copy number variation in genomes
Detection of copy number variation (CNV) in DNA has recently become an important method for understanding the pathogenesis of cancer. While existing algorithms for extracting CNV from microarray data have worked reasonably well, the trend towards ever larger sample sizes and higher resolution microa...
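Main authors: | Chen, Chih-Hao; Lee, Hsing-Chung; Ling, Qingdong; Chen, Hsiao-Rong; Ko, Yi-An; Tsou, Tsong-Shan; Wang, Sun-Chong; Wu, Li-Ching; Lee, H. C. |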
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2011 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3141250/ https://www.ncbi.nlm.nih.gov/pubmed/21576227 http://dx.doi.org/10.1093/nar/gkr137 |
_version_ | 1782208646625624064 |
---|---|
author | Chen, Chih-Hao Lee, Hsing-Chung Ling, Qingdong Chen, Hsiao-Rong Ko, Yi-An Tsou, Tsong-Shan Wang, Sun-Chong Wu, Li-Ching Lee, H. C. |
author_facet | Chen, Chih-Hao Lee, Hsing-Chung Ling, Qingdong Chen, Hsiao-Rong Ko, Yi-An Tsou, Tsong-Shan Wang, Sun-Chong Wu, Li-Ching Lee, H. C. |
author_sort | Chen, Chih-Hao |
collection | PubMed |
description | Detection of copy number variation (CNV) in DNA has recently become an important method for understanding the pathogenesis of cancer. While existing algorithms for extracting CNV from microarray data have worked reasonably well, the trend towards ever larger sample sizes and higher resolution microarrays has vastly increased the challenges they face. Here, we present Segmentation analysis of DNA (SAD), a clustering algorithm constructed with a strategy in which all operational decisions are based on simple and rigorous applications of statistical principles, measurement theory and precise mathematical relations. Compared with existing packages, SAD is simpler in formulation, more user friendly, much faster and less thirsty for memory, offers higher accuracy and supplies quantitative statistics for its predictions. Unique among such algorithms, SAD's running time scales linearly with array size; on a typical modern notebook, it completes high-quality CNV analyses for a 250 thousand-probe array in ∼1 s and a 1.8 million-probe array in ∼8 s. |
format | Online Article Text |
id | pubmed-3141250 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-31412502011-07-22 An all-statistics, high-speed algorithm for the analysis of copy number variation in genomes Chen, Chih-Hao Lee, Hsing-Chung Ling, Qingdong Chen, Hsiao-Rong Ko, Yi-An Tsou, Tsong-Shan Wang, Sun-Chong Wu, Li-Ching Lee, H. C. Nucleic Acids Res Methods Online Detection of copy number variation (CNV) in DNA has recently become an important method for understanding the pathogenesis of cancer. While existing algorithms for extracting CNV from microarray data have worked reasonably well, the trend towards ever larger sample sizes and higher resolution microarrays has vastly increased the challenges they face. Here, we present Segmentation analysis of DNA (SAD), a clustering algorithm constructed with a strategy in which all operational decisions are based on simple and rigorous applications of statistical principles, measurement theory and precise mathematical relations. Compared with existing packages, SAD is simpler in formulation, more user friendly, much faster and less thirsty for memory, offers higher accuracy and supplies quantitative statistics for its predictions. Unique among such algorithms, SAD's running time scales linearly with array size; on a typical modern notebook, it completes high-quality CNV analyses for a 250 thousand-probe array in ∼1 s and a 1.8 million-probe array in ∼8 s. Oxford University Press 2011-07 2011-05-14 /pmc/articles/PMC3141250/ /pubmed/21576227 http://dx.doi.org/10.1093/nar/gkr137 Text en © The Author(s) 2011. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methods Online Chen, Chih-Hao Lee, Hsing-Chung Ling, Qingdong Chen, Hsiao-Rong Ko, Yi-An Tsou, Tsong-Shan Wang, Sun-Chong Wu, Li-Ching Lee, H. C. An all-statistics, high-speed algorithm for the analysis of copy number variation in genomes |
title | An all-statistics, high-speed algorithm for the analysis of copy number variation in genomes |
title_full | An all-statistics, high-speed algorithm for the analysis of copy number variation in genomes |
title_fullStr | An all-statistics, high-speed algorithm for the analysis of copy number variation in genomes |
title_full_unstemmed | An all-statistics, high-speed algorithm for the analysis of copy number variation in genomes |
title_short | An all-statistics, high-speed algorithm for the analysis of copy number variation in genomes |
title_sort | all-statistics, high-speed algorithm for the analysis of copy number variation in genomes |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3141250/ https://www.ncbi.nlm.nih.gov/pubmed/21576227 http://dx.doi.org/10.1093/nar/gkr137 |