
An all-statistics, high-speed algorithm for the analysis of copy number variation in genomes

Detection of copy number variation (CNV) in DNA has recently become an important method for understanding the pathogenesis of cancer. While existing algorithms for extracting CNV from microarray data have worked reasonably well, the trend towards ever larger sample sizes and higher resolution microa...


Bibliographic Details
Main authors: Chen, Chih-Hao, Lee, Hsing-Chung, Ling, Qingdong, Chen, Hsiao-Rong, Ko, Yi-An, Tsou, Tsong-Shan, Wang, Sun-Chong, Wu, Li-Ching, Lee, H. C.
Format: Online Article Text
Language: English
Published: Oxford University Press 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3141250/
https://www.ncbi.nlm.nih.gov/pubmed/21576227
http://dx.doi.org/10.1093/nar/gkr137
author Chen, Chih-Hao
Lee, Hsing-Chung
Ling, Qingdong
Chen, Hsiao-Rong
Ko, Yi-An
Tsou, Tsong-Shan
Wang, Sun-Chong
Wu, Li-Ching
Lee, H. C.
collection PubMed
description Detection of copy number variation (CNV) in DNA has recently become an important method for understanding the pathogenesis of cancer. While existing algorithms for extracting CNV from microarray data have worked reasonably well, the trend towards ever larger sample sizes and higher resolution microarrays has vastly increased the challenges they face. Here, we present Segmentation analysis of DNA (SAD), a clustering algorithm constructed with a strategy in which all operational decisions are based on simple and rigorous applications of statistical principles, measurement theory and precise mathematical relations. Compared with existing packages, SAD is simpler in formulation, more user friendly, much faster and less thirsty for memory, offers higher accuracy and supplies quantitative statistics for its predictions. Unique among such algorithms, SAD's running time scales linearly with array size; on a typical modern notebook, it completes high-quality CNV analyses for a 250 thousand-probe array in ∼1 s and a 1.8 million-probe array in ∼8 s.
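
The two timings quoted in the description can be checked against the linear-scaling claim with a little arithmetic. The sketch below is only an illustration, not part of SAD: the helper `predicted_seconds` is a hypothetical name, the timings are the approximate values from the abstract, and the quadratic figure is an assumed point of comparison rather than a measurement of any other package.

```python
# Approximate figures quoted in the abstract (both timings are rounded, "~1 s" and "~8 s").
probes_small, seconds_small = 250_000, 1.0
probes_large, seconds_large = 1_800_000, 8.0


def predicted_seconds(probes, ref_probes, ref_seconds):
    """Hypothetical helper: extrapolate running time assuming it is
    proportional to the number of probes (linear scaling)."""
    return ref_seconds * probes / ref_probes


# How much larger the second array is than the first.
size_ratio = probes_large / probes_small

# Expected time for the large array if running time grows linearly.
linear_estimate = predicted_seconds(probes_large, probes_small, seconds_small)

# For contrast only: what quadratic growth would imply (an assumption, not data).
quadratic_estimate = seconds_small * size_ratio ** 2

print(f"array size ratio:      {size_ratio:.1f}x")
print(f"linear estimate:       {linear_estimate:.1f} s")
print(f"quadratic estimate:    {quadratic_estimate:.1f} s")
print(f"reported in abstract: ~{seconds_large:.0f} s")
```

The linear extrapolation lands close to the reported ∼8 s, while a quadratic extrapolation would be several times larger, so the two quoted timings are at least consistent with the stated linear scaling.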
format Online
Article
Text
id pubmed-3141250
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3141250 2011-07-22 Nucleic Acids Res Methods Online Oxford University Press 2011-07 2011-05-14 /pmc/articles/PMC3141250/ /pubmed/21576227 http://dx.doi.org/10.1093/nar/gkr137 Text en © The Author(s) 2011. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title An all-statistics, high-speed algorithm for the analysis of copy number variation in genomes
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3141250/
https://www.ncbi.nlm.nih.gov/pubmed/21576227
http://dx.doi.org/10.1093/nar/gkr137