Penalized weighted low-rank approximation for robust recovery of recurrent copy number variations

BACKGROUND: Copy number variation (CNV) analysis has become one of the most important research areas for understanding complex disease. With increasing resolution of array-based comparative genomic hybridization (aCGH) arrays, more and more raw copy number data are collected for multiple arrays. It...

Bibliographic Details
Main author: Gao, Xiaoli
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4676147/
https://www.ncbi.nlm.nih.gov/pubmed/26652207
http://dx.doi.org/10.1186/s12859-015-0835-2
author Gao, Xiaoli
collection PubMed
description BACKGROUND: Copy number variation (CNV) analysis has become one of the most important research areas for understanding complex disease. With the increasing resolution of array-based comparative genomic hybridization (aCGH) arrays, more and more raw copy number data are collected across multiple arrays. Recurrent and individual-specific CNVs naturally co-exist in such data, and the data may also be contaminated during generation. There is therefore a great need for an efficient and robust statistical model that recovers recurrent and individual-specific CNVs simultaneously. RESULT: We develop a penalized weighted low-rank approximation method (WPLA) for robust recovery of recurrent CNVs. In particular, we model multiple aCGH arrays as a realization of a hidden low-rank matrix plus random noise, and let an additional weight matrix account for the individual-specific effects. Thus, we do not restrict the random noise to be normally distributed, or even homogeneous. We demonstrate its performance on three real datasets and twelve synthetic datasets covering different types of recurrent CNV regions, with either normal random errors or heavily contaminated errors. CONCLUSION: Our numerical experiments demonstrate that the WPLA can successfully recover recurrent CNV patterns from raw data under different scenarios. Compared with two other recent methods, it performs best at simultaneously detecting both recurrent and individual-specific CNVs under normal random errors. More importantly, the WPLA is the only method of the three that can effectively recover recurrent CNV regions when the data are heavily contaminated. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0835-2) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4676147
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4676147 2015-12-12 Penalized weighted low-rank approximation for robust recovery of recurrent copy number variations Gao, Xiaoli BMC Bioinformatics Research Article BioMed Central 2015-12-10 /pmc/articles/PMC4676147/ /pubmed/26652207 http://dx.doi.org/10.1186/s12859-015-0835-2 Text en © Gao 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Penalized weighted low-rank approximation for robust recovery of recurrent copy number variations
topic Research Article