SubPatCNV: approximate subspace pattern mining for mapping copy-number variations


Bibliographic Details
Main Authors: Johnson, Nicholas; Zhang, Huanan; Fang, Gang; Kumar, Vipin; Kuang, Rui
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4305219/
https://www.ncbi.nlm.nih.gov/pubmed/25591662
http://dx.doi.org/10.1186/s12859-014-0426-7
author Johnson, Nicholas
Zhang, Huanan
Fang, Gang
Kumar, Vipin
Kuang, Rui
collection PubMed
description BACKGROUND: Many DNA copy-number variations (CNVs) are known to lead to phenotypic variations and pathogenesis. While CNVs are often common in only a small number of samples in the studied population or patient cohort, previous work has not focused on customized identification of CNV regions that appear only in subsets of samples, using advanced data mining techniques to reliably answer questions such as “Which are all the chromosomal fragments showing nearly identical deletions or insertions in more than 30% of the individuals?”. RESULTS: We introduce SubPatCNV, a tool for mining CNV subspace patterns that is capable of identifying all aberrant CNV regions specific to arbitrary sample subsets larger than a support threshold. By design, SubPatCNV implements a variant of an approximate association pattern mining algorithm under a spatial constraint on the positional CNV probe features. In a benchmark test, SubPatCNV was applied to identify population-specific germline CNVs from four populations of HapMap samples. In experiments on the TCGA ovarian cancer dataset, SubPatCNV discovered many large aberrant CNV events in patient subgroups and reported regions enriched with cancer-relevant genes. In both the HapMap data and the TCGA data, it was observed that, by employing approximate pattern mining, SubPatCNV more effectively identifies CNV subspace patterns that are consistent within a subgroup from high-density array data. CONCLUSIONS: SubPatCNV, available through http://sourceforge.net/projects/subpatcnv/, is a unique, scalable, open-source software tool that provides the flexibility of identifying CNV regions specific to sample subgroups of different sizes from high-density CNV array data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-014-0426-7) contains supplementary material, which is available to authorized users.
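The question posed in the description (which chromosomal fragments show nearly identical deletions or insertions in more than 30% of the individuals) can be illustrated with a small Python sketch. This is not SubPatCNV's algorithm or interface; it is a brute-force toy under stated assumptions: CNV calls are already discretized per probe (-1 loss, 0 neutral, +1 gain), probes are listed in genomic order, and the names `region_support`, `scan_windows`, `min_support`, and `max_mismatch` are hypothetical. The "approximate" part is modeled simply as a per-sample allowance of mismatching probes, and the spatial constraint as a contiguous fixed-width window.

```python
# Toy illustration only; NOT the SubPatCNV implementation.
# Assumption: calls[i][j] is the discretized CNV state of sample i at probe j
# (-1 = loss, 0 = neutral, +1 = gain), with probes in genomic order.

def region_support(calls, start, end, state, max_mismatch=0):
    """Return indices of samples whose probes start..end (inclusive) equal
    `state` at all but at most `max_mismatch` positions."""
    return [
        i for i, row in enumerate(calls)
        if sum(1 for v in row[start:end + 1] if v != state) <= max_mismatch
    ]

def scan_windows(calls, state, width, min_support=0.3, max_mismatch=0):
    """Slide a contiguous window of `width` probes along the genome and keep
    every window supported by MORE than `min_support` of the samples."""
    n_samples = len(calls)
    n_probes = len(calls[0]) if calls else 0
    hits = []
    for start in range(n_probes - width + 1):
        end = start + width - 1
        support = region_support(calls, start, end, state, max_mismatch)
        if len(support) > min_support * n_samples:
            hits.append((start, end, support))
    return hits

# Hypothetical data: 5 samples x 6 probes.
calls = [
    [0, -1, -1, -1, 0, 0],
    [0, -1, -1, -1, 0, 0],
    [0, -1,  0, -1, 0, 0],   # same deletion, one noisy probe
    [0,  0,  0,  0, 0, 0],
    [0,  0,  0,  0, 1, 1],
]

exact = region_support(calls, 1, 3, state=-1, max_mismatch=0)
approx = region_support(calls, 1, 3, state=-1, max_mismatch=1)
print(exact)   # [0, 1]
print(approx)  # [0, 1, 2]
print(scan_windows(calls, state=-1, width=3, min_support=0.3))
```

In this toy data, exact matching attributes the deletion at probes 1 to 3 to two samples, while allowing one mismatching probe also recovers the third, noisy sample. That is the intuition behind preferring approximate patterns on noisy high-density array data; the real tool searches over region sizes and sample subsets rather than a fixed window width.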
format Online
Article
Text
id pubmed-4305219
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4305219 2015-02-03 SubPatCNV: approximate subspace pattern mining for mapping copy-number variations Johnson, Nicholas; Zhang, Huanan; Fang, Gang; Kumar, Vipin; Kuang, Rui BMC Bioinformatics Software BioMed Central 2015-01-16 /pmc/articles/PMC4305219/ /pubmed/25591662 http://dx.doi.org/10.1186/s12859-014-0426-7 Text en © Johnson et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title SubPatCNV: approximate subspace pattern mining for mapping copy-number variations
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4305219/
https://www.ncbi.nlm.nih.gov/pubmed/25591662
http://dx.doi.org/10.1186/s12859-014-0426-7