SubPatCNV: approximate subspace pattern mining for mapping copy-number variations
BACKGROUND: Many DNA copy-number variations (CNVs) are known to lead to phenotypic variations and pathogenesis. While CNVs are often only common in a small number of samples in the studied population or patient cohort, previous work has not focused on customized identification of CNV regions that on...
Main authors: | Johnson, Nicholas; Zhang, Huanan; Fang, Gang; Kumar, Vipin; Kuang, Rui |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4305219/ https://www.ncbi.nlm.nih.gov/pubmed/25591662 http://dx.doi.org/10.1186/s12859-014-0426-7 |
_version_ | 1782354196126760960 |
---|---|
author | Johnson, Nicholas Zhang, Huanan Fang, Gang Kumar, Vipin Kuang, Rui |
author_facet | Johnson, Nicholas Zhang, Huanan Fang, Gang Kumar, Vipin Kuang, Rui |
author_sort | Johnson, Nicholas |
collection | PubMed |
description | BACKGROUND: Many DNA copy-number variations (CNVs) are known to lead to phenotypic variations and pathogenesis. While CNVs are often common in only a small number of samples in the studied population or patient cohort, previous work has not focused on customized identification of CNV regions that appear only in subsets of samples, using advanced data mining techniques to reliably answer questions such as “Which are all the chromosomal fragments showing nearly identical deletions or insertions in more than 30% of the individuals?”. RESULTS: We introduce a tool for mining CNV subspace patterns, namely SubPatCNV, which is capable of identifying all aberrant CNV regions specific to arbitrary sample subsets larger than a support threshold. By design, SubPatCNV is an implementation of a variation of an approximate association pattern mining algorithm under a spatial constraint on the positional CNV probe features. In a benchmark test, SubPatCNV was applied to identify population-specific germline CNVs from four populations of HapMap samples. In experiments on the TCGA ovarian cancer dataset, SubPatCNV discovered many large aberrant CNV events in patient subgroups, and reported regions enriched with cancer-relevant genes. In both HapMap data and TCGA data, it was observed that SubPatCNV employs approximate pattern mining to more effectively identify CNV subspace patterns that are consistent within a subgroup from high-density array data. CONCLUSIONS: SubPatCNV, available through http://sourceforge.net/projects/subpatcnv/, is a unique scalable open-source software tool that provides the flexibility of identifying CNV regions specific to sample subgroups of different sizes from high-density CNV array data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-014-0426-7) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4305219 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-43052192015-02-03 SubPatCNV: approximate subspace pattern mining for mapping copy-number variations Johnson, Nicholas Zhang, Huanan Fang, Gang Kumar, Vipin Kuang, Rui BMC Bioinformatics Software BACKGROUND: Many DNA copy-number variations (CNVs) are known to lead to phenotypic variations and pathogenesis. While CNVs are often common in only a small number of samples in the studied population or patient cohort, previous work has not focused on customized identification of CNV regions that appear only in subsets of samples, using advanced data mining techniques to reliably answer questions such as “Which are all the chromosomal fragments showing nearly identical deletions or insertions in more than 30% of the individuals?”. RESULTS: We introduce a tool for mining CNV subspace patterns, namely SubPatCNV, which is capable of identifying all aberrant CNV regions specific to arbitrary sample subsets larger than a support threshold. By design, SubPatCNV is an implementation of a variation of an approximate association pattern mining algorithm under a spatial constraint on the positional CNV probe features. In a benchmark test, SubPatCNV was applied to identify population-specific germline CNVs from four populations of HapMap samples. In experiments on the TCGA ovarian cancer dataset, SubPatCNV discovered many large aberrant CNV events in patient subgroups, and reported regions enriched with cancer-relevant genes. In both HapMap data and TCGA data, it was observed that SubPatCNV employs approximate pattern mining to more effectively identify CNV subspace patterns that are consistent within a subgroup from high-density array data. CONCLUSIONS: SubPatCNV, available through http://sourceforge.net/projects/subpatcnv/, is a unique scalable open-source software tool that provides the flexibility of identifying CNV regions specific to sample subgroups of different sizes from high-density CNV array data. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-014-0426-7) contains supplementary material, which is available to authorized users. BioMed Central 2015-01-16 /pmc/articles/PMC4305219/ /pubmed/25591662 http://dx.doi.org/10.1186/s12859-014-0426-7 Text en © Johnson et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Software Johnson, Nicholas Zhang, Huanan Fang, Gang Kumar, Vipin Kuang, Rui SubPatCNV: approximate subspace pattern mining for mapping copy-number variations |
title | SubPatCNV: approximate subspace pattern mining for mapping copy-number variations |
title_full | SubPatCNV: approximate subspace pattern mining for mapping copy-number variations |
title_fullStr | SubPatCNV: approximate subspace pattern mining for mapping copy-number variations |
title_full_unstemmed | SubPatCNV: approximate subspace pattern mining for mapping copy-number variations |
title_short | SubPatCNV: approximate subspace pattern mining for mapping copy-number variations |
title_sort | subpatcnv: approximate subspace pattern mining for mapping copy-number variations |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4305219/ https://www.ncbi.nlm.nih.gov/pubmed/25591662 http://dx.doi.org/10.1186/s12859-014-0426-7 |
work_keys_str_mv | AT johnsonnicholas subpatcnvapproximatesubspacepatternminingformappingcopynumbervariations AT zhanghuanan subpatcnvapproximatesubspacepatternminingformappingcopynumbervariations AT fanggang subpatcnvapproximatesubspacepatternminingformappingcopynumbervariations AT kumarvipin subpatcnvapproximatesubspacepatternminingformappingcopynumbervariations AT kuangrui subpatcnvapproximatesubspacepatternminingformappingcopynumbervariations |
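
The question quoted in the description ("Which are all the chromosomal fragments showing nearly identical deletions or insertions in more than 30% of the individuals?") can be made concrete with a small sketch. The Python below is not SubPatCNV's implementation; it is a brute-force illustration under stated assumptions. The function name `frequent_cnv_regions`, its parameters, and the input encoding (one row per sample, one column per probe in genomic order, with -1 for a deletion, 0 for normal, and +1 for an insertion) are all hypothetical. The "spatial constraint" is read here as requiring the probes of a pattern to be contiguous, and "approximate" as tolerating a fraction `max_error` of non-matching probes per supporting sample.

```python
from typing import List, Sequence, Tuple

# (first probe index, last probe index, indices of supporting samples)
Region = Tuple[int, int, List[int]]


def frequent_cnv_regions(
    calls: Sequence[Sequence[int]],
    state: int,
    min_support: float,
    max_error: float = 0.0,
) -> List[Region]:
    """Brute-force search for maximal contiguous probe windows in which more
    than `min_support` (a fraction) of the samples show `state` at all but a
    fraction `max_error` of the probes in the window.

    Hypothetical helper for illustration only; not the SubPatCNV algorithm.
    """
    n_samples = len(calls)
    n_probes = len(calls[0]) if n_samples else 0
    found: List[Region] = []

    for start in range(n_probes):
        # mismatches[i] counts probes in [start, end] where sample i != state
        mismatches = [0] * n_samples
        for end in range(start, n_probes):
            for i in range(n_samples):
                if calls[i][end] != state:
                    mismatches[i] += 1
            length = end - start + 1
            support = [
                i for i in range(n_samples)
                if mismatches[i] <= max_error * length
            ]
            # "more than" the support threshold, as phrased in the description
            if len(support) > min_support * n_samples:
                found.append((start, end, support))

    # Keep only windows that are not contained in a larger qualifying window.
    return [
        r for r in found
        if not any(
            o[0] <= r[0] and r[1] <= o[1] and (o[0], o[1]) != (r[0], r[1])
            for o in found
        )
    ]


# Toy data (made up): 4 samples x 3 probes; sample 3 has one noisy probe.
calls = [
    [-1, -1, -1],
    [-1, -1, -1],
    [0, 0, 0],
    [-1, 0, -1],
]
exact = frequent_cnv_regions(calls, state=-1, min_support=0.3)
approx = frequent_cnv_regions(calls, state=-1, min_support=0.3, max_error=0.34)
print(exact)   # [(0, 2, [0, 1])]
print(approx)  # [(0, 2, [0, 1, 3])]
```

In the toy data, allowing one mismatch in three probes lets the noisy sample 3 join the supporting subgroup, which is the kind of tolerance the description attributes to approximate pattern mining. This sketch enumerates every window and scales roughly with the square of the probe count, so it would not be practical for high-density arrays as written.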