A mixture model with a reference-based automatic selection of components for disease classification from protein and/or gene expression levels

BACKGROUND: Bioinformatics data analysis often uses a linear mixture model that represents samples as additive mixtures of components. Properly constrained blind matrix factorization methods extract those components using the mixture samples only. However, automatic selection of extracted components to be...

Bibliographic Details
Main Authors:	Kopriva, Ivica; Filipović, Marko
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3292585/ https://www.ncbi.nlm.nih.gov/pubmed/22208882 http://dx.doi.org/10.1186/1471-2105-12-496

author	Kopriva, Ivica Filipović, Marko
collection	PubMed
description	BACKGROUND: Bioinformatics data analysis often uses a linear mixture model that represents samples as additive mixtures of components. Properly constrained blind matrix factorization methods extract those components using the mixture samples only. However, automatic selection of the extracted components to be retained for classification analysis remains an open issue. RESULTS: The method proposed here is applied to well-studied protein and genomic datasets of ovarian, prostate and colon cancers to extract components for disease prediction. It achieves average sensitivities of 96.2% (sd = 2.7%), 97.6% (sd = 2.8%) and 90.8% (sd = 5.5%) and average specificities of 93.6% (sd = 4.1%), 99% (sd = 2.2%) and 79.4% (sd = 9.8%) in 100 independent two-fold cross-validations. CONCLUSIONS: We propose an additive mixture model of a sample for feature extraction using, in principle, sparseness-constrained factorization on a sample-by-sample basis. By contrast, existing methods factorize the complete dataset simultaneously. The sample model is composed of a reference sample representing the control and/or case (disease) groups and a test sample. Each sample is decomposed into two or more components that are selected automatically (without using label information) as control-specific, case-specific and not differentially expressed (neutral). The number of components is determined by cross-validation. Automatic assignment of features (m/z ratios or genes) to a particular component is based on thresholds estimated from each sample directly. Due to the locality of the decomposition, the strength of the expression of each feature can vary across the samples. Yet the features will still be allocated to the related disease- and/or control-specific component. Since label information is not used in the selection process, case- and control-specific components can be used for classification. That is not the case with standard factorization methods.
Moreover, the component selected by the proposed method as disease-specific can be interpreted as a sub-mode and retained for further analysis to identify potential biomarkers. Unlike with standard matrix factorization methods, this can be achieved on a sample (experiment)-by-sample basis. Postulating one or more components with indifferent features enables their removal from the disease- and control-specific components on a sample-by-sample basis. This yields selected components with reduced complexity and generally increases prediction accuracy.
format	Online Article Text
id	pubmed-3292585
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3292585 2012-03-05 A mixture model with a reference-based automatic selection of components for disease classification from protein and/or gene expression levels Kopriva, Ivica Filipović, Marko BMC Bioinformatics Methodology Article BioMed Central 2011-12-30 /pmc/articles/PMC3292585/ /pubmed/22208882 http://dx.doi.org/10.1186/1471-2105-12-496 Text en Copyright ©2011 Kopriva and Filipović; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	A mixture model with a reference-based automatic selection of components for disease classification from protein and/or gene expression levels
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3292585/ https://www.ncbi.nlm.nih.gov/pubmed/22208882 http://dx.doi.org/10.1186/1471-2105-12-496
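
The abstract in this record reports average sensitivities and specificities, with standard deviations, over 100 independent two-fold cross-validations. As a rough illustration of how such summary figures can be computed, here is a minimal Python sketch. It is not the authors' method: the one-feature threshold classifier, the synthetic data, the stratified halving and the pooling of both folds within each run are all assumptions made for the example, and every function name is hypothetical.

```python
import random
import statistics


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels: 1 = case, 0 = control."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def centroid_classifier(train_x, train_y):
    """Hypothetical stand-in classifier: threshold halfway between the class means."""
    mean_case = statistics.mean(x for x, y in zip(train_x, train_y) if y == 1)
    mean_ctrl = statistics.mean(x for x, y in zip(train_x, train_y) if y == 0)
    threshold = (mean_case + mean_ctrl) / 2
    case_is_higher = mean_case >= mean_ctrl

    def predict(x):
        # Predict "case" on whichever side of the threshold the case mean lies.
        return int((x >= threshold) == case_is_higher)

    return predict


def repeated_two_fold_cv(xs, ys, repeats=100, seed=0):
    """Mean and sd of sensitivity and specificity over repeated two-fold CV.

    Assumption: each run splits cases and controls in half separately
    (stratified) and pools the predictions of both folds before scoring.
    """
    rng = random.Random(seed)
    case_idx = [i for i, y in enumerate(ys) if y == 1]
    ctrl_idx = [i for i, y in enumerate(ys) if y == 0]
    sens_runs, spec_runs = [], []
    for _ in range(repeats):
        rng.shuffle(case_idx)
        rng.shuffle(ctrl_idx)
        half_a = case_idx[: len(case_idx) // 2] + ctrl_idx[: len(ctrl_idx) // 2]
        half_b = case_idx[len(case_idx) // 2:] + ctrl_idx[len(ctrl_idx) // 2:]
        y_true, y_pred = [], []
        # Train on one half, test on the other, then swap the roles.
        for train, test in ((half_a, half_b), (half_b, half_a)):
            predict = centroid_classifier(
                [xs[i] for i in train], [ys[i] for i in train]
            )
            y_true += [ys[i] for i in test]
            y_pred += [predict(xs[i]) for i in test]
        sens, spec = sensitivity_specificity(y_true, y_pred)
        sens_runs.append(sens)
        spec_runs.append(spec)
    return (
        statistics.mean(sens_runs), statistics.stdev(sens_runs),
        statistics.mean(spec_runs), statistics.stdev(spec_runs),
    )


# Synthetic one-feature data (an assumption): controls near 0, cases near 3.
data_rng = random.Random(42)
xs = [data_rng.gauss(0.0, 1.0) for _ in range(40)]
xs += [data_rng.gauss(3.0, 1.0) for _ in range(40)]
ys = [0] * 40 + [1] * 40

mean_sens, sd_sens, mean_spec, sd_spec = repeated_two_fold_cv(xs, ys)
print(f"sensitivity: {100 * mean_sens:.1f}% (sd = {100 * sd_sens:.1f}%)")
print(f"specificity: {100 * mean_spec:.1f}% (sd = {100 * sd_spec:.1f}%)")
```

The sketch needs at least two cases and two controls so that each half contains both classes. Any real classifier can replace `centroid_classifier` as long as it returns a function mapping a sample to 0 or 1.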
