Knowledge Driven Variable Selection (KDVS) – a new approach to enrichment analysis of gene signatures obtained from high–throughput data

BACKGROUND: High–throughput (HT) technologies provide huge amounts of gene expression data that can be used to identify biomarkers useful in clinical practice. The most frequently used approaches first select a set of genes (i.e. a gene signature) able to characterize differences between two or mor...

Bibliographic Details
Main authors: Zycinski, Grzegorz; Barla, Annalisa; Squillario, Margherita; Sanavia, Tiziana; Camillo, Barbara Di; Verri, Alessandro
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3605163/
https://www.ncbi.nlm.nih.gov/pubmed/23302187
http://dx.doi.org/10.1186/1751-0473-8-2
_version_ 1782263834182942720
author Zycinski, Grzegorz
Barla, Annalisa
Squillario, Margherita
Sanavia, Tiziana
Camillo, Barbara Di
Verri, Alessandro
author_sort Zycinski, Grzegorz
collection PubMed
description BACKGROUND: High–throughput (HT) technologies provide huge amounts of gene expression data that can be used to identify biomarkers useful in clinical practice. The most frequently used approaches first select a set of genes (i.e. a gene signature) able to characterize differences between two or more phenotypical conditions, and then provide a functional assessment of the selected genes with an a posteriori enrichment analysis, based on biological knowledge. However, this approach comes with some drawbacks. First, the gene selection procedure often requires tunable parameters that affect the outcome, typically producing many false hits. Second, a posteriori enrichment analysis is based on a mapping between biological concepts and gene expression measurements, which is hard to compute because of constant changes in biological knowledge and genome analysis. Third, such a mapping is typically used to assess the coverage of the gene signature by biological concepts, an assessment that is either score–based or requires tunable parameters as well, limiting its power. RESULTS: We present Knowledge Driven Variable Selection (KDVS), a framework that uses a priori biological knowledge in HT data analysis. The expression data matrix is transformed, according to prior knowledge, into smaller matrices that are easier to analyze and to interpret from both computational and biological viewpoints. Therefore KDVS, unlike most approaches, does not exclude a priori any function or process potentially relevant for the biological question under investigation. Unlike the standard approach, where gene selection and functional assessment are applied independently, KDVS embeds these two steps into a unified statistical framework, decreasing the variability derived from the threshold–dependent selection, the mapping to the biological concepts, and the signature coverage. We present three case studies to assess the usefulness of the method.
CONCLUSIONS: We showed that KDVS not only enables the selection of known biological functionalities with accuracy, but also the identification of new ones. An efficient implementation of KDVS was devised to obtain results in a fast and robust way. Computing time is drastically reduced by the effective use of distributed resources. Finally, integrated visualization techniques immediately increase the interpretability of results. Overall, the KDVS approach can be considered a viable alternative to enrichment–based approaches.
format Online
Article
Text
id pubmed-3605163
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-36051632013-03-26 Knowledge Driven Variable Selection (KDVS) – a new approach to enrichment analysis of gene signatures obtained from high–throughput data Zycinski, Grzegorz Barla, Annalisa Squillario, Margherita Sanavia, Tiziana Camillo, Barbara Di Verri, Alessandro Source Code Biol Med Methodology BioMed Central 2013-01-09 /pmc/articles/PMC3605163/ /pubmed/23302187 http://dx.doi.org/10.1186/1751-0473-8-2 Text en Copyright ©2013 Zycinski et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Knowledge Driven Variable Selection (KDVS) – a new approach to enrichment analysis of gene signatures obtained from high–throughput data
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3605163/
https://www.ncbi.nlm.nih.gov/pubmed/23302187
http://dx.doi.org/10.1186/1751-0473-8-2