Knowledge Driven Variable Selection (KDVS) – a new approach to enrichment analysis of gene signatures obtained from high–throughput data
BACKGROUND: High–throughput (HT) technologies provide huge amount of gene expression data that can be used to identify biomarkers useful in the clinical practice. The most frequently used approaches first select a set of genes (i.e. gene signature) able to characterize differences between two or mor...
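Main Authors: | Zycinski, Grzegorz; Barla, Annalisa; Squillario, Margherita; Sanavia, Tiziana; Camillo, Barbara Di; Verri, Alessandro |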
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2013 |
Subjects: | Methodology |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3605163/ https://www.ncbi.nlm.nih.gov/pubmed/23302187 http://dx.doi.org/10.1186/1751-0473-8-2 |
_version_ | 1782263834182942720 |
---|---|
author | Zycinski, Grzegorz Barla, Annalisa Squillario, Margherita Sanavia, Tiziana Camillo, Barbara Di Verri, Alessandro |
author_facet | Zycinski, Grzegorz Barla, Annalisa Squillario, Margherita Sanavia, Tiziana Camillo, Barbara Di Verri, Alessandro |
author_sort | Zycinski, Grzegorz |
collection | PubMed |
description | BACKGROUND: High–throughput (HT) technologies provide huge amount of gene expression data that can be used to identify biomarkers useful in the clinical practice. The most frequently used approaches first select a set of genes (i.e. gene signature) able to characterize differences between two or more phenotypical conditions, and then provide a functional assessment of the selected genes with an a posteriori enrichment analysis, based on biological knowledge. However, this approach comes with some drawbacks. First, gene selection procedure often requires tunable parameters that affect the outcome, typically producing many false hits. Second, a posteriori enrichment analysis is based on mapping between biological concepts and gene expression measurements, which is hard to compute because of constant changes in biological knowledge and genome analysis. Third, such mapping is typically used in the assessment of the coverage of gene signature by biological concepts, that is either score–based or requires tunable parameters as well, limiting its power. RESULTS: We present Knowledge Driven Variable Selection (KDVS), a framework that uses a priori biological knowledge in HT data analysis. The expression data matrix is transformed, according to prior knowledge, into smaller matrices, easier to analyze and to interpret from both computational and biological viewpoints. Therefore KDVS, unlike most approaches, does not exclude a priori any function or process potentially relevant for the biological question under investigation. Differently from the standard approach where gene selection and functional assessment are applied independently, KDVS embeds these two steps into a unified statistical framework, decreasing the variability derived from the threshold–dependent selection, the mapping to the biological concepts, and the signature coverage. We present three case studies to assess the usefulness of the method. CONCLUSIONS: We showed that KDVS not only enables the selection of known biological functionalities with accuracy, but also identification of new ones. An efficient implementation of KDVS was devised to obtain results in a fast and robust way. Computing time is drastically reduced by the effective use of distributed resources. Finally, integrated visualization techniques immediately increase the interpretability of results. Overall, KDVS approach can be considered as a viable alternative to enrichment–based approaches. |
format | Online Article Text |
id | pubmed-3605163 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3605163 2013-03-26 Knowledge Driven Variable Selection (KDVS) – a new approach to enrichment analysis of gene signatures obtained from high–throughput data Zycinski, Grzegorz Barla, Annalisa Squillario, Margherita Sanavia, Tiziana Camillo, Barbara Di Verri, Alessandro Source Code Biol Med Methodology BACKGROUND: High–throughput (HT) technologies provide huge amount of gene expression data that can be used to identify biomarkers useful in the clinical practice. The most frequently used approaches first select a set of genes (i.e. gene signature) able to characterize differences between two or more phenotypical conditions, and then provide a functional assessment of the selected genes with an a posteriori enrichment analysis, based on biological knowledge. However, this approach comes with some drawbacks. First, gene selection procedure often requires tunable parameters that affect the outcome, typically producing many false hits. Second, a posteriori enrichment analysis is based on mapping between biological concepts and gene expression measurements, which is hard to compute because of constant changes in biological knowledge and genome analysis. Third, such mapping is typically used in the assessment of the coverage of gene signature by biological concepts, that is either score–based or requires tunable parameters as well, limiting its power. RESULTS: We present Knowledge Driven Variable Selection (KDVS), a framework that uses a priori biological knowledge in HT data analysis. The expression data matrix is transformed, according to prior knowledge, into smaller matrices, easier to analyze and to interpret from both computational and biological viewpoints. Therefore KDVS, unlike most approaches, does not exclude a priori any function or process potentially relevant for the biological question under investigation. Differently from the standard approach where gene selection and functional assessment are applied independently, KDVS embeds these two steps into a unified statistical framework, decreasing the variability derived from the threshold–dependent selection, the mapping to the biological concepts, and the signature coverage. We present three case studies to assess the usefulness of the method. CONCLUSIONS: We showed that KDVS not only enables the selection of known biological functionalities with accuracy, but also identification of new ones. An efficient implementation of KDVS was devised to obtain results in a fast and robust way. Computing time is drastically reduced by the effective use of distributed resources. Finally, integrated visualization techniques immediately increase the interpretability of results. Overall, KDVS approach can be considered as a viable alternative to enrichment–based approaches. BioMed Central 2013-01-09 /pmc/articles/PMC3605163/ /pubmed/23302187 http://dx.doi.org/10.1186/1751-0473-8-2 Text en Copyright ©2013 Zycinski et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methodology Zycinski, Grzegorz Barla, Annalisa Squillario, Margherita Sanavia, Tiziana Camillo, Barbara Di Verri, Alessandro Knowledge Driven Variable Selection (KDVS) – a new approach to enrichment analysis of gene signatures obtained from high–throughput data |
title | Knowledge Driven Variable Selection (KDVS) – a new approach to enrichment analysis of gene signatures obtained from high–throughput data |
title_full | Knowledge Driven Variable Selection (KDVS) – a new approach to enrichment analysis of gene signatures obtained from high–throughput data |
title_fullStr | Knowledge Driven Variable Selection (KDVS) – a new approach to enrichment analysis of gene signatures obtained from high–throughput data |
title_full_unstemmed | Knowledge Driven Variable Selection (KDVS) – a new approach to enrichment analysis of gene signatures obtained from high–throughput data |
title_short | Knowledge Driven Variable Selection (KDVS) – a new approach to enrichment analysis of gene signatures obtained from high–throughput data |
title_sort | knowledge driven variable selection (kdvs) – a new approach to enrichment analysis of gene signatures obtained from high–throughput data |
topic | Methodology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3605163/ https://www.ncbi.nlm.nih.gov/pubmed/23302187 http://dx.doi.org/10.1186/1751-0473-8-2 |
work_keys_str_mv | AT zycinskigrzegorz knowledgedrivenvariableselectionkdvsanewapproachtoenrichmentanalysisofgenesignaturesobtainedfromhighthroughputdata AT barlaannalisa knowledgedrivenvariableselectionkdvsanewapproachtoenrichmentanalysisofgenesignaturesobtainedfromhighthroughputdata AT squillariomargherita knowledgedrivenvariableselectionkdvsanewapproachtoenrichmentanalysisofgenesignaturesobtainedfromhighthroughputdata AT sanaviatiziana knowledgedrivenvariableselectionkdvsanewapproachtoenrichmentanalysisofgenesignaturesobtainedfromhighthroughputdata AT camillobarbaradi knowledgedrivenvariableselectionkdvsanewapproachtoenrichmentanalysisofgenesignaturesobtainedfromhighthroughputdata AT verrialessandro knowledgedrivenvariableselectionkdvsanewapproachtoenrichmentanalysisofgenesignaturesobtainedfromhighthroughputdata |
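
Illustrative note: the RESULTS paragraph of the description says the expression data matrix is transformed, according to prior knowledge, into smaller matrices. The record does not say how KDVS does this, so the following is only a minimal sketch of that one idea, assuming the prior knowledge takes the form of a mapping from biological concepts to gene identifiers. The function name `split_by_prior_knowledge`, the concept labels, and the toy data are all hypothetical and are not taken from the KDVS software. The statistical analysis applied to each submatrix is not shown.

```python
from typing import Dict, List, Sequence


def split_by_prior_knowledge(
    expression: Sequence[Sequence[float]],
    gene_ids: Sequence[str],
    concept_to_genes: Dict[str, Sequence[str]],
) -> Dict[str, Dict[str, list]]:
    """Slice a genes-by-samples matrix into one submatrix per concept.

    Hypothetical helper for illustration only; not the KDVS API.
    """
    # Map each measured gene to its row index in the full matrix.
    row_of = {gene: i for i, gene in enumerate(gene_ids)}

    submatrices: Dict[str, Dict[str, list]] = {}
    for concept, genes in concept_to_genes.items():
        # Keep only the genes of this concept that were actually measured.
        measured: List[str] = [g for g in genes if g in row_of]
        if not measured:
            # A concept with no measured genes yields no submatrix.
            continue
        submatrices[concept] = {
            "genes": measured,
            "matrix": [list(expression[row_of[g]]) for g in measured],
        }
    return submatrices


# Toy data (invented): 4 genes measured across 3 samples.
gene_ids = ["G1", "G2", "G3", "G4"]
expression = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [10.0, 11.0, 12.0],
]
# Hypothetical concept-to-gene mapping; G8 and G9 are not measured.
concept_to_genes = {
    "concept_A": ["G1", "G3"],
    "concept_B": ["G2", "G3", "G9"],
    "concept_C": ["G8"],
}

subs = split_by_prior_knowledge(expression, gene_ids, concept_to_genes)
for concept, sub in subs.items():
    print(concept, sub["genes"], sub["matrix"])
```

In this sketch a gene may appear in several submatrices, and a concept whose genes were not measured is skipped. Each submatrix could then be analyzed on its own, which is the property the description points to when it calls the smaller matrices easier to analyze and interpret.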