
Improving biomarker list stability by integration of biological knowledge in the learning process

BACKGROUND: The identification of robust lists of molecular biomarkers related to a disease is a fundamental step for early diagnosis and treatment. However, methodologies for biomarker discovery using microarray data often provide results with limited overlap. It has been suggested that one reason...

Bibliographic Details
Main Authors:	Sanavia, Tiziana; Aiolli, Fabio; Da San Martino, Giovanni; Bisognin, Andrea; Di Camillo, Barbara
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3314566/ https://www.ncbi.nlm.nih.gov/pubmed/22536969 http://dx.doi.org/10.1186/1471-2105-13-S4-S22

collection	PubMed
description	BACKGROUND: The identification of robust lists of molecular biomarkers related to a disease is a fundamental step for early diagnosis and treatment. However, methodologies for biomarker discovery using microarray data often provide results with limited overlap. It has been suggested that one reason for these inconsistencies may be that in complex diseases, such as cancer, multiple genes belonging to one or more physiological pathways are associated with the outcomes. Thus, a possible approach to improve list stability is to integrate biological information from genomic databases in the learning process; however, a comprehensive assessment based on different types of biological information is still lacking in the literature. In this work we compared the effect of using different types of biological information in the learning process, such as functional annotations, protein-protein interactions and expression correlation among genes. RESULTS: Biological knowledge has been codified by means of gene similarity matrices, and expression data have been linearly transformed in such a way that the more similar two features are, the more closely they are mapped. Two semantic similarity matrices, based on Biological Process and Molecular Function Gene Ontology annotation, and geodesic distance applied on protein-protein interaction networks, are the best performers in improving list stability while maintaining almost equal prediction accuracy. CONCLUSIONS: The performed analysis supports the idea that when some features are strongly correlated to each other, for example because they are close in the protein-protein interaction network, then they might have similar importance and be equally relevant for the task at hand. The obtained results can be a starting point for additional experiments on combining similarity matrices in order to obtain even more stable lists of biomarkers. The implementation of the classification algorithm is available at the link: http://www.math.unipd.it/~dasan/biomarkers.html.
id	pubmed-3314566
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-3314566 2012-04-02 BMC Bioinformatics BioMed Central 2012-03-28 Text en Copyright ©2012 Sanavia et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
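
The RESULTS part of the description says that expression data are linearly transformed so that more similar features are mapped more closely. The record does not give the exact formula, so the following is only a minimal sketch under an assumption: that the transformation can be expressed through the kernel K = X S X^T, where X is a samples-by-genes expression matrix and S is a symmetric gene similarity matrix. The function names and the toy values are hypothetical and are not taken from the authors' implementation.

```python
# Sketch only: ASSUMES the similarity-driven linear map can be written as the
# kernel K = X S X^T. All names and numbers here are illustrative assumptions.

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

def similarity_kernel(X, S):
    """Return K = X S X^T for expression matrix X and gene similarity S."""
    return matmul(matmul(X, S), transpose(X))

def induced_sq_distance(K, i, j):
    """Squared distance between items i and j in the space induced by K."""
    return K[i][i] + K[j][j] - 2.0 * K[i][j]

# Toy input: each row activates exactly one gene, so row i stands for gene i.
X = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

# Hypothetical similarity: genes 0 and 1 are similar, gene 2 is unrelated.
S = [[1.0, 0.9, 0.0],
     [0.9, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

K = similarity_kernel(X, S)
d_similar = induced_sq_distance(K, 0, 1)      # genes 0 and 1
d_dissimilar = induced_sq_distance(K, 0, 2)   # genes 0 and 2
print(d_similar < d_dissimilar)
```

In this toy case the two similar genes end up much closer to each other than the unrelated pair, which is the qualitative behaviour the description refers to. Working with the kernel avoids computing a matrix square root of S explicitly.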
