
A machine learning heuristic to identify biologically relevant and minimal biomarker panels from omics data

BACKGROUND: Investigations into novel biomarkers using omics techniques generate large amounts of data. Due to their size and numbers of attributes, these data are suitable for analysis with machine learning methods. A key component of typical machine learning pipelines for omics data is feature sel...


Bibliographic Details
Main authors: Swan, Anna L, Stekel, Dov J, Hodgman, Charlie, Allaway, David, Alqahtani, Mohammed H, Mobasheri, Ali, Bacardit, Jaume
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4315157/
https://www.ncbi.nlm.nih.gov/pubmed/25923811
http://dx.doi.org/10.1186/1471-2164-16-S1-S2
author Swan, Anna L
Stekel, Dov J
Hodgman, Charlie
Allaway, David
Alqahtani, Mohammed H
Mobasheri, Ali
Bacardit, Jaume
collection PubMed
description BACKGROUND: Investigations into novel biomarkers using omics techniques generate large amounts of data. Due to their size and numbers of attributes, these data are suitable for analysis with machine learning methods. A key component of typical machine learning pipelines for omics data is feature selection, which is used to reduce the raw high-dimensional data into a tractable number of features. Feature selection needs to balance the objective of using as few features as possible, while maintaining high predictive power. This balance is crucial when the goal of data analysis is the identification of highly accurate but small panels of biomarkers with potential clinical utility. In this paper we propose a heuristic for the selection of very small feature subsets, via an iterative feature elimination process that is guided by rule-based machine learning, called RGIFE (Rule-guided Iterative Feature Elimination). We use this heuristic to identify putative biomarkers of osteoarthritis (OA), articular cartilage degradation and synovial inflammation, using both proteomic and transcriptomic datasets. RESULTS AND DISCUSSION: Our RGIFE heuristic increased the classification accuracies achieved for all datasets when no feature selection is used, and performed well in a comparison with other feature selection methods. Using this method the datasets were reduced to a smaller number of genes or proteins, including those known to be relevant to OA, cartilage degradation and joint inflammation. The results have shown the RGIFE feature reduction method to be suitable for analysing both proteomic and transcriptomics data. Methods that generate large ‘omics’ datasets are increasingly being used in the area of rheumatology. CONCLUSIONS: Feature reduction methods are advantageous for the analysis of omics data in the field of rheumatology, as the applications of such techniques are likely to result in improvements in diagnosis, treatment and drug discovery.
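The abstract describes the heuristic only at a high level: features are removed iteratively while predictive power is maintained. The following is a minimal sketch of that general idea, not the authors' RGIFE implementation. It swaps in a nearest-centroid classifier and a class-mean-difference score in place of the rule-based learner, and every name in it (`loo_accuracy`, `importance`, `iterative_elimination`, `TOY_X`, `TOY_Y`) is hypothetical, as is the toy data.

```python
from statistics import mean


def centroid_predict(train_X, train_y, x, features):
    """Nearest-centroid prediction restricted to the given feature indices."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroid = [mean(r[f] for r in rows) for f in features]
        dist = sum((x[f] - c) ** 2 for f, c in zip(features, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy using only the columns listed in `features`."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += centroid_predict(train_X, train_y, X[i], features) == y[i]
    return hits / len(X)


def importance(X, y, f):
    """Crude stand-in score: absolute difference of the two class means."""
    first, second = sorted(set(y))[:2]
    a = mean(r[f] for r, lab in zip(X, y) if lab == first)
    b = mean(r[f] for r, lab in zip(X, y) if lab == second)
    return abs(a - b)


def iterative_elimination(X, y):
    """Drop the least important feature whose removal does not reduce accuracy."""
    features = list(range(len(X[0])))
    best_acc = loo_accuracy(X, y, features)
    while len(features) > 1:
        ranked = sorted(features, key=lambda f: importance(X, y, f))
        for candidate in ranked:
            trial = [f for f in features if f != candidate]
            acc = loo_accuracy(X, y, trial)
            if acc >= best_acc:
                features, best_acc = trial, acc
                break
        else:
            break  # every possible removal hurts accuracy, so stop
    return features, best_acc


# Made-up toy data: column 0 separates the classes, columns 1-3 do not.
TOY_X = [
    [0.0, 1.0, 4.0, 0.0], [0.1, 2.0, 3.0, 1.0],
    [0.2, 3.0, 2.0, 0.0], [0.3, 4.0, 1.0, 1.0],
    [5.0, 1.0, 4.0, 0.0], [5.1, 2.0, 3.0, 1.0],
    [5.2, 3.0, 2.0, 0.0], [5.3, 4.0, 1.0, 1.0],
]
TOY_Y = [0, 0, 0, 0, 1, 1, 1, 1]

selected, accuracy = iterative_elimination(TOY_X, TOY_Y)
print(selected, accuracy)
```

On this toy data only column 0 survives, with leave-one-out accuracy unchanged. A real omics analysis would need a held-out test set in addition to the internal validation used here, since selecting features on the same samples used to score them inflates the estimate.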
format Online
Article
Text
id pubmed-4315157
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4315157 2015-02-09 BMC Genomics Research BioMed Central 2015-01-15 /pmc/articles/PMC4315157/ /pubmed/25923811 http://dx.doi.org/10.1186/1471-2164-16-S1-S2 Text en Copyright © 2015 Swan et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A machine learning heuristic to identify biologically relevant and minimal biomarker panels from omics data
topic Research