
A comparison of graph- and kernel-based –omics data integration algorithms for classifying complex traits

BACKGROUND: High-throughput sequencing data are widely collected and analyzed in the study of complex diseases in the quest to improve human health. Well-studied algorithms mostly deal with a single data source and cannot fully utilize the potential of these multi-omics data sources. In order to provid...


Bibliographic Details
Main Authors:	Yan, Kang K., Zhao, Hongyu, Pang, Herbert
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2017
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6389230/ https://www.ncbi.nlm.nih.gov/pubmed/29212468 http://dx.doi.org/10.1186/s12859-017-1982-4

collection	PubMed
description	BACKGROUND: High-throughput sequencing data are widely collected and analyzed in the study of complex diseases in the quest to improve human health. Well-studied algorithms mostly deal with a single data source and cannot fully utilize the potential of these multi-omics data sources. In order to provide a holistic understanding of human health and diseases, it is necessary to integrate multiple data sources. Several algorithms have been proposed so far; however, a comprehensive comparison of data integration algorithms for classification of binary traits is currently lacking. RESULTS: In this paper, we focus on two common classes of integration algorithms: graph-based algorithms, which depict subjects as nodes and the relationships between them as edges, and kernel-based algorithms, which generate a classifier in feature space. Our paper provides a comprehensive comparison of their performance in terms of various measurements of classification accuracy and computation time. Seven integration algorithms, namely graph-based semi-supervised learning, graph sharpening integration, composite association network, Bayesian network, semi-definite programming-support vector machine (SDP-SVM), relevance vector machine (RVM) and Ada-boost relevance vector machine, are compared and evaluated on hypertension and two cancer data sets in our study. In general, kernel-based algorithms create more complex models and require longer computation time, but they tend to perform better than graph-based algorithms. Graph-based algorithms have the advantage of being computationally faster. CONCLUSIONS: The empirical results demonstrate that composite association network, relevance vector machine, and Ada-boost RVM are the better performers. We provide recommendations on how to choose an appropriate algorithm for integrating data from multiple sources. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi: 10.1186/s12859-017-1982-4) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-6389230
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6389230 2019-03-19 BMC Bioinformatics Research Article BioMed Central 2017-12-06 /pmc/articles/PMC6389230/ /pubmed/29212468 http://dx.doi.org/10.1186/s12859-017-1982-4 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
