Integration of multi-omics data for prediction of phenotypic traits using random forest
BACKGROUND: In order to find genetic and metabolic pathways related to phenotypic traits of interest, we analyzed gene expression data, metabolite data obtained with GC-MS and LC-MS, proteomics data and a selected set of tuber quality phenotypic data from a diploid segregating mapping population of...
Main authors: | Acharjee, Animesh; Kloosterman, Bjorn; Visser, Richard G. F.; Maliepaard, Chris |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4905610/ https://www.ncbi.nlm.nih.gov/pubmed/27295212 http://dx.doi.org/10.1186/s12859-016-1043-4 |
_version_ | 1782437279769296896 |
---|---|
author | Acharjee, Animesh Kloosterman, Bjorn Visser, Richard G. F. Maliepaard, Chris |
author_facet | Acharjee, Animesh Kloosterman, Bjorn Visser, Richard G. F. Maliepaard, Chris |
author_sort | Acharjee, Animesh |
collection | PubMed |
description | BACKGROUND: In order to find genetic and metabolic pathways related to phenotypic traits of interest, we analyzed gene expression data, metabolite data obtained with GC-MS and LC-MS, proteomics data and a selected set of tuber quality phenotypic data from a diploid segregating mapping population of potato. In this study we present an approach to integrate these ~omics data sets for the purpose of predicting phenotypic traits. This gives us networks of relatively small sets of interrelated ~omics variables that can predict, with higher accuracy, a quality trait of interest. RESULTS: We used Random Forest regression for integrating multiple ~omics data for prediction of four quality traits of potato: tuber flesh colour, DSC onset, tuber shape and enzymatic discoloration. For tuber flesh colour beta-carotene hydroxylase and zeaxanthin epoxidase were ranked first and forty-fourth respectively both of which have previously been associated with flesh colour in potato tubers. Combining all the significant genes, LC-peaks, GC-peaks and proteins, the variation explained was 75 %, only slightly more than what gene expression or LC-MS data explain by themselves which indicates that there are correlations among the variables across data sets. For tuber shape regressed on the gene expression, LC-MS, GC-MS and proteomics data sets separately, only gene expression data was found to explain significant variation. For DSC onset, we found 12 significant gene expression, 5 metabolite levels (GC) and 2 proteins that are associated with the trait. Using those 19 significant variables, the variation explained was 45 %. Expression QTL (eQTL) analyses showed many associations with genomic regions in chromosome 2 with also the highest explained variation compared to other chromosomes. Transcriptomics and metabolomics analysis on enzymatic discoloration after 5 min resulted in 420 significant genes and 8 significant LC metabolites, among which two were putatively identified as caffeoylquinic acid methyl ester and tyrosine. CONCLUSIONS: In this study, we made a strategy for selecting and integrating multiple ~omics data using random forest method and selected representative individual peaks for networks based on eQTL, mQTL or pQTL information. Network analysis was done to interpret how a particular trait is associated with gene expression, metabolite and protein data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1043-4) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4905610 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4905610 2016-06-14 Integration of multi-omics data for prediction of phenotypic traits using random forest Acharjee, Animesh Kloosterman, Bjorn Visser, Richard G. F. Maliepaard, Chris BMC Bioinformatics Research BACKGROUND: In order to find genetic and metabolic pathways related to phenotypic traits of interest, we analyzed gene expression data, metabolite data obtained with GC-MS and LC-MS, proteomics data and a selected set of tuber quality phenotypic data from a diploid segregating mapping population of potato. In this study we present an approach to integrate these ~omics data sets for the purpose of predicting phenotypic traits. This gives us networks of relatively small sets of interrelated ~omics variables that can predict, with higher accuracy, a quality trait of interest. RESULTS: We used Random Forest regression for integrating multiple ~omics data for prediction of four quality traits of potato: tuber flesh colour, DSC onset, tuber shape and enzymatic discoloration. For tuber flesh colour beta-carotene hydroxylase and zeaxanthin epoxidase were ranked first and forty-fourth respectively both of which have previously been associated with flesh colour in potato tubers. Combining all the significant genes, LC-peaks, GC-peaks and proteins, the variation explained was 75 %, only slightly more than what gene expression or LC-MS data explain by themselves which indicates that there are correlations among the variables across data sets. For tuber shape regressed on the gene expression, LC-MS, GC-MS and proteomics data sets separately, only gene expression data was found to explain significant variation. For DSC onset, we found 12 significant gene expression, 5 metabolite levels (GC) and 2 proteins that are associated with the trait. Using those 19 significant variables, the variation explained was 45 %. Expression QTL (eQTL) analyses showed many associations with genomic regions in chromosome 2 with also the highest explained variation compared to other chromosomes. Transcriptomics and metabolomics analysis on enzymatic discoloration after 5 min resulted in 420 significant genes and 8 significant LC metabolites, among which two were putatively identified as caffeoylquinic acid methyl ester and tyrosine. CONCLUSIONS: In this study, we made a strategy for selecting and integrating multiple ~omics data using random forest method and selected representative individual peaks for networks based on eQTL, mQTL or pQTL information. Network analysis was done to interpret how a particular trait is associated with gene expression, metabolite and protein data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1043-4) contains supplementary material, which is available to authorized users. BioMed Central 2016-06-06 /pmc/articles/PMC4905610/ /pubmed/27295212 http://dx.doi.org/10.1186/s12859-016-1043-4 Text en © Acharjee et al. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Acharjee, Animesh Kloosterman, Bjorn Visser, Richard G. F. Maliepaard, Chris Integration of multi-omics data for prediction of phenotypic traits using random forest |
title | Integration of multi-omics data for prediction of phenotypic traits using random forest |
title_full | Integration of multi-omics data for prediction of phenotypic traits using random forest |
title_fullStr | Integration of multi-omics data for prediction of phenotypic traits using random forest |
title_full_unstemmed | Integration of multi-omics data for prediction of phenotypic traits using random forest |
title_short | Integration of multi-omics data for prediction of phenotypic traits using random forest |
title_sort | integration of multi-omics data for prediction of phenotypic traits using random forest |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4905610/ https://www.ncbi.nlm.nih.gov/pubmed/27295212 http://dx.doi.org/10.1186/s12859-016-1043-4 |
work_keys_str_mv | AT acharjeeanimesh integrationofmultiomicsdataforpredictionofphenotypictraitsusingrandomforest AT kloostermanbjorn integrationofmultiomicsdataforpredictionofphenotypictraitsusingrandomforest AT visserrichardgf integrationofmultiomicsdataforpredictionofphenotypictraitsusingrandomforest AT maliepaardchris integrationofmultiomicsdataforpredictionofphenotypictraitsusingrandomforest |