Identification of Predictor Genes for Feed Efficiency in Beef Cattle by Applying Machine Learning Methods to Multi-Tissue Transcriptome Data

Machine learning (ML) methods have shown promising results in identifying genes when applied to large transcriptome datasets. However, no attempt has been made to compare the performance of combining different ML methods together in the prediction of high feed efficiency (HFE) and low feed efficienc...

Bibliographic Details
Main authors: Chen, Weihao; Alexandre, Pâmela A.; Ribeiro, Gabriela; Fukumasu, Heidge; Sun, Wei; Reverter, Antonio; Li, Yutao
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7921797/
https://www.ncbi.nlm.nih.gov/pubmed/33664767
http://dx.doi.org/10.3389/fgene.2021.619857
author Chen, Weihao
Alexandre, Pâmela A.
Ribeiro, Gabriela
Fukumasu, Heidge
Sun, Wei
Reverter, Antonio
Li, Yutao
collection PubMed
description Machine learning (ML) methods have shown promising results in identifying genes when applied to large transcriptome datasets. However, no attempt has been made to compare the performance of combining different ML methods together in the prediction of high feed efficiency (HFE) and low feed efficiency (LFE) animals. In this study, using RNA sequencing data of five tissues (adrenal gland, hypothalamus, liver, skeletal muscle, and pituitary) from nine HFE and nine LFE Nellore bulls, we evaluated the prediction accuracies of five analytical methods in classifying FE animals. These included two conventional methods for differential gene expression (DGE) analysis (t-test and edgeR) as benchmarks, and three ML methods: Random Forests (RFs), Extreme Gradient Boosting (XGBoost), and combination of both RF and XGBoost (RX). Utility of a subset of candidate genes selected from each method for classification of FE animals was assessed by support vector machine (SVM). Among all methods, the smallest subsets of genes (117) identified by RX outperformed those chosen by t-test, edgeR, RF, or XGBoost in classification accuracy of animals. Gene co-expression network analysis confirmed the interactivity existing among these genes and their relevance within the network related to their prediction ranking based on ML. The results demonstrate a great potential for applying a combination of ML methods to large transcriptome datasets to identify biologically important genes for accurately classifying FE animals.
format Online
Article
Text
id pubmed-7921797
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7921797 2021-03-03 Front Genet Genetics Frontiers Media S.A. 2021-02-16 /pmc/articles/PMC7921797/ /pubmed/33664767 http://dx.doi.org/10.3389/fgene.2021.619857 Text en
Copyright © 2021 Chen, Alexandre, Ribeiro, Fukumasu, Sun, Reverter and Li. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Identification of Predictor Genes for Feed Efficiency in Beef Cattle by Applying Machine Learning Methods to Multi-Tissue Transcriptome Data
topic Genetics