
Genome-wide prediction and prioritization of human aging genes by data fusion: a machine learning approach

BACKGROUND: Machine learning can effectively nominate novel genes for various research purposes in the laboratory. On a genome-wide scale, we implemented multiple databases and algorithms to predict and prioritize the human aging genes (PPHAGE). RESULTS: We fused data from 11 databases, and used Naï...


Bibliographic Details
Main authors: Arabfard, Masoud; Ohadi, Mina; Rezaei Tabar, Vahid; Delbari, Ahmad; Kavousi, Kaveh
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6842548/
https://www.ncbi.nlm.nih.gov/pubmed/31706268
http://dx.doi.org/10.1186/s12864-019-6140-0
author Arabfard, Masoud
Ohadi, Mina
Rezaei Tabar, Vahid
Delbari, Ahmad
Kavousi, Kaveh
collection PubMed
description BACKGROUND: Machine learning can effectively nominate novel genes for various research purposes in the laboratory. On a genome-wide scale, we implemented multiple databases and algorithms to predict and prioritize the human aging genes (PPHAGE). RESULTS: We fused data from 11 databases and used a Naïve Bayes classifier together with positive unlabeled learning (PUL) methods (NB, Spy, and Rocchio-SVM) to rank human genes with respect to their implication in aging. The PUL methods enabled us to identify a list of negative (non-aging) genes to use alongside the seed (known age-related) genes in the ranking process. Comparison of the PUL algorithms revealed that no single method for identifying a negative sample was advantageous over the others, and that their simultaneous use in the form of a fusion was critical for obtaining optimal results (PPHAGE is publicly available at https://cbb.ut.ac.ir/pphage). CONCLUSION: We predict and prioritize over 3,000 candidate age-related genes in humans, based on significant ranking scores. The identified candidate genes are associated with pathways, ontologies, and diseases that are linked to aging, such as cancer and diabetes. Our data offer a platform for future experimental research on the genetic and biological aspects of aging. Additionally, we demonstrate that fusion of PUL methods and data sources can be successfully used for aging and disease candidate gene prioritization.
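The abstract mentions the Spy technique for picking out likely negative (non-aging) genes from unlabeled data. The following is only a minimal sketch of that general idea, not the authors' implementation: a few known positives ("spies") are hidden among the unlabeled genes, every gene is scored, and unlabeled genes that score below the lowest-scoring spy are treated as reliable negatives. The function names, the toy feature vectors, the gene identifiers, and the nearest-centroid scorer (swapped in here for the probabilistic classifier the technique normally uses) are all assumptions made for illustration.

```python
import random


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def spy_reliable_negatives(positives, unlabeled, spy_fraction=0.25, seed=0):
    """Return unlabeled ids that score below every spy (hypothetical helper).

    positives, unlabeled: dict mapping gene id -> feature vector (list of floats).
    Assumes at least two positives so that one can remain after spies are removed.
    """
    rng = random.Random(seed)
    pos_ids = sorted(positives)
    n_spies = max(1, int(len(pos_ids) * spy_fraction))
    spies = set(rng.sample(pos_ids, n_spies))

    # Train on positives without spies versus unlabeled genes plus spies.
    train_pos = [positives[g] for g in pos_ids if g not in spies]
    mixed = list(unlabeled.values()) + [positives[g] for g in spies]
    c_pos, c_mix = centroid(train_pos), centroid(mixed)

    def score(v):
        # Higher means closer to the positive centroid than to the mixed one.
        return sq_dist(v, c_mix) - sq_dist(v, c_pos)

    # Spies are true positives, so anything scoring below all of them
    # is unlikely to be positive.
    threshold = min(score(positives[g]) for g in spies)
    return {g for g, v in unlabeled.items() if score(v) < threshold}


# Toy data (made up): "u1" resembles the positives, the rest do not.
toy_positives = {"p1": [1.0, 1.0], "p2": [1.2, 0.9], "p3": [0.9, 1.1], "p4": [1.1, 1.0]}
toy_unlabeled = {"u1": [1.5, 1.5], "u2": [-5.0, -5.0], "u3": [-4.8, -5.2], "u4": [-5.1, -4.9]}
print(sorted(spy_reliable_negatives(toy_positives, toy_unlabeled)))
```

In practice the strict minimum over spies is often relaxed to a low percentile so that a single atypical spy does not push the threshold too far down; that choice is also an assumption here rather than something the record states.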
format Online
Article
Text
id pubmed-6842548
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6842548 2019-11-14 Genome-wide prediction and prioritization of human aging genes by data fusion: a machine learning approach Arabfard, Masoud Ohadi, Mina Rezaei Tabar, Vahid Delbari, Ahmad Kavousi, Kaveh BMC Genomics Research Article BioMed Central 2019-11-09 /pmc/articles/PMC6842548/ /pubmed/31706268 http://dx.doi.org/10.1186/s12864-019-6140-0 Text en © The Author(s). 2019 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Genome-wide prediction and prioritization of human aging genes by data fusion: a machine learning approach
topic Research Article