Gene function prediction in five model eukaryotes exclusively based on gene relative location through machine learning

Bibliographic Details
Main authors: Pazos Obregón, Flavio, Silvera, Diego, Soto, Pablo, Yankilevich, Patricio, Guerberoff, Gustavo, Cantera, Rafael
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9270439/
https://www.ncbi.nlm.nih.gov/pubmed/35803984
http://dx.doi.org/10.1038/s41598-022-15329-w
author Pazos Obregón, Flavio
Silvera, Diego
Soto, Pablo
Yankilevich, Patricio
Guerberoff, Gustavo
Cantera, Rafael
collection PubMed
description The function of most genes is unknown. The best results in automated function prediction are obtained with machine learning-based methods that combine multiple data sources, typically sequence-derived features, protein structure and interaction data. Even though there is ample evidence that a gene's function is not independent of its location, the few available examples of gene function prediction based on gene location rely on sequence identity between genes of different organisms and are thus subject to the limitations of the relationship between sequence and function. Here we predict thousands of gene functions in five model eukaryotes (Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Mus musculus and Homo sapiens) using machine learning models trained exclusively with features derived from the location of genes in the genomes to which they belong. Our aim was not to obtain the best-performing method for automated function prediction but to explore the extent to which a gene's location can predict its function in eukaryotes. We found that our models outperform BLAST when predicting terms from the Biological Process and Cellular Component ontologies, showing that, at least in some cases, gene location alone can be more useful than sequence for inferring gene function.
format Online
Article
Text
id pubmed-9270439
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9270439 2022-07-10 Gene function prediction in five model eukaryotes exclusively based on gene relative location through machine learning Pazos Obregón, Flavio Silvera, Diego Soto, Pablo Yankilevich, Patricio Guerberoff, Gustavo Cantera, Rafael Sci Rep Article
Nature Publishing Group UK 2022-07-08 /pmc/articles/PMC9270439/ /pubmed/35803984 http://dx.doi.org/10.1038/s41598-022-15329-w Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Gene function prediction in five model eukaryotes exclusively based on gene relative location through machine learning
topic Article