An improved catalogue of putative synaptic genes defined exclusively by temporal transcription profiles through an ensemble machine learning approach

Bibliographic Details
Main authors: Pazos Obregón, Flavio, Palazzo, Martín, Soto, Pablo, Guerberoff, Gustavo, Yankilevich, Patricio, Cantera, Rafael
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6929295/
https://www.ncbi.nlm.nih.gov/pubmed/31870293
http://dx.doi.org/10.1186/s12864-019-6380-z
_version_ 1783482671776661504
author Pazos Obregón, Flavio
Palazzo, Martín
Soto, Pablo
Guerberoff, Gustavo
Yankilevich, Patricio
Cantera, Rafael
author_facet Pazos Obregón, Flavio
Palazzo, Martín
Soto, Pablo
Guerberoff, Gustavo
Yankilevich, Patricio
Cantera, Rafael
author_sort Pazos Obregón, Flavio
collection PubMed
description BACKGROUND: Assembly and function of neuronal synapses require the coordinated expression of an as yet undetermined set of genes. Previously, we had trained an ensemble machine learning model to assign a probability of having synaptic function to every protein-coding gene in Drosophila melanogaster. This approach resulted in the publication of a catalogue of 893 genes which we postulated to be highly enriched in genes with a still undocumented synaptic function. Since then, the scientific community has experimentally identified 79 new synaptic genes. Here we use these new empirical data to evaluate our original prediction. We also implement a series of changes to the training scheme of our model and, using the new data, we demonstrate that this improves its predictive power. Finally, we added the new synaptic genes to the training set and trained a new model, obtaining a new, enhanced catalogue of putative synaptic genes. RESULTS: The retrospective analysis demonstrates that our original catalogue was significantly enriched in new synaptic genes. When the changes to the training scheme were implemented using the original training set, we obtained even higher enrichment. Finally, applying the new training scheme with a training set including the 79 new synaptic genes resulted in an enhanced catalogue of putative synaptic genes. Here we present this new catalogue and announce that a regularly updated version will be available online at: http://synapticgenes.bnd.edu.uy CONCLUSIONS: We show that training an ensemble of machine learning classifiers solely with the whole-body temporal transcription profiles of known synaptic genes resulted in a catalogue with a significant enrichment in undiscovered synaptic genes. Using new empirical data provided by the scientific community, we validated our original approach, improved our model and obtained an arguably more precise prediction.
This approach reduces the number of genes to be tested through hypothesis-driven experimentation and will facilitate our understanding of neuronal function. AVAILABILITY: http://synapticgenes.bnd.edu.uy
format Online
Article
Text
id pubmed-6929295
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6929295 2019-12-30 An improved catalogue of putative synaptic genes defined exclusively by temporal transcription profiles through an ensemble machine learning approach Pazos Obregón, Flavio Palazzo, Martín Soto, Pablo Guerberoff, Gustavo Yankilevich, Patricio Cantera, Rafael BMC Genomics Research Article BACKGROUND: Assembly and function of neuronal synapses require the coordinated expression of an as yet undetermined set of genes. Previously, we had trained an ensemble machine learning model to assign a probability of having synaptic function to every protein-coding gene in Drosophila melanogaster. This approach resulted in the publication of a catalogue of 893 genes which we postulated to be highly enriched in genes with a still undocumented synaptic function. Since then, the scientific community has experimentally identified 79 new synaptic genes. Here we use these new empirical data to evaluate our original prediction. We also implement a series of changes to the training scheme of our model and, using the new data, we demonstrate that this improves its predictive power. Finally, we added the new synaptic genes to the training set and trained a new model, obtaining a new, enhanced catalogue of putative synaptic genes. RESULTS: The retrospective analysis demonstrates that our original catalogue was significantly enriched in new synaptic genes. When the changes to the training scheme were implemented using the original training set, we obtained even higher enrichment. Finally, applying the new training scheme with a training set including the 79 new synaptic genes resulted in an enhanced catalogue of putative synaptic genes.
Here we present this new catalogue and announce that a regularly updated version will be available online at: http://synapticgenes.bnd.edu.uy CONCLUSIONS: We show that training an ensemble of machine learning classifiers solely with the whole-body temporal transcription profiles of known synaptic genes resulted in a catalogue with a significant enrichment in undiscovered synaptic genes. Using new empirical data provided by the scientific community, we validated our original approach, improved our model and obtained an arguably more precise prediction. This approach reduces the number of genes to be tested through hypothesis-driven experimentation and will facilitate our understanding of neuronal function. AVAILABILITY: http://synapticgenes.bnd.edu.uy BioMed Central 2019-12-23 /pmc/articles/PMC6929295/ /pubmed/31870293 http://dx.doi.org/10.1186/s12864-019-6380-z Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Pazos Obregón, Flavio
Palazzo, Martín
Soto, Pablo
Guerberoff, Gustavo
Yankilevich, Patricio
Cantera, Rafael
An improved catalogue of putative synaptic genes defined exclusively by temporal transcription profiles through an ensemble machine learning approach
title An improved catalogue of putative synaptic genes defined exclusively by temporal transcription profiles through an ensemble machine learning approach
title_full An improved catalogue of putative synaptic genes defined exclusively by temporal transcription profiles through an ensemble machine learning approach
title_fullStr An improved catalogue of putative synaptic genes defined exclusively by temporal transcription profiles through an ensemble machine learning approach
title_full_unstemmed An improved catalogue of putative synaptic genes defined exclusively by temporal transcription profiles through an ensemble machine learning approach
title_short An improved catalogue of putative synaptic genes defined exclusively by temporal transcription profiles through an ensemble machine learning approach
title_sort improved catalogue of putative synaptic genes defined exclusively by temporal transcription profiles through an ensemble machine learning approach
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6929295/
https://www.ncbi.nlm.nih.gov/pubmed/31870293
http://dx.doi.org/10.1186/s12864-019-6380-z