An improved catalogue of putative synaptic genes defined exclusively by temporal transcription profiles through an ensemble machine learning approach
Main authors: | Pazos Obregón, Flavio; Palazzo, Martín; Soto, Pablo; Guerberoff, Gustavo; Yankilevich, Patricio; Cantera, Rafael |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6929295/ https://www.ncbi.nlm.nih.gov/pubmed/31870293 http://dx.doi.org/10.1186/s12864-019-6380-z |
_version_ | 1783482671776661504 |
---|---|
author | Pazos Obregón, Flavio; Palazzo, Martín; Soto, Pablo; Guerberoff, Gustavo; Yankilevich, Patricio; Cantera, Rafael |
collection | PubMed |
description | BACKGROUND: Assembly and function of neuronal synapses require the coordinated expression of a yet undetermined set of genes. Previously, we had trained an ensemble machine learning model to assign a probability of having synaptic function to every protein-coding gene in Drosophila melanogaster. This approach resulted in the publication of a catalogue of 893 genes which we postulated to be highly enriched in genes with a still undocumented synaptic function. Since then, the scientific community has experimentally identified 79 new synaptic genes. Here we use these new empirical data to evaluate our original prediction. We also implement a series of changes to the training scheme of our model and, using the new data, we demonstrate that these changes improve its predictive power. Finally, we added the new synaptic genes to the training set and trained a new model, obtaining a new, enhanced catalogue of putative synaptic genes. RESULTS: The retrospective analysis demonstrates that our original catalogue was significantly enriched in new synaptic genes. When the changes to the training scheme were implemented using the original training set, we obtained even higher enrichment. Finally, applying the new training scheme with a training set including the 79 new synaptic genes resulted in an enhanced catalogue of putative synaptic genes. Here we present this new catalogue and announce that a regularly updated version will be available online at: http://synapticgenes.bnd.edu.uy CONCLUSIONS: We show that training an ensemble of machine learning classifiers solely with the whole-body temporal transcription profiles of known synaptic genes resulted in a catalogue with a significant enrichment in undiscovered synaptic genes. Using new empirical data provided by the scientific community, we validated our original approach, improved our model and obtained an arguably more precise prediction. This approach reduces the number of genes to be tested through hypothesis-driven experimentation and will facilitate our understanding of neuronal function. AVAILABILITY: http://synapticgenes.bnd.edu.uy |
format | Online Article Text |
id | pubmed-6929295 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-69292952019-12-30 An improved catalogue of putative synaptic genes defined exclusively by temporal transcription profiles through an ensemble machine learning approach Pazos Obregón, Flavio Palazzo, Martín Soto, Pablo Guerberoff, Gustavo Yankilevich, Patricio Cantera, Rafael BMC Genomics Research Article BioMed Central 2019-12-23 /pmc/articles/PMC6929295/ /pubmed/31870293 http://dx.doi.org/10.1186/s12864-019-6380-z Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | An improved catalogue of putative synaptic genes defined exclusively by temporal transcription profiles through an ensemble machine learning approach |
topic | Research Article |