
Cracking the genetic code with neural networks

The genetic code is textbook scientific knowledge that was soundly established without resorting to Artificial Intelligence (AI). The goal of our study was to check whether a neural network could re-discover, on its own, the mapping links between codons and amino acids and build the complete deciphe...


Bibliographic Details
Main Authors:	Joiret, Marc; Leclercq, Marine; Lambrechts, Gaspard; Rapino, Francesca; Close, Pierre; Louppe, Gilles; Geris, Liesbet
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2023
Subjects:	Artificial Intelligence
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10117997/ https://www.ncbi.nlm.nih.gov/pubmed/37091301 http://dx.doi.org/10.3389/frai.2023.1128153

collection	PubMed
description	The genetic code is textbook scientific knowledge that was soundly established without resorting to Artificial Intelligence (AI). The goal of our study was to check whether a neural network could rediscover, on its own, the mapping between codons and amino acids and build the complete deciphering dictionary when presented with transcript-protein training pairs. We compared different deep learning neural network architectures and quantitatively estimated the size of the human transcriptomic training set required to achieve the best possible accuracy in the codon-to-amino-acid mapping. We also investigated how a codon embedding layer, which assesses the semantic similarity between codons, affects the rate of increase of the training accuracy. We further investigated the benefit of quantifying and using the unbalanced representation of amino acids within real human proteins to decipher the codons of rare amino acids faster. Deep neural networks require huge amounts of training data, and deciphering the genetic code with a neural network is no exception. A test accuracy of 100% and the unequivocal deciphering of rare codons, such as the tryptophan codon or the stop codons, require a training dataset on the order of 4–22 million cumulative pairs of codons and their associated amino acids, presented to the neural network over around 7–40 training epochs, depending on the architecture and settings. We confirm that the wide generic capacities and modularity of deep neural networks allow them to be customized easily to learn the task of deciphering the genetic code efficiently.
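The learning task summarized in the description can be illustrated with a toy sketch. This is not the authors' architecture or data: it assumes codons already aligned with their amino acids, a single softmax layer on one-hot codons, and a hypothetical sampling skew in which the tryptophan and stop codons are drawn 20 times less often than the others (not real human codon usage). The names make_pairs, train and decode are illustrative only.

```python
import math
import random
from itertools import product

# Standard genetic code (NCBI translation table 1), RNA alphabet, "*" = stop.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
CODE = dict(zip(CODONS, AMINO))
CLASSES = sorted(set(AMINO))  # 20 amino acids + stop = 21 classes


def make_pairs(n, rng):
    """Sample n (codon, amino acid) pairs with a hypothetical skewed usage."""
    # Assumption: Trp and stop codons are drawn 20x less often than the rest.
    weights = [1 if CODE[c] in "*W" else 20 for c in CODONS]
    drawn = rng.choices(CODONS, weights=weights, k=n)
    return [(c, CODE[c]) for c in drawn]


def train(pairs, lr=0.5):
    """One-layer softmax classifier on one-hot codons, trained by SGD."""
    col = {a: i for i, a in enumerate(CLASSES)}
    # One row of logits per codon: same as a weight matrix times a one-hot input.
    logits = {c: [0.0] * len(CLASSES) for c in CODONS}
    for codon, aa in pairs:
        row = logits[codon]
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        for k in range(len(row)):
            target = 1.0 if k == col[aa] else 0.0
            row[k] += lr * (target - exps[k] / total)  # cross-entropy gradient step
    return logits


def decode(logits, codon):
    """Return the class with the highest logit for this codon."""
    row = logits[codon]
    return CLASSES[row.index(max(row))]


rng = random.Random(0)
pairs = make_pairs(50_000, rng)
model = train(pairs)
accuracy = sum(decode(model, c) == CODE[c] for c in CODONS) / len(CODONS)
print(f"accuracy over all 64 codons: {accuracy:.3f}")
```

In this noise-free toy setting a codon is decoded correctly as soon as it has been seen once, so the number of pairs needed is governed by how rarely the rarest codons are sampled. That is a much simpler situation than the one the study addresses, where the network must learn from transcript-protein pairs.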
id	pubmed-10117997
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-10117997 2023-04-21 Front Artif Intell Frontiers Media S.A. 2023-04-06 /pmc/articles/PMC10117997/ /pubmed/37091301 http://dx.doi.org/10.3389/frai.2023.1128153 Text en Copyright © 2023 Joiret, Leclercq, Lambrechts, Rapino, Close, Louppe and Geris. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
