ICOR: improving codon optimization with recurrent neural networks
BACKGROUND: In protein sequences—as there are 61 sense codons but only 20 standard amino acids—most amino acids are encoded by more than one codon. Although such synonymous codons do not alter the encoded amino acid sequence, their selection can dramatically affect the expression of the resulting pr...
Main authors: | Rishab Jain, Aditya Jain, Elizabeth Mauro, Kevin LeShane, Douglas Densmore |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2023 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10074884/ https://www.ncbi.nlm.nih.gov/pubmed/37016283 http://dx.doi.org/10.1186/s12859-023-05246-8 |
_version_ | 1785019823394127872 |
---|---|
author | Jain, Rishab Jain, Aditya Mauro, Elizabeth LeShane, Kevin Densmore, Douglas |
author_sort | Jain, Rishab |
collection | PubMed |
description | BACKGROUND: In protein sequences—as there are 61 sense codons but only 20 standard amino acids—most amino acids are encoded by more than one codon. Although such synonymous codons do not alter the encoded amino acid sequence, their selection can dramatically affect the expression of the resulting protein. Codon optimization of synthetic DNA sequences is important for heterologous expression. However, existing solutions are primarily based on choosing high-frequency codons only, neglecting the important effects of rare codons. In this paper, we propose a novel recurrent-neural-network based codon optimization tool, ICOR, that aims to learn codon usage bias on a genomic dataset of Escherichia coli. We compile a dataset of over 7,000 non-redundant, high-expression, robust genes which are used for deep learning. The model uses a bidirectional long short-term memory-based architecture, allowing for the sequential context of codon usage in genes to be learned. Our tool can predict synonymous codons for synthetic genes toward optimal expression in Escherichia coli. RESULTS: We demonstrate that sequential context achieved via RNN may yield codon selection that is more similar to the host genome. Based on computational metrics that predict protein expression, ICOR theoretically optimizes protein expression more than frequency-based approaches. ICOR is evaluated on 1,481 Escherichia coli genes as well as a benchmark set of 40 select DNA sequences whose heterologous expression has been previously characterized. ICOR’s performance is measured across five metrics: the Codon Adaptation Index, GC-content, negative repeat elements, negative cis-regulatory elements, and codon frequency distribution. CONCLUSIONS: The results, based on in silico metrics, indicate that ICOR codon optimization is theoretically more effective in enhancing recombinant expression of proteins over other established codon optimization techniques. Our tool is provided as an open-source software package that includes the benchmark set of sequences used in this study. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05246-8. |
format | Online Article Text |
id | pubmed-10074884 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-10074884 2023-04-06 ICOR: improving codon optimization with recurrent neural networks Jain, Rishab Jain, Aditya Mauro, Elizabeth LeShane, Kevin Densmore, Douglas BMC Bioinformatics Software BACKGROUND: In protein sequences—as there are 61 sense codons but only 20 standard amino acids—most amino acids are encoded by more than one codon. Although such synonymous codons do not alter the encoded amino acid sequence, their selection can dramatically affect the expression of the resulting protein. Codon optimization of synthetic DNA sequences is important for heterologous expression. However, existing solutions are primarily based on choosing high-frequency codons only, neglecting the important effects of rare codons. In this paper, we propose a novel recurrent-neural-network based codon optimization tool, ICOR, that aims to learn codon usage bias on a genomic dataset of Escherichia coli. We compile a dataset of over 7,000 non-redundant, high-expression, robust genes which are used for deep learning. The model uses a bidirectional long short-term memory-based architecture, allowing for the sequential context of codon usage in genes to be learned. Our tool can predict synonymous codons for synthetic genes toward optimal expression in Escherichia coli. RESULTS: We demonstrate that sequential context achieved via RNN may yield codon selection that is more similar to the host genome. Based on computational metrics that predict protein expression, ICOR theoretically optimizes protein expression more than frequency-based approaches. ICOR is evaluated on 1,481 Escherichia coli genes as well as a benchmark set of 40 select DNA sequences whose heterologous expression has been previously characterized. ICOR’s performance is measured across five metrics: the Codon Adaptation Index, GC-content, negative repeat elements, negative cis-regulatory elements, and codon frequency distribution. CONCLUSIONS: The results, based on in silico metrics, indicate that ICOR codon optimization is theoretically more effective in enhancing recombinant expression of proteins over other established codon optimization techniques. Our tool is provided as an open-source software package that includes the benchmark set of sequences used in this study. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05246-8. BioMed Central 2023-04-04 /pmc/articles/PMC10074884/ /pubmed/37016283 http://dx.doi.org/10.1186/s12859-023-05246-8 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Software Jain, Rishab Jain, Aditya Mauro, Elizabeth LeShane, Kevin Densmore, Douglas ICOR: improving codon optimization with recurrent neural networks |
title | ICOR: improving codon optimization with recurrent neural networks |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10074884/ https://www.ncbi.nlm.nih.gov/pubmed/37016283 http://dx.doi.org/10.1186/s12859-023-05246-8 |
work_keys_str_mv | AT jainrishab icorimprovingcodonoptimizationwithrecurrentneuralnetworks AT jainaditya icorimprovingcodonoptimizationwithrecurrentneuralnetworks AT mauroelizabeth icorimprovingcodonoptimizationwithrecurrentneuralnetworks AT leshanekevin icorimprovingcodonoptimizationwithrecurrentneuralnetworks AT densmoredouglas icorimprovingcodonoptimizationwithrecurrentneuralnetworks |
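
The abstract's opening point (61 sense codons for 20 standard amino acids, so most amino acids have synonymous codons) and one of its five evaluation metrics (GC-content) can be illustrated with a short sketch. This is not ICOR's code or API; the names `CODON_TABLE`, `synonymous_codons`, `translate`, and `gc_content` are hypothetical helpers written only for this example, and the table is the standard genetic code.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order per position.
# "*" marks the three stop codons.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons():
    """Group the sense codons by the amino acid they encode."""
    groups = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # skip stop codons
            groups[aa].append(codon)
    return dict(groups)


def translate(dna):
    """Translate a DNA coding sequence (length divisible by 3) to one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def gc_content(dna):
    """Fraction of bases in the sequence that are G or C."""
    dna = dna.upper()
    if not dna:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in dna) / len(dna)


if __name__ == "__main__":
    groups = synonymous_codons()
    print(len(groups), "amino acids,", sum(len(c) for c in groups.values()), "sense codons")
    # Swapping GCT for the synonymous GCC keeps the protein but changes GC-content.
    for seq in ("ATGGCT", "ATGGCC"):
        print(seq, translate(seq), round(gc_content(seq), 3))
```

Only methionine and tryptophan end up with a single codon in this table; every other amino acid offers a choice, which is the degree of freedom a codon optimizer works with.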