TIS Transformer: remapping the human proteome using deep learning
The correct mapping of the proteome is an important step towards advancing our understanding of biological systems and cellular mechanisms. Methods that provide better mappings can fuel important processes such as drug discovery and disease understanding. Currently, true determination of translation...
Main Authors: | Clauwaert, Jim; McVey, Zahra; Gupta, Ramneek; Menschaert, Gerben |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2023 |
Subjects: | Methods Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9985340/ https://www.ncbi.nlm.nih.gov/pubmed/36879896 http://dx.doi.org/10.1093/nargab/lqad021 |
_version_ | 1784900931455811584 |
---|---|
author | Clauwaert, Jim McVey, Zahra Gupta, Ramneek Menschaert, Gerben |
author_facet | Clauwaert, Jim McVey, Zahra Gupta, Ramneek Menschaert, Gerben |
author_sort | Clauwaert, Jim |
collection | PubMed |
description | The correct mapping of the proteome is an important step towards advancing our understanding of biological systems and cellular mechanisms. Methods that provide better mappings can fuel important processes such as drug discovery and disease understanding. Currently, true determination of translation initiation sites is primarily achieved by in vivo experiments. Here, we propose TIS Transformer, a deep learning model for the determination of translation start sites solely utilizing the information embedded in the transcript nucleotide sequence. The method is built upon deep learning techniques first designed for natural language processing. We prove this approach to be best suited for learning the semantics of translation, outperforming previous approaches by a large margin. We demonstrate that limitations in the model performance are primarily due to the presence of low-quality annotations against which the model is evaluated. Advantages of the method are its ability to detect key features of the translation process and multiple coding sequences on a transcript. These include micropeptides encoded by short Open Reading Frames, either alongside a canonical coding sequence or within long non-coding RNAs. To demonstrate the use of our methods, we applied TIS Transformer to remap the full human proteome. |
format | Online Article Text |
id | pubmed-9985340 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-99853402023-03-05 TIS Transformer: remapping the human proteome using deep learning Clauwaert, Jim McVey, Zahra Gupta, Ramneek Menschaert, Gerben NAR Genom Bioinform Methods Article The correct mapping of the proteome is an important step towards advancing our understanding of biological systems and cellular mechanisms. Methods that provide better mappings can fuel important processes such as drug discovery and disease understanding. Currently, true determination of translation initiation sites is primarily achieved by in vivo experiments. Here, we propose TIS Transformer, a deep learning model for the determination of translation start sites solely utilizing the information embedded in the transcript nucleotide sequence. The method is built upon deep learning techniques first designed for natural language processing. We prove this approach to be best suited for learning the semantics of translation, outperforming previous approaches by a large margin. We demonstrate that limitations in the model performance are primarily due to the presence of low-quality annotations against which the model is evaluated. Advantages of the method are its ability to detect key features of the translation process and multiple coding sequences on a transcript. These include micropeptides encoded by short Open Reading Frames, either alongside a canonical coding sequence or within long non-coding RNAs. To demonstrate the use of our methods, we applied TIS Transformer to remap the full human proteome. Oxford University Press 2023-03-03 /pmc/articles/PMC9985340/ /pubmed/36879896 http://dx.doi.org/10.1093/nargab/lqad021 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methods Article Clauwaert, Jim McVey, Zahra Gupta, Ramneek Menschaert, Gerben TIS Transformer: remapping the human proteome using deep learning |
title | TIS Transformer: remapping the human proteome using deep learning |
title_full | TIS Transformer: remapping the human proteome using deep learning |
title_fullStr | TIS Transformer: remapping the human proteome using deep learning |
title_full_unstemmed | TIS Transformer: remapping the human proteome using deep learning |
title_short | TIS Transformer: remapping the human proteome using deep learning |
title_sort | tis transformer: remapping the human proteome using deep learning |
topic | Methods Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9985340/ https://www.ncbi.nlm.nih.gov/pubmed/36879896 http://dx.doi.org/10.1093/nargab/lqad021 |
work_keys_str_mv | AT clauwaertjim tistransformerremappingthehumanproteomeusingdeeplearning AT mcveyzahra tistransformerremappingthehumanproteomeusingdeeplearning AT guptaramneek tistransformerremappingthehumanproteomeusingdeeplearning AT menschaertgerben tistransformerremappingthehumanproteomeusingdeeplearning |