TIS Transformer: remapping the human proteome using deep learning

The correct mapping of the proteome is an important step towards advancing our understanding of biological systems and cellular mechanisms. Methods that provide better mappings can fuel important processes such as drug discovery and disease understanding. Currently, true determination of translation...

Bibliographic Details
Main Authors: Clauwaert, Jim, McVey, Zahra, Gupta, Ramneek, Menschaert, Gerben
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects: Methods Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9985340/
https://www.ncbi.nlm.nih.gov/pubmed/36879896
http://dx.doi.org/10.1093/nargab/lqad021
author Clauwaert, Jim
McVey, Zahra
Gupta, Ramneek
Menschaert, Gerben
collection PubMed
description The correct mapping of the proteome is an important step towards advancing our understanding of biological systems and cellular mechanisms. Methods that provide better mappings can fuel important processes such as drug discovery and disease understanding. Currently, true determination of translation initiation sites is primarily achieved by in vivo experiments. Here, we propose TIS Transformer, a deep learning model for the determination of translation start sites solely utilizing the information embedded in the transcript nucleotide sequence. The method is built upon deep learning techniques first designed for natural language processing. We prove this approach to be best suited for learning the semantics of translation, outperforming previous approaches by a large margin. We demonstrate that limitations in the model performance are primarily due to the presence of low-quality annotations against which the model is evaluated. Advantages of the method are its ability to detect key features of the translation process and multiple coding sequences on a transcript. These include micropeptides encoded by short Open Reading Frames, either alongside a canonical coding sequence or within long non-coding RNAs. To demonstrate the use of our methods, we applied TIS Transformer to remap the full human proteome.
format Online
Article
Text
id pubmed-9985340
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9985340 2023-03-05 TIS Transformer: remapping the human proteome using deep learning Clauwaert, Jim McVey, Zahra Gupta, Ramneek Menschaert, Gerben NAR Genom Bioinform Methods Article The correct mapping of the proteome is an important step towards advancing our understanding of biological systems and cellular mechanisms. Methods that provide better mappings can fuel important processes such as drug discovery and disease understanding. Currently, true determination of translation initiation sites is primarily achieved by in vivo experiments. Here, we propose TIS Transformer, a deep learning model for the determination of translation start sites solely utilizing the information embedded in the transcript nucleotide sequence. The method is built upon deep learning techniques first designed for natural language processing. We prove this approach to be best suited for learning the semantics of translation, outperforming previous approaches by a large margin. We demonstrate that limitations in the model performance are primarily due to the presence of low-quality annotations against which the model is evaluated. Advantages of the method are its ability to detect key features of the translation process and multiple coding sequences on a transcript. These include micropeptides encoded by short Open Reading Frames, either alongside a canonical coding sequence or within long non-coding RNAs. To demonstrate the use of our methods, we applied TIS Transformer to remap the full human proteome. Oxford University Press 2023-03-03 /pmc/articles/PMC9985340/ /pubmed/36879896 http://dx.doi.org/10.1093/nargab/lqad021 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title TIS Transformer: remapping the human proteome using deep learning
topic Methods Article