OncoRTT: Predicting novel oncology-related therapeutic targets using BERT embeddings and omics features


Bibliographic Details
Main Authors: Thafar, Maha A., Albaradei, Somayah, Uludag, Mahmut, Alshahrani, Mona, Gojobori, Takashi, Essack, Magbubah, Gao, Xin
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10117673/
https://www.ncbi.nlm.nih.gov/pubmed/37091791
http://dx.doi.org/10.3389/fgene.2023.1139626
author Thafar, Maha A.
Albaradei, Somayah
Uludag, Mahmut
Alshahrani, Mona
Gojobori, Takashi
Essack, Magbubah
Gao, Xin
author_sort Thafar, Maha A.
collection PubMed
description Late-stage drug development failures are usually a consequence of ineffective targets. Thus, proper target identification is needed, which may be possible using computational approaches. This is because effective targets have disease-relevant biological functions, and omics data reveal the proteins involved in these functions. In addition, properties that favor drug-target binding can be deduced from the protein’s amino acid sequence. In this work, we developed OncoRTT, a deep learning (DL)-based method for predicting novel therapeutic targets. OncoRTT is designed to reduce suboptimal target selection by identifying novel targets based on features of known effective targets using DL approaches. First, we created the “OncologyTT” datasets, which include genes/proteins associated with ten prevalent cancer types. Then, we generated three sets of features for all genes: omics features, BERT embeddings of the proteins’ amino-acid sequences, and the integrated features, and used each set separately to train and test the DL classifiers. The models achieved high prediction performance in terms of area under the curve (AUC): greater than 0.88 for all cancer types, with a maximum of 0.95 for leukemia. OncoRTT also outperformed the state-of-the-art method on that method's own data in five of the seven cancer types assessed by both methods. Furthermore, OncoRTT predicts novel therapeutic targets using new test data related to the seven cancer types. We further corroborated these results with other validation evidence using the Open Targets Platform and a case study focused on the top-10 predicted therapeutic targets for lung cancer.
format Online
Article
Text
id pubmed-10117673
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10117673 2023-04-21 OncoRTT: Predicting novel oncology-related therapeutic targets using BERT embeddings and omics features Thafar, Maha A. Albaradei, Somayah Uludag, Mahmut Alshahrani, Mona Gojobori, Takashi Essack, Magbubah Gao, Xin Front Genet Genetics Frontiers Media S.A. 2023-04-06 /pmc/articles/PMC10117673/ /pubmed/37091791 http://dx.doi.org/10.3389/fgene.2023.1139626 Text en Copyright © 2023 Thafar, Albaradei, Uludag, Alshahrani, Gojobori, Essack and Gao. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title OncoRTT: Predicting novel oncology-related therapeutic targets using BERT embeddings and omics features
topic Genetics