
IRESpy: an XGBoost model for prediction of internal ribosome entry sites


Bibliographic Details
Main Authors: Wang, Junhui, Gribskov, Michael
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6664791/
https://www.ncbi.nlm.nih.gov/pubmed/31362694
http://dx.doi.org/10.1186/s12859-019-2999-7
_version_ 1783439959798054912
author Wang, Junhui
Gribskov, Michael
collection PubMed
description BACKGROUND: Internal ribosome entry sites (IRES) are segments of mRNA found in untranslated regions that can recruit the ribosome and initiate translation independently of the 5′ cap-dependent translation initiation mechanism. IRES usually function when 5′ cap-dependent translation initiation has been blocked or repressed. They have been widely found to play important roles in viral infections and cellular processes. However, only a limited number of confirmed IRES have been reported, because confirming them requires highly labor-intensive, slow, and low-efficiency laboratory experiments. Bioinformatics tools have been developed, but there is no reliable online tool. RESULTS: This paper systematically examines the features that can distinguish IRES from non-IRES sequences. Sequence features such as kmer words, structural features such as Q(MFE), and sequence/structure hybrid features are evaluated as possible discriminators. They are incorporated into an IRES classifier based on XGBoost. The XGBoost model performs better than previous classifiers, with higher accuracy and much shorter computational time. The number of features in the model has been greatly reduced, compared to previous predictors, by including global kmer and structural features. The contributions of model features are well explained by LIME and SHapley Additive exPlanations (SHAP). The trained XGBoost model has been implemented as a bioinformatics tool for IRES prediction, IRESpy (https://irespy.shinyapps.io/IRESpy/), which has been applied to scan the human 5′ UTR and find novel IRES segments. CONCLUSIONS: IRESpy is a fast, reliable, high-throughput online IRES prediction tool. It is publicly available to all IRES researchers and can be used in other genomics applications such as gene annotation and analysis of differential gene expression.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2999-7) contains supplementary material, which is available to authorized users.
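The "kmer words" named in the description as sequence features can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name kmer_frequencies, the choice of k=2, the ACGU alphabet, and the example sequence are all assumptions made for illustration. It only shows how a sequence could be turned into a fixed-length vector of k-mer frequencies of the kind a gradient-boosted classifier such as XGBoost could take as input.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=2, alphabet="ACGU"):
    """Return the relative frequency of every possible k-mer in seq.

    Hypothetical helper for illustration only; the record does not
    describe how IRESpy computes its kmer features.
    """
    # Treat DNA input as RNA so that T and U are counted together.
    seq = seq.upper().replace("T", "U")
    # Enumerate all |alphabet|**k words so the feature vector has a fixed length.
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing other symbols (for example N) are ignored.
    total = sum(counts[w] for w in words)
    return {w: (counts[w] / total if total else 0.0) for w in words}


# Made-up example sequence, not taken from the paper.
features = kmer_frequencies("ACGUACGU", k=2)
print(len(features))
```

With k=2 and a four-letter alphabet the vector always has 16 entries, regardless of sequence length, which is what makes such counts usable as tabular classifier features.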
format Online
Article
Text
id pubmed-6664791
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6664791 2019-08-05 IRESpy: an XGBoost model for prediction of internal ribosome entry sites Wang, Junhui Gribskov, Michael BMC Bioinformatics Research Article BioMed Central 2019-07-30 /pmc/articles/PMC6664791/ /pubmed/31362694 http://dx.doi.org/10.1186/s12859-019-2999-7 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title IRESpy: an XGBoost model for prediction of internal ribosome entry sites
topic Research Article