
Improving candidate Biosynthetic Gene Clusters in fungi through reinforcement learning

MOTIVATION: Precise identification of Biosynthetic Gene Clusters (BGCs) is a challenging task. Performance of BGC discovery tools is limited by their capacity to accurately predict components belonging to candidate BGCs, often overestimating cluster boundaries. To support optimizing the composition...


Bibliographic Details
Main Authors: Almeida, Hayda; Tsang, Adrian; Diallo, Abdoulaye Baniré
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9364373/
https://www.ncbi.nlm.nih.gov/pubmed/35762945
http://dx.doi.org/10.1093/bioinformatics/btac420
author Almeida, Hayda
Tsang, Adrian
Diallo, Abdoulaye Baniré
collection PubMed
description MOTIVATION: Precise identification of Biosynthetic Gene Clusters (BGCs) is a challenging task. The performance of BGC discovery tools is limited by their capacity to accurately predict the components belonging to candidate BGCs, and they often overestimate cluster boundaries. To support optimizing the composition and boundaries of candidate BGCs, we propose a reinforcement learning approach relying on protein domains and functional annotations from expert-curated BGCs. RESULTS: The proposed reinforcement learning method aims to improve candidate BGCs obtained with state-of-the-art tools. It was evaluated on candidate BGCs obtained for two fungal genomes, Aspergillus niger and Aspergillus nidulans. The results highlight an improvement in gene precision of more than 15% for TOUCAN, fungiSMASH and DeepBGC, and in cluster precision of more than 25% for fungiSMASH and DeepBGC, allowing these tools to obtain almost perfect precision in cluster prediction. This can pave the way for optimizing current predictions of candidate BGCs in fungi while minimizing the curation effort required by domain experts. AVAILABILITY AND IMPLEMENTATION: https://github.com/bioinfoUQAM/RL-bgc-components. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-9364373
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9364373 2022-08-11 Improving candidate Biosynthetic Gene Clusters in fungi through reinforcement learning Almeida, Hayda Tsang, Adrian Diallo, Abdoulaye Baniré Bioinformatics Original Papers Oxford University Press 2022-06-28 /pmc/articles/PMC9364373/ /pubmed/35762945 http://dx.doi.org/10.1093/bioinformatics/btac420 Text en © The Author(s) 2022. Published by Oxford University Press.
https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
topic Original Papers