TOUCAN: a framework for fungal biosynthetic gene cluster discovery
Fungal secondary metabolites (SMs) are an important source of numerous bioactive compounds largely applied in the pharmaceutical industry, as in the production of antibiotics and anticancer medications. The discovery of novel fungal SMs can potentially benefit human health. Identifying biosynthetic...
Main Authors: | Almeida, Hayda; Palys, Sylvester; Tsang, Adrian; Diallo, Abdoulaye Baniré |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7694738/ https://www.ncbi.nlm.nih.gov/pubmed/33575642 http://dx.doi.org/10.1093/nargab/lqaa098 |
_version_ | 1783615043803283456 |
---|---|
author | Almeida, Hayda Palys, Sylvester Tsang, Adrian Diallo, Abdoulaye Baniré |
author_facet | Almeida, Hayda Palys, Sylvester Tsang, Adrian Diallo, Abdoulaye Baniré |
author_sort | Almeida, Hayda |
collection | PubMed |
description | Fungal secondary metabolites (SMs) are an important source of numerous bioactive compounds largely applied in the pharmaceutical industry, as in the production of antibiotics and anticancer medications. The discovery of novel fungal SMs can potentially benefit human health. Identifying biosynthetic gene clusters (BGCs) involved in the biosynthesis of SMs can be a costly and complex task, especially due to the genomic diversity of fungal BGCs. Previous studies on fungal BGC discovery present limited scope and can restrict the discovery of new BGCs. In this work, we introduce TOUCAN, a supervised learning framework for fungal BGC discovery. Unlike previous methods, TOUCAN is capable of predicting BGCs on amino acid sequences, facilitating its use on newly sequenced and not yet curated data. It relies on three main pillars: rigorous selection of datasets by BGC experts; combination of functional, evolutionary and compositional features coupled with outperforming classifiers; and robust post-processing methods. TOUCAN best-performing model yields 0.982 F-measure on BGC regions in the Aspergillus niger genome. Overall results show that TOUCAN outperforms previous approaches. TOUCAN focuses on fungal BGCs but can be easily adapted to expand its scope to process other species or include new features. |
format | Online Article Text |
id | pubmed-7694738 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7694738 2021-02-10 TOUCAN: a framework for fungal biosynthetic gene cluster discovery Almeida, Hayda Palys, Sylvester Tsang, Adrian Diallo, Abdoulaye Baniré NAR Genom Bioinform Standard Article Fungal secondary metabolites (SMs) are an important source of numerous bioactive compounds largely applied in the pharmaceutical industry, as in the production of antibiotics and anticancer medications. The discovery of novel fungal SMs can potentially benefit human health. Identifying biosynthetic gene clusters (BGCs) involved in the biosynthesis of SMs can be a costly and complex task, especially due to the genomic diversity of fungal BGCs. Previous studies on fungal BGC discovery present limited scope and can restrict the discovery of new BGCs. In this work, we introduce TOUCAN, a supervised learning framework for fungal BGC discovery. Unlike previous methods, TOUCAN is capable of predicting BGCs on amino acid sequences, facilitating its use on newly sequenced and not yet curated data. It relies on three main pillars: rigorous selection of datasets by BGC experts; combination of functional, evolutionary and compositional features coupled with outperforming classifiers; and robust post-processing methods. TOUCAN best-performing model yields 0.982 F-measure on BGC regions in the Aspergillus niger genome. Overall results show that TOUCAN outperforms previous approaches. TOUCAN focuses on fungal BGCs but can be easily adapted to expand its scope to process other species or include new features. Oxford University Press 2020-11-27 /pmc/articles/PMC7694738/ /pubmed/33575642 http://dx.doi.org/10.1093/nargab/lqaa098 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Standard Article Almeida, Hayda Palys, Sylvester Tsang, Adrian Diallo, Abdoulaye Baniré TOUCAN: a framework for fungal biosynthetic gene cluster discovery |
title | TOUCAN: a framework for fungal biosynthetic gene cluster discovery |
title_full | TOUCAN: a framework for fungal biosynthetic gene cluster discovery |
title_fullStr | TOUCAN: a framework for fungal biosynthetic gene cluster discovery |
title_full_unstemmed | TOUCAN: a framework for fungal biosynthetic gene cluster discovery |
title_short | TOUCAN: a framework for fungal biosynthetic gene cluster discovery |
title_sort | toucan: a framework for fungal biosynthetic gene cluster discovery |
topic | Standard Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7694738/ https://www.ncbi.nlm.nih.gov/pubmed/33575642 http://dx.doi.org/10.1093/nargab/lqaa098 |