TOUCAN: a framework for fungal biosynthetic gene cluster discovery

Fungal secondary metabolites (SMs) are an important source of numerous bioactive compounds largely applied in the pharmaceutical industry, as in the production of antibiotics and anticancer medications. The discovery of novel fungal SMs can potentially benefit human health. Identifying biosynthetic...

Bibliographic Details
Main Authors: Almeida, Hayda; Palys, Sylvester; Tsang, Adrian; Diallo, Abdoulaye Baniré
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7694738/
https://www.ncbi.nlm.nih.gov/pubmed/33575642
http://dx.doi.org/10.1093/nargab/lqaa098
author Almeida, Hayda
Palys, Sylvester
Tsang, Adrian
Diallo, Abdoulaye Baniré
collection PubMed
description Fungal secondary metabolites (SMs) are an important source of numerous bioactive compounds largely applied in the pharmaceutical industry, as in the production of antibiotics and anticancer medications. The discovery of novel fungal SMs can potentially benefit human health. Identifying biosynthetic gene clusters (BGCs) involved in the biosynthesis of SMs can be a costly and complex task, especially due to the genomic diversity of fungal BGCs. Previous studies on fungal BGC discovery present limited scope and can restrict the discovery of new BGCs. In this work, we introduce TOUCAN, a supervised learning framework for fungal BGC discovery. Unlike previous methods, TOUCAN is capable of predicting BGCs on amino acid sequences, facilitating its use on newly sequenced and not yet curated data. It relies on three main pillars: rigorous selection of datasets by BGC experts; combination of functional, evolutionary and compositional features coupled with outperforming classifiers; and robust post-processing methods. TOUCAN best-performing model yields 0.982 F-measure on BGC regions in the Aspergillus niger genome. Overall results show that TOUCAN outperforms previous approaches. TOUCAN focuses on fungal BGCs but can be easily adapted to expand its scope to process other species or include new features.
format Online
Article
Text
id pubmed-7694738
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7694738 2021-02-10 TOUCAN: a framework for fungal biosynthetic gene cluster discovery Almeida, Hayda Palys, Sylvester Tsang, Adrian Diallo, Abdoulaye Baniré NAR Genom Bioinform Standard Article Oxford University Press 2020-11-27 /pmc/articles/PMC7694738/ /pubmed/33575642 http://dx.doi.org/10.1093/nargab/lqaa098 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title TOUCAN: a framework for fungal biosynthetic gene cluster discovery
topic Standard Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7694738/
https://www.ncbi.nlm.nih.gov/pubmed/33575642
http://dx.doi.org/10.1093/nargab/lqaa098