geneRFinder: gene finding in distinct metagenomic data complexities

Bibliographic Details
Main authors: Silva, Raíssa; Padovani, Kleber; Góes, Fabiana; Alves, Ronnie
Format: Online article, text
Language: English
Published: BioMed Central, 2021
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7905635/
https://www.ncbi.nlm.nih.gov/pubmed/33632132
http://dx.doi.org/10.1186/s12859-021-03997-w

Abstract

BACKGROUND: Microbes play a fundamental economic, social, and environmental role in our society. Metagenomics makes it possible to investigate microbes and their interactions in their natural environments (complex communities). How microbes act is usually estimated by looking at the functions they perform in those environments, and their role is measured through their genes. Advances in next-generation sequencing technology have facilitated metagenomics research; however, they also create a heavy computational burden. Large and complex biological datasets are now more available than ever. Many available gene predictors can aid the gene annotation process, but they do not appropriately handle the complexities of metagenomic data. There is no standard metagenomic benchmark dataset for gene prediction. Thus, gene predictors may inflate their results by obfuscating low false discovery rates.

RESULTS: We introduce geneRFinder, a machine learning (ML)-based gene predictor that outperforms state-of-the-art gene prediction tools across the benchmark presented here, using only one pre-trained Random Forest model. When handling high-complexity metagenomes, the average prediction rates of geneRFinder differed by 54% and 64% from those of Prodigal and FragGeneScan, respectively. The largest specificity gap was against FragGeneScan, with geneRFinder scoring 79 percentage points higher; its specificity was also 66 percentage points higher than that of Prodigal. According to McNemar's test, all percentage differences between the predictors' performances are statistically significant for all datasets at the 99% confidence level.

CONCLUSIONS: We provide geneRFinder, an approach for gene prediction across distinct metagenomic complexities, available at gitlab.com/r.lorenna/generfinder and https://osf.io/w2yd6/. We also provide a novel, comprehensive benchmark dataset for gene prediction, available at https://sourceforge.net/p/generfinder-benchmark. The benchmark is based on The Critical Assessment of Metagenome Interpretation (CAMI) challenge and contains labeled data from gene regions.
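
The results compare predictors by specificity gaps (in percentage points) and test significance with McNemar's test on paired predictions. The Python sketch below illustrates both calculations under stated assumptions: specificity is taken as the standard TN / (TN + FP), the McNemar statistic uses the common continuity-corrected chi-square form with one degree of freedom (the abstract does not say which variant the authors used), and every count is hypothetical, not a result from the paper. The function names are illustrative only and are not part of geneRFinder.

```python
import math
from typing import Tuple


def specificity(tn: int, fp: int) -> float:
    """Standard specificity: TN / (TN + FP), the share of true negatives
    (e.g. non-coding ORFs, an assumption here) a predictor correctly rejects."""
    return tn / (tn + fp)


def percentage_point_gap(rate_a: float, rate_b: float) -> float:
    """Absolute difference between two rates, expressed in percentage points."""
    return (rate_a - rate_b) * 100


def chi2_sf_1df(x: float) -> float:
    """Survival function P(X > x) of a chi-square distribution with 1 degree
    of freedom, using the identity P(Z**2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))


def mcnemar(b: int, c: int, continuity: bool = True) -> Tuple[float, float]:
    """McNemar's test on paired binary outcomes (chi-square approximation).

    b: items predictor A got right and predictor B got wrong
    c: items predictor A got wrong and predictor B got right
    Returns (statistic, p_value). Items that both predictors got right,
    or both got wrong, do not enter the statistic.
    """
    if b + c == 0:
        # No discordant pairs: no evidence of a difference.
        return 0.0, 1.0
    diff = abs(b - c)
    if continuity:
        # Edwards' continuity correction, assumed here for illustration.
        diff = max(diff - 1, 0)
    stat = diff ** 2 / (b + c)
    return stat, chi2_sf_1df(stat)


if __name__ == "__main__":
    # Hypothetical confusion counts for two predictors on the same ORFs.
    spec_a = specificity(tn=900, fp=100)
    spec_b = specificity(tn=200, fp=800)
    gap = percentage_point_gap(spec_a, spec_b)
    print(f"specificity gap: {gap:.1f} percentage points")

    # Hypothetical discordant counts from paired predictions.
    stat, p = mcnemar(b=120, c=40)
    print(f"McNemar statistic = {stat:.3f}, p = {p:.2e}, "
          f"significant at 99%: {p < 0.01}")
```

With the hypothetical discordant counts b = 120 and c = 40, the corrected statistic is 79² / 160 = 39.00625 and the p-value is far below 0.01, so such a difference would be significant at the 99% confidence level. A percentage-point gap is a plain difference of two rates, which is why it is not the same as a relative percentage change.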

Journal: BMC Bioinformatics (Software article), published 2021-02-25
© The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.