
GeneMark-HM: improving gene prediction in DNA sequences of human microbiome

Computational reconstruction of nearly complete genomes from metagenomic reads may identify thousands of new uncultured candidate bacterial species. We have shown that reconstructed prokaryotic genomes along with genomes of sequenced microbial isolates can be used to support more accurate gene predi...


Bibliographic Details
Main authors: Lomsadze, Alexandre; Bonny, Christophe; Strozzi, Francesco; Borodovsky, Mark
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8153819/
https://www.ncbi.nlm.nih.gov/pubmed/34056597
http://dx.doi.org/10.1093/nargab/lqab047
author Lomsadze, Alexandre
Bonny, Christophe
Strozzi, Francesco
Borodovsky, Mark
collection PubMed
description Computational reconstruction of nearly complete genomes from metagenomic reads may identify thousands of new uncultured candidate bacterial species. We have shown that reconstructed prokaryotic genomes, along with genomes of sequenced microbial isolates, can be used to support more accurate gene prediction in novel metagenomic sequences. We have proposed an approach that used three types of gene prediction algorithms and found, for all contigs in a metagenome, nearly optimal models of protein-coding regions, either in libraries of pre-computed models or constructed de novo. The model selection process and gene annotation were done by the new GeneMark-HM pipeline. We have created a database of species-level pan-genomes for the human microbiome. To create a library of models representing each pan-genome, we used the self-training algorithm GeneMarkS-2. Genes initially predicted in each contig served as queries for a fast similarity search through the pan-genome database. The best matches led to selection of the model for gene prediction. Contigs not assigned to pan-genomes were analyzed by crude but still accurate models designed for sequences with particular GC compositions. Tests of GeneMark-HM on simulated metagenomes demonstrated improvement in gene annotation of human metagenomic sequences in comparison with the current state-of-the-art gene prediction tools.
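The model-selection step outlined in the description can be illustrated with a minimal Python sketch. It is not the GeneMark-HM implementation: the function names, the majority-vote rule, the 0.5 hit-fraction threshold, the 5% GC bin width, and the model labels are all assumptions made for illustration. It only shows the general idea that a contig is assigned a pan-genome model when enough of its initially predicted genes match one pan-genome, and otherwise falls back to a model chosen by GC composition.

```python
from collections import Counter
from typing import List, Optional


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    counts = Counter(seq.upper())
    acgt = sum(counts[b] for b in "ACGT")
    if acgt == 0:
        return 0.0
    return (counts["G"] + counts["C"]) / acgt


def select_model(
    contig: str,
    gene_hits: List[Optional[str]],
    min_hit_fraction: float = 0.5,  # hypothetical threshold
    gc_bin_width: int = 5,          # hypothetical bin width, in percent
) -> str:
    """Return a (hypothetical) model label for one contig.

    gene_hits holds, for each initially predicted gene, the ID of its
    best-matching pan-genome, or None when the search found no match.
    """
    hits = [h for h in gene_hits if h is not None]
    if hits:
        # Majority vote over the best-matching pan-genomes.
        best, votes = Counter(hits).most_common(1)[0]
        if votes / len(gene_hits) >= min_hit_fraction:
            return f"pangenome:{best}"
    # Fallback: a model keyed by the contig's GC-composition bin.
    gc_percent = 100 * gc_content(contig)
    low = int(gc_percent // gc_bin_width) * gc_bin_width
    return f"gc:{low}-{low + gc_bin_width}"
```

For example, `select_model("GCGATATA", ["pgA", "pgA", None])` selects the pan-genome label, while the same contig with no hits falls back to a GC-bin label.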
format Online
Article
Text
id pubmed-8153819
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8153819 2021-05-28 NAR Genom Bioinform Standard Article Oxford University Press 2021-05-26 /pmc/articles/PMC8153819/ /pubmed/34056597 http://dx.doi.org/10.1093/nargab/lqab047 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title GeneMark-HM: improving gene prediction in DNA sequences of human microbiome
topic Standard Article