GeneMark-HM: improving gene prediction in DNA sequences of human microbiome
Computational reconstruction of nearly complete genomes from metagenomic reads may identify thousands of new uncultured candidate bacterial species. We have shown that reconstructed prokaryotic genomes along with genomes of sequenced microbial isolates can be used to support more accurate gene predi...
Main authors: | Lomsadze, Alexandre; Bonny, Christophe; Strozzi, Francesco; Borodovsky, Mark |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2021 |
Subjects: | Standard Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8153819/ https://www.ncbi.nlm.nih.gov/pubmed/34056597 http://dx.doi.org/10.1093/nargab/lqab047 |
_version_ | 1783698881740013568 |
---|---|
author | Lomsadze, Alexandre Bonny, Christophe Strozzi, Francesco Borodovsky, Mark |
author_facet | Lomsadze, Alexandre Bonny, Christophe Strozzi, Francesco Borodovsky, Mark |
author_sort | Lomsadze, Alexandre |
collection | PubMed |
description | Computational reconstruction of nearly complete genomes from metagenomic reads may identify thousands of new uncultured candidate bacterial species. We have shown that reconstructed prokaryotic genomes along with genomes of sequenced microbial isolates can be used to support more accurate gene prediction in novel metagenomic sequences. We have proposed an approach that used three types of gene prediction algorithms and found for all contigs in a metagenome nearly optimal models of protein-coding regions either in libraries of pre-computed models or constructed de novo. The model selection process and gene annotation were done by the new GeneMark-HM pipeline. We have created a database of the species level pan-genomes for the human microbiome. To create a library of models representing each pan-genome we used a self-training algorithm GeneMarkS-2. Genes initially predicted in each contig served as queries for a fast similarity search through the pan-genome database. The best matches led to selection of the model for gene prediction. Contigs not assigned to pan-genomes were analyzed by crude, but still accurate models designed for sequences with particular GC compositions. Tests of GeneMark-HM on simulated metagenomes demonstrated improvement in gene annotation of human metagenomic sequences in comparison with the current state-of-the-art gene prediction tools. |
format | Online Article Text |
id | pubmed-8153819 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-81538192021-05-28 GeneMark-HM: improving gene prediction in DNA sequences of human microbiome Lomsadze, Alexandre Bonny, Christophe Strozzi, Francesco Borodovsky, Mark NAR Genom Bioinform Standard Article Computational reconstruction of nearly complete genomes from metagenomic reads may identify thousands of new uncultured candidate bacterial species. We have shown that reconstructed prokaryotic genomes along with genomes of sequenced microbial isolates can be used to support more accurate gene prediction in novel metagenomic sequences. We have proposed an approach that used three types of gene prediction algorithms and found for all contigs in a metagenome nearly optimal models of protein-coding regions either in libraries of pre-computed models or constructed de novo. The model selection process and gene annotation were done by the new GeneMark-HM pipeline. We have created a database of the species level pan-genomes for the human microbiome. To create a library of models representing each pan-genome we used a self-training algorithm GeneMarkS-2. Genes initially predicted in each contig served as queries for a fast similarity search through the pan-genome database. The best matches led to selection of the model for gene prediction. Contigs not assigned to pan-genomes were analyzed by crude, but still accurate models designed for sequences with particular GC compositions. Tests of GeneMark-HM on simulated metagenomes demonstrated improvement in gene annotation of human metagenomic sequences in comparison with the current state-of-the-art gene prediction tools. Oxford University Press 2021-05-26 /pmc/articles/PMC8153819/ /pubmed/34056597 http://dx.doi.org/10.1093/nargab/lqab047 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. 
https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Standard Article Lomsadze, Alexandre Bonny, Christophe Strozzi, Francesco Borodovsky, Mark GeneMark-HM: improving gene prediction in DNA sequences of human microbiome |
title | GeneMark-HM: improving gene prediction in DNA sequences of human microbiome |
title_full | GeneMark-HM: improving gene prediction in DNA sequences of human microbiome |
title_fullStr | GeneMark-HM: improving gene prediction in DNA sequences of human microbiome |
title_full_unstemmed | GeneMark-HM: improving gene prediction in DNA sequences of human microbiome |
title_short | GeneMark-HM: improving gene prediction in DNA sequences of human microbiome |
title_sort | genemark-hm: improving gene prediction in dna sequences of human microbiome |
topic | Standard Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8153819/ https://www.ncbi.nlm.nih.gov/pubmed/34056597 http://dx.doi.org/10.1093/nargab/lqab047 |
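
The description above outlines a two-path model selection: contigs whose initially predicted genes match a pan-genome get that pan-genome's pre-computed model, and unassigned contigs fall back to a model chosen by GC composition. A minimal sketch of that decision follows. It is an illustration only, not the GeneMark-HM implementation: the function names, the majority-vote rule, the `min_votes` threshold, and the use of a rounded GC percentage as the fallback key are all assumptions that the record does not specify.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    counts = Counter(seq.upper())
    acgt = sum(counts[base] for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (counts["G"] + counts["C"]) / acgt


def select_model(contig_seq: str, gene_hits: list, min_votes: int = 2) -> tuple:
    """Pick a gene-prediction model for one contig (hypothetical rule).

    gene_hits holds one entry per initially predicted gene: the id of the
    best-matching pan-genome, or None when the similarity search found nothing.
    """
    # Count how many genes point at each pan-genome (assumed voting scheme).
    votes = Counter(hit for hit in gene_hits if hit is not None)
    if votes:
        best_id, n_votes = votes.most_common(1)[0]
        if n_votes >= min_votes:
            return ("pan-genome", best_id)
    # Fallback: key the model on the contig's GC percentage.
    return ("gc", round(100 * gc_content(contig_seq)))


print(select_model("ACGT", ["pg1", "pg1", None, "pg2"]))
print(select_model("GGCCAATT", [None, "pg2"]))
```

In the first call two genes agree on the hypothetical pan-genome `pg1`, so its model is selected; in the second call a single hit falls below `min_votes`, so the contig is routed to the GC-based fallback.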