A machine learning-based service for estimating quality of genomes using PATRIC
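Main authors: | Parrello, Bruce; Butler, Rory; Chlenski, Philippe; Olson, Robert; Overbeek, Jamie; Pusch, Gordon D.; Vonstein, Veronika; Overbeek, Ross |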
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2019 |
Subjects: | Database |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6775668/ https://www.ncbi.nlm.nih.gov/pubmed/31581946 http://dx.doi.org/10.1186/s12859-019-3068-y |
author | Parrello, Bruce; Butler, Rory; Chlenski, Philippe; Olson, Robert; Overbeek, Jamie; Pusch, Gordon D.; Vonstein, Veronika; Overbeek, Ross |
---|---|
collection | PubMed |
description | BACKGROUND: Recent advances in high-volume sequencing technology and mining of genomes from metagenomic samples call for rapid and reliable genome quality evaluation. The current release of the PATRIC database contains over 220,000 genomes, and current metagenomic technology supports assemblies of many draft-quality genomes from a single sample, most of which will be novel. DESCRIPTION: We have added two quality assessment tools to the PATRIC annotation pipeline. EvalCon uses supervised machine learning to calculate an annotation consistency score. EvalG implements a variant of the CheckM algorithm to estimate contamination and completeness of an annotated genome. We report on the performance of these tools and the potential utility of the consistency score. Additionally, we provide contamination, completeness, and consistency measures for all genomes in PATRIC and in a recent set of metagenomic assemblies. CONCLUSION: EvalG and EvalCon facilitate the rapid quality control and exploration of PATRIC-annotated draft genomes. |
format | Online Article Text |
id | pubmed-6775668 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6775668 2019-10-07 A machine learning-based service for estimating quality of genomes using PATRIC Parrello, Bruce Butler, Rory Chlenski, Philippe Olson, Robert Overbeek, Jamie Pusch, Gordon D. Vonstein, Veronika Overbeek, Ross BMC Bioinformatics Database BACKGROUND: Recent advances in high-volume sequencing technology and mining of genomes from metagenomic samples call for rapid and reliable genome quality evaluation. The current release of the PATRIC database contains over 220,000 genomes, and current metagenomic technology supports assemblies of many draft-quality genomes from a single sample, most of which will be novel. DESCRIPTION: We have added two quality assessment tools to the PATRIC annotation pipeline. EvalCon uses supervised machine learning to calculate an annotation consistency score. EvalG implements a variant of the CheckM algorithm to estimate contamination and completeness of an annotated genome. We report on the performance of these tools and the potential utility of the consistency score. Additionally, we provide contamination, completeness, and consistency measures for all genomes in PATRIC and in a recent set of metagenomic assemblies. CONCLUSION: EvalG and EvalCon facilitate the rapid quality control and exploration of PATRIC-annotated draft genomes. BioMed Central 2019-10-03 /pmc/articles/PMC6775668/ /pubmed/31581946 http://dx.doi.org/10.1186/s12859-019-3068-y Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | A machine learning-based service for estimating quality of genomes using PATRIC |
topic | Database |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6775668/ https://www.ncbi.nlm.nih.gov/pubmed/31581946 http://dx.doi.org/10.1186/s12859-019-3068-y |