A machine learning-based service for estimating quality of genomes using PATRIC

Bibliographic Details
Main authors: Parrello, Bruce; Butler, Rory; Chlenski, Philippe; Olson, Robert; Overbeek, Jamie; Pusch, Gordon D.; Vonstein, Veronika; Overbeek, Ross
Format: Online, Article, Text
Language: English
Published: BioMed Central, 2019
Subjects: Database
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6775668/
https://www.ncbi.nlm.nih.gov/pubmed/31581946
http://dx.doi.org/10.1186/s12859-019-3068-y
Collection: PubMed
Abstract:
BACKGROUND: Recent advances in high-volume sequencing technology and mining of genomes from metagenomic samples call for rapid and reliable genome quality evaluation. The current release of the PATRIC database contains over 220,000 genomes, and current metagenomic technology supports assemblies of many draft-quality genomes from a single sample, most of which will be novel.
DESCRIPTION: We have added two quality assessment tools to the PATRIC annotation pipeline. EvalCon uses supervised machine learning to calculate an annotation consistency score. EvalG implements a variant of the CheckM algorithm to estimate contamination and completeness of an annotated genome. We report on the performance of these tools and the potential utility of the consistency score. Additionally, we provide contamination, completeness, and consistency measures for all genomes in PATRIC and in a recent set of metagenomic assemblies.
CONCLUSION: EvalG and EvalCon facilitate the rapid quality control and exploration of PATRIC-annotated draft genomes.
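The abstract names completeness and contamination without defining them. As a rough illustration only, the Python sketch below computes the generic CheckM-style estimates from marker-gene counts. It assumes the genome's marker genes have already been identified and grouped into collocated marker sets; the function name `checkm_style_quality` and its inputs are hypothetical, and this is the basic CheckM scheme, not EvalG's actual variant.

```python
from collections import Counter


def checkm_style_quality(marker_sets, observed_markers):
    """Hypothetical sketch of CheckM-style completeness and contamination.

    marker_sets: list of sets of marker IDs (assumed collocated marker sets).
    observed_markers: marker IDs found in the genome; a marker seen k times
    appears k times in this iterable.
    Returns (completeness %, contamination %).
    """
    if not marker_sets:
        raise ValueError("at least one marker set is required")
    counts = Counter(observed_markers)
    completeness = 0.0
    contamination = 0.0
    for mset in marker_sets:
        # Fraction of this set's markers that occur at least once.
        present = sum(1 for m in mset if counts[m] > 0)
        # Extra copies beyond the first are treated as evidence of contamination.
        extra = sum(max(counts[m] - 1, 0) for m in mset)
        completeness += present / len(mset)
        contamination += extra / len(mset)
    n = len(marker_sets)
    # Each marker set contributes equally to the final percentages.
    return 100.0 * completeness / n, 100.0 * contamination / n


comp, cont = checkm_style_quality([{"A", "B"}, {"C", "D"}], ["A", "A", "B", "C"])
print(comp, cont)  # 75.0 25.0
```

Averaging over marker sets rather than individual markers keeps a block of co-located genes from dominating the estimate when the whole block is missing or duplicated.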
Record ID: pubmed-6775668
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Bioinformatics (BioMed Central), 2019-10-03
Copyright: © The Author(s) 2019. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.