Higher recall in metagenomic sequence classification exploiting overlapping reads

BACKGROUND: In recent years several different fields, such as ecology, medicine and microbiology, have experienced an unprecedented development due to the possibility of direct sequencing of microbiomic samples. Among problems that researchers in the field have to deal with, taxonomic classificatio...

Bibliographic Details
Main authors:	Girotto, Samuele; Comin, Matteo; Pizzi, Cinzia
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2017
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5731601/ https://www.ncbi.nlm.nih.gov/pubmed/29244002 http://dx.doi.org/10.1186/s12864-017-4273-6

author	Girotto, Samuele; Comin, Matteo; Pizzi, Cinzia
collection	PubMed
description	BACKGROUND: In recent years several fields, such as ecology, medicine and microbiology, have experienced unprecedented development thanks to the possibility of directly sequencing microbiomic samples. Among the problems that researchers in the field have to deal with, taxonomic classification of metagenomic reads is one of the most challenging. State-of-the-art methods classify single reads with almost 100% precision. However, recall very often falls to about 50%. As a consequence, state-of-the-art methods are capable of correctly classifying only half of the reads in the sample. How to achieve better overall classification quality remains a largely unsolved problem. RESULTS: In this paper we propose a method for metagenomics CLassification Improvement with Overlapping Reads (CLIOR), which exploits the information carried by the overlapping reads graph of the input read dataset to improve recall, F-measure, and the estimated abundance of species. In this work, we applied CLIOR on top of the classification produced by the classifier Clark-l. Experiments on simulated and synthetic metagenomes show that CLIOR can lead to a substantial improvement in recall, sometimes doubling it. On average, on simulated datasets, the increase in recall is paired with higher precision too, while on synthetic datasets it comes at the expense of a small loss of precision. In experiments on real metagenomes, CLIOR is able to assign many more reads while keeping the abundance ratios in line with previous studies. CONCLUSIONS: Our results show that with CLIOR it is possible to boost the recall of a state-of-the-art metagenomic classifier by inferring and/or correcting the assignment of reads with missing or erroneous labeling. CLIOR is not restricted to the read classification algorithm used in our experiments; it may be applied to other methods too. Finally, CLIOR does not need large computational resources and can be run on a laptop. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-4273-6) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-5731601
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5731601 2017-12-19 Higher recall in metagenomic sequence classification exploiting overlapping reads Girotto, Samuele; Comin, Matteo; Pizzi, Cinzia BMC Genomics Research BioMed Central 2017-12-06 /pmc/articles/PMC5731601/ /pubmed/29244002 http://dx.doi.org/10.1186/s12864-017-4273-6 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Higher recall in metagenomic sequence classification exploiting overlapping reads
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5731601/ https://www.ncbi.nlm.nih.gov/pubmed/29244002 http://dx.doi.org/10.1186/s12864-017-4273-6
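
The abstract's figures (almost 100% precision, about 50% recall) and its idea of using overlapping reads to label unclassified reads can be illustrated with a toy example. The sketch below is not CLIOR's actual algorithm: it assumes a simple one-pass neighbor vote over a hypothetical overlap graph, and it assumes precision is computed over assigned reads and recall over all reads. The read names, labels, and function names are invented for illustration.

```python
from collections import Counter


def propagate_labels(labels, overlaps):
    """One-pass neighbor vote (illustrative assumption, not CLIOR itself).

    labels:   dict read_id -> taxon label, or None if the read is unclassified
    overlaps: iterable of (read_a, read_b) pairs that share an overlap
    """
    neighbors = {read: set() for read in labels}
    for a, b in overlaps:
        neighbors[a].add(b)
        neighbors[b].add(a)

    updated = dict(labels)
    for read, label in labels.items():
        if label is None:
            # Vote only with labels from the original classification.
            votes = Counter(
                labels[n] for n in neighbors[read] if labels[n] is not None
            )
            if votes:
                updated[read] = votes.most_common(1)[0][0]
    return updated


def precision_recall_f1(predicted, truth):
    """Precision over assigned reads, recall over all reads, and their F-measure."""
    assigned = [r for r in truth if predicted.get(r) is not None]
    correct = sum(1 for r in assigned if predicted[r] == truth[r])
    precision = correct / len(assigned) if assigned else 0.0
    recall = correct / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy data: six reads from two species, half left unclassified.
truth = {"r1": "A", "r2": "A", "r3": "A", "r4": "B", "r5": "B", "r6": "B"}
initial = {"r1": "A", "r2": None, "r3": None, "r4": "B", "r5": None, "r6": "B"}
overlaps = [("r1", "r2"), ("r2", "r3"), ("r4", "r5"), ("r5", "r6")]

before = precision_recall_f1(initial, truth)
improved = propagate_labels(initial, overlaps)
after = precision_recall_f1(improved, truth)

print("before: P=%.2f R=%.2f F1=%.2f" % before)
print("after:  P=%.2f R=%.2f F1=%.2f" % after)
```

In this toy case the classifier starts with perfect precision but a recall of 0.50; after one round of voting, two of the three unclassified reads receive a label and recall rises to about 0.83 with precision unchanged. Read r3 stays unclassified because its only neighbor had no label in the original classification, which is why a real method would need more than a single pass over direct neighbors.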

Similar items