
A big data approach to metagenomics for all-food-sequencing

BACKGROUND: All-Food-Sequencing (AFS) is an untargeted metagenomic sequencing method that allows for the detection and quantification of food ingredients including animals, plants, and microbiota. While this approach avoids some of the shortcomings of targeted PCR-based methods, it requires the comp...


Bibliographic Details
Main authors:	Kobus, Robin, Abuín, José M., Müller, André, Hellmann, Sören Lukas, Pichel, Juan C., Pena, Tomás F., Hildebrandt, Andreas, Hankeln, Thomas, Schmidt, Bertil
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2020
Subjects:	Software
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7069206/ https://www.ncbi.nlm.nih.gov/pubmed/32164527 http://dx.doi.org/10.1186/s12859-020-3429-6

_version_	1783505736492384256
author	Kobus, Robin Abuín, José M. Müller, André Hellmann, Sören Lukas Pichel, Juan C. Pena, Tomás F. Hildebrandt, Andreas Hankeln, Thomas Schmidt, Bertil
author_facet	Kobus, Robin Abuín, José M. Müller, André Hellmann, Sören Lukas Pichel, Juan C. Pena, Tomás F. Hildebrandt, Andreas Hankeln, Thomas Schmidt, Bertil
author_sort	Kobus, Robin
collection	PubMed
description	BACKGROUND: All-Food-Sequencing (AFS) is an untargeted metagenomic sequencing method that allows for the detection and quantification of food ingredients including animals, plants, and microbiota. While this approach avoids some of the shortcomings of targeted PCR-based methods, it requires the comparison of sequence reads to large collections of reference genomes. The steadily increasing amount of available reference genomes establishes the need for efficient big data approaches. RESULTS: We introduce an alignment-free k-mer based method for detection and quantification of species composition in food and other complex biological matters. It is orders-of-magnitude faster than our previous alignment-based AFS pipeline. In comparison to the established tools CLARK, Kraken2, and Kraken2+Bracken it is superior in terms of false-positive rate and quantification accuracy. Furthermore, the usage of an efficient database partitioning scheme allows for the processing of massive collections of reference genomes with reduced memory requirements on a workstation (AFS-MetaCache) or on a Spark-based compute cluster (MetaCacheSpark). CONCLUSIONS: We present a fast yet accurate screening method for whole genome shotgun sequencing-based biosurveillance applications such as food testing. By relying on a big data approach it can scale efficiently towards large-scale collections of complex eukaryotic and bacterial reference genomes. AFS-MetaCache and MetaCacheSpark are suitable tools for broad-scale metagenomic screening applications. They are available at https://muellan.github.io/metacache/afs.html (C++ version for a workstation) and https://github.com/jmabuin/MetaCacheSpark (Spark version for big data clusters).
format	Online Article Text
id	pubmed-7069206
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-7069206 2020-03-18 A big data approach to metagenomics for all-food-sequencing Kobus, Robin Abuín, José M. Müller, André Hellmann, Sören Lukas Pichel, Juan C. Pena, Tomás F. Hildebrandt, Andreas Hankeln, Thomas Schmidt, Bertil BMC Bioinformatics Software
BioMed Central 2020-03-12 /pmc/articles/PMC7069206/ /pubmed/32164527 http://dx.doi.org/10.1186/s12859-020-3429-6 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle	Software Kobus, Robin Abuín, José M. Müller, André Hellmann, Sören Lukas Pichel, Juan C. Pena, Tomás F. Hildebrandt, Andreas Hankeln, Thomas Schmidt, Bertil A big data approach to metagenomics for all-food-sequencing
title	A big data approach to metagenomics for all-food-sequencing
title_full	A big data approach to metagenomics for all-food-sequencing
title_fullStr	A big data approach to metagenomics for all-food-sequencing
title_full_unstemmed	A big data approach to metagenomics for all-food-sequencing
title_short	A big data approach to metagenomics for all-food-sequencing
title_sort	big data approach to metagenomics for all-food-sequencing
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7069206/ https://www.ncbi.nlm.nih.gov/pubmed/32164527 http://dx.doi.org/10.1186/s12859-020-3429-6
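The description field above mentions an alignment-free k-mer based method for detecting and quantifying species composition. As a rough illustration of that general idea only, the following Python sketch indexes the k-mers of a few reference sequences, assigns each read to the reference sharing the most k-mers with it, and reports the fraction of classified reads per reference. It is not the AFS-MetaCache or MetaCacheSpark implementation: the function names, the `min_hits` threshold, the value of `k`, and the toy sequences are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references, k):
    """Map each k-mer to the set of reference names that contain it."""
    index = defaultdict(set)
    for name, seq in references.items():
        for km in kmers(seq, k):
            index[km].add(name)
    return index


def classify(read, index, k, min_hits=2):
    """Return the reference with the most k-mer hits.

    Returns None when there are fewer than min_hits hits
    or when the two best references are tied.
    """
    hits = Counter()
    for km in kmers(read, k):
        for name in index.get(km, ()):
            hits[name] += 1
    ranked = hits.most_common(2)
    if not ranked or ranked[0][1] < min_hits:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def quantify(reads, index, k):
    """Return the fraction of classified reads assigned to each reference."""
    counts = Counter(classify(read, index, k) for read in reads)
    counts.pop(None, None)  # ignore unclassified reads
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}


# Hypothetical toy data, not real genomes.
K = 5
REFS = {
    "species_a": "ACGTACGTTGCAAGCTTAGC",
    "species_b": "TTGACCGGTAACCGGTTAAC",
}
READS = ["ACGTTGCAAG", "GCAAGCTTAG", "CCGGTAACCG", "GGGGGGGGGG"]

INDEX = build_index(REFS, K)
COMPOSITION = quantify(READS, INDEX, K)
```

An exact in-memory dictionary like this grows with the total size of the reference collection, which is the scaling pressure that the record's description says database partitioning is meant to relieve.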
