Low-Bandwidth and Non-Compute Intensive Remote Identification of Microbes from Raw Sequencing Reads

Cheap DNA sequencing may soon become routine not only for human genomes but also for practically anything requiring the identification of living organisms from their DNA: tracking of infectious agents, control of food products, bioreactors, or environmental samples. We propose a novel general approa...

Bibliographic Details
Main authors:	Gautier, Laurent; Lund, Ole
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2013
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3877093/ https://www.ncbi.nlm.nih.gov/pubmed/24391826 http://dx.doi.org/10.1371/journal.pone.0083784

author	Gautier, Laurent; Lund, Ole
collection	PubMed
description	Cheap DNA sequencing may soon become routine not only for human genomes but also for practically anything requiring the identification of living organisms from their DNA: tracking of infectious agents, control of food products, bioreactors, or environmental samples. We propose a novel general approach to the analysis of sequencing data where a reference genome does not have to be specified. Using a distributed architecture, we are able to query a remote server for hints about what the reference might be, transferring a relatively small amount of data. Our system consists of a server with known reference DNA indexed, and a client with raw sequencing reads. The client sends a sample of unidentified reads, and in return receives a list of matching references. Sequences for the references can be retrieved and used for exhaustive computation on the reads, such as alignment. To demonstrate this approach, we have implemented a web server, indexing tens of thousands of publicly available genomes and genomic regions from various organisms and returning lists of matching hits from query sequencing reads. We have also implemented two clients: one running in a web browser, and one as a Python script. Both can handle a large number of sequencing reads, including from portable devices (the browser-based client running on a tablet), perform their task within seconds, and consume an amount of bandwidth compatible with mobile broadband networks. Such client-server approaches could develop in the future, allowing fully automated processing of sequencing data and routine instant quality checks of sequencing runs from desktop sequencers. Web access is available at http://tapir.cbs.dtu.dk. The source code for a Python command-line client, a server, and supplementary data are available at http://bit.ly/1aURxkc.
format	Online Article Text
id	pubmed-3877093
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-3877093 2014-01-03 Low-Bandwidth and Non-Compute Intensive Remote Identification of Microbes from Raw Sequencing Reads Gautier, Laurent Lund, Ole PLoS One Research Article Public Library of Science 2013-12-31 /pmc/articles/PMC3877093/ /pubmed/24391826 http://dx.doi.org/10.1371/journal.pone.0083784 Text en © 2013 Gautier, Lund http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	Low-Bandwidth and Non-Compute Intensive Remote Identification of Microbes from Raw Sequencing Reads
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3877093/ https://www.ncbi.nlm.nih.gov/pubmed/24391826 http://dx.doi.org/10.1371/journal.pone.0083784
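
The description above says the client sends only a sample of unidentified reads to the server rather than the full sequencing run. The record does not say how that sample is drawn, so the following is only a minimal sketch of one plausible client-side step, assuming FASTQ input and uniform reservoir sampling. The names `parse_fastq` and `sample_reads` are hypothetical and are not taken from the authors' client.

```python
import io
import random
from typing import Iterable, Iterator, List, Optional, TextIO, Tuple

Read = Tuple[str, str]  # (read identifier, nucleotide sequence)


def parse_fastq(handle: TextIO) -> Iterator[Read]:
    """Yield (identifier, sequence) pairs from 4-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:          # end of input
            return
        if not header.strip():  # tolerate blank lines between records
            continue
        sequence = handle.readline().strip()
        handle.readline()       # '+' separator line, ignored
        handle.readline()       # quality line, ignored in this sketch
        yield header.strip()[1:], sequence


def sample_reads(reads: Iterable[Read], k: int,
                 seed: Optional[int] = None) -> List[Read]:
    """Draw a uniform sample of at most k reads in one pass (reservoir sampling).

    This is an assumed sampling strategy; the source does not specify one.
    """
    rng = random.Random(seed)
    reservoir: List[Read] = []
    for i, read in enumerate(reads):
        if i < k:
            reservoir.append(read)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = read
    return reservoir


# Hypothetical three-read input standing in for a real sequencing run.
demo = io.StringIO(
    "@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n@r3\nTTAA\n+\nIIII\n"
)
subset = sample_reads(parse_fastq(demo), k=2, seed=0)
payload_bases = sum(len(seq) for _, seq in subset)  # rough size of what would be sent
```

Reservoir sampling is used here because it needs a single pass and memory proportional to the sample size only, which fits the stated goal of running on portable devices. The transfer to the server is left out on purpose, since the record does not describe the server's interface.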
