
Ensemble analysis of adaptive compressed genome sequencing strategies

BACKGROUND: Acquiring genomes at single-cell resolution has many applications, such as in the study of microbiota. However, deep sequencing and assembly of all of the millions of cells in a sample is prohibitively costly. A property that can come to the rescue is that deep sequencing of every cell should not...


Bibliographic Details
Main Author:	Taghavi, Zeinab
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Proceedings
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4221792/ https://www.ncbi.nlm.nih.gov/pubmed/25252999 http://dx.doi.org/10.1186/1471-2105-15-S9-S13

_version_	1782342933543911424
author	Taghavi, Zeinab
author_facet	Taghavi, Zeinab
author_sort	Taghavi, Zeinab
collection	PubMed
description	BACKGROUND: Acquiring genomes at single-cell resolution has many applications, such as in the study of microbiota. However, deep sequencing and assembly of all of the millions of cells in a sample is prohibitively costly. A property that can come to the rescue is that deep sequencing of every cell should not be necessary to capture all distinct genomes, as the majority of cells are biological replicates. Biologically important samples are often sparse in that sense. In this paper, we propose an adaptive compressed method, also known as distilled sensing, to capture all distinct genomes in a sparse microbial community with reduced sequencing effort. As opposed to group testing, in which the number of distinct events is often constant and sparsity is equivalent to rarity of an event, sparsity in our case means scarcity of distinct events in comparison to the data size. Previously, we introduced the problem and proposed a distilled sensing solution based on the breadth-first search strategy. We simulated the whole process, and the computational intensity of that simulation constrained our ability to study the behavior of the algorithm for the entire ensemble. RESULTS: In this paper, we modify our previous breadth-first search strategy and introduce the depth-first search strategy. Instead of simulating the entire process, which is intractable for a large number of experiments, we provide a dynamic programming algorithm to analyze the behavior of the method for the entire ensemble. The ensemble analysis algorithm recursively calculates the probability of capturing every distinct genome and also the expected total sequenced nucleotides for a given population profile. Our results suggest that the expected total number of sequenced nucleotides grows in proportion to the logarithm of the number of cells and linearly with the number of distinct genomes. The probability of missing a genome depends on its abundance and the ratio of its size to the maximum genome size in the sample. The modified resource allocation method accommodates a parameter to control that probability. AVAILABILITY: The squeezambler 2.0 C++ source code is available at http://sourceforge.net/projects/hyda/. The ensemble analysis MATLAB code is available at http://sourceforge.net/projects/distilled-sequencing/.
format	Online Article Text
id	pubmed-4221792
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4221792 2014-11-10 Ensemble analysis of adaptive compressed genome sequencing strategies Taghavi, Zeinab BMC Bioinformatics Proceedings BioMed Central 2014-09-10 /pmc/articles/PMC4221792/ /pubmed/25252999 http://dx.doi.org/10.1186/1471-2105-15-S9-S13 Text en Copyright © 2014 Taghavi; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Proceedings Taghavi, Zeinab Ensemble analysis of adaptive compressed genome sequencing strategies
title	Ensemble analysis of adaptive compressed genome sequencing strategies
title_full	Ensemble analysis of adaptive compressed genome sequencing strategies
title_fullStr	Ensemble analysis of adaptive compressed genome sequencing strategies
title_full_unstemmed	Ensemble analysis of adaptive compressed genome sequencing strategies
title_short	Ensemble analysis of adaptive compressed genome sequencing strategies
title_sort	ensemble analysis of adaptive compressed genome sequencing strategies
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4221792/ https://www.ncbi.nlm.nih.gov/pubmed/25252999 http://dx.doi.org/10.1186/1471-2105-15-S9-S13
work_keys_str_mv	AT taghavizeinab ensembleanalysisofadaptivecompressedgenomesequencingstrategies
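
The description above states that the probability of missing a genome depends on its abundance. As a rough illustration of that dependence only, the following sketch computes the chance that a uniform random sample of cells contains no cell of a given genome. It is not the dynamic programming ensemble analysis from the article, and it ignores genome size and sequencing depth. The function names `miss_probability` and `cells_needed`, and the assumption of independent sampling with replacement, are introduced here for illustration and do not come from the record.

```python
import math


def miss_probability(abundance: float, sample_size: int) -> float:
    """Chance that a sample contains no cell of a given genome.

    Assumes (hypothetically) that cells are drawn independently and
    uniformly at random, so each draw misses the genome with
    probability 1 - abundance.
    """
    if not 0.0 < abundance <= 1.0:
        raise ValueError("abundance must be in (0, 1]")
    if sample_size < 0:
        raise ValueError("sample_size must be non-negative")
    return (1.0 - abundance) ** sample_size


def cells_needed(abundance: float, max_miss: float) -> int:
    """Smallest sample size whose miss probability is at most max_miss."""
    if not 0.0 < abundance <= 1.0:
        raise ValueError("abundance must be in (0, 1]")
    if not 0.0 < max_miss < 1.0:
        raise ValueError("max_miss must be in (0, 1)")
    if abundance == 1.0:
        return 1  # every cell carries the genome
    # Solve (1 - abundance)^m <= max_miss for the integer m.
    return math.ceil(math.log(max_miss) / math.log1p(-abundance))


if __name__ == "__main__":
    # A genome carried by 1% of cells, with a 5% tolerated miss rate.
    m = cells_needed(0.01, 0.05)
    print(m, miss_probability(0.01, m))
```

Under these simplifying assumptions, rarer genomes require proportionally more sampled cells before they are likely to be seen at all, which is the intuition behind tying the miss probability to abundance.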
