Designing deep sequencing experiments: detecting structural variation and estimating transcript abundance

BACKGROUND: Massively parallel DNA sequencing technologies have enabled the sequencing of several individual human genomes. These technologies are also being used in novel ways for mRNA expression profiling, genome-wide discovery of transcription-factor binding sites, small RNA discovery, etc. The m...


Bibliographic Details
Main Authors:	Bashir, Ali; Bansal, Vikas; Bafna, Vineet
Format:	Text
Language:	English
Published:	BioMed Central 2010
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3091630/ https://www.ncbi.nlm.nih.gov/pubmed/20565853 http://dx.doi.org/10.1186/1471-2164-11-385

author	Bashir, Ali; Bansal, Vikas; Bafna, Vineet
collection	PubMed
description	BACKGROUND: Massively parallel DNA sequencing technologies have enabled the sequencing of several individual human genomes. These technologies are also being used in novel ways for mRNA expression profiling, genome-wide discovery of transcription-factor binding sites, small RNA discovery, etc. The multitude of sequencing platforms, each with its own characteristics, poses a number of design challenges regarding the technology to be used and the depth of sequencing required for a particular sequencing application. Here we describe a number of analytical and empirical results to address design questions for two applications: detection of structural variations from paired-end sequencing and estimating mRNA transcript abundance. RESULTS: For structural variation, our results provide explicit trade-offs between the detection and resolution of rearrangement breakpoints, and the optimal mix of paired-read insert lengths. Specifically, we prove that optimal detection and resolution of breakpoints is achieved using a mix of exactly two insert library lengths. Furthermore, we derive explicit formulae to determine these insert length combinations, enabling a 15% improvement in breakpoint detection at the same experimental cost. On empirical short read data, these predictions show good concordance with Illumina 200 bp and 2 Kbp insert length libraries. For transcriptome sequencing, we determine the sequencing depth needed to detect rare transcripts from a small pilot study. With only 1 million reads, we derive corrections that enable almost perfect prediction of the underlying expression probability distribution, and use this to predict the sequencing depth required to detect lowly expressed genes with greater than 95% probability. CONCLUSIONS: Together, our results form a generic framework for many design considerations related to high-throughput sequencing. We provide software tools (http://bix.ucsd.edu/projects/NGS-DesignTools) to derive platform-independent guidelines for designing sequencing experiments (amount of sequencing, choice of insert length, mix of libraries) for novel applications of next-generation sequencing.
format	Text
id	pubmed-3091630
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3091630 2011-05-11 Designing deep sequencing experiments: detecting structural variation and estimating transcript abundance. Bashir, Ali; Bansal, Vikas; Bafna, Vineet. BMC Genomics. Methodology Article. BioMed Central 2010-06-18 /pmc/articles/PMC3091630/ /pubmed/20565853 http://dx.doi.org/10.1186/1471-2164-11-385 Text en Copyright ©2010 Bashir et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Designing deep sequencing experiments: detecting structural variation and estimating transcript abundance
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3091630/ https://www.ncbi.nlm.nih.gov/pubmed/20565853 http://dx.doi.org/10.1186/1471-2164-11-385
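
The abstract's question of how deep to sequence in order to detect a lowly expressed gene with greater than 95% probability can be illustrated with a back-of-the-envelope calculation. The sketch below is not the authors' method (it uses none of the corrections they derive from pilot data); it assumes that reads are sampled independently, that a transcript accounts for a known fraction p of all reads, and that "detected" means at least one read maps to it. The function names and the example value of p are hypothetical.

```python
import math


def detection_probability(p: float, n_reads: int) -> float:
    """Probability that at least one of n_reads independent reads
    comes from a transcript making up fraction p of the library.

    Computes 1 - (1 - p)**n_reads in a numerically stable way.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if n_reads < 0:
        raise ValueError("n_reads must be non-negative")
    return -math.expm1(n_reads * math.log1p(-p))


def reads_for_detection(p: float, prob: float = 0.95) -> int:
    """Smallest number of reads N such that 1 - (1 - p)**N >= prob.

    Assumption: simple independent sampling; real libraries show
    biases that this toy model ignores.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if not 0.0 < prob < 1.0:
        raise ValueError("prob must be strictly between 0 and 1")
    return math.ceil(math.log1p(-prob) / math.log1p(-p))


if __name__ == "__main__":
    # Hypothetical transcript contributing one read per million.
    p = 1e-6
    n = reads_for_detection(p, prob=0.95)
    print(f"reads needed: {n}")
    print(f"detection probability at that depth: {detection_probability(p, n):.4f}")
```

Under these assumptions, a transcript at one read per million needs roughly three million reads for a 95% chance of being seen at least once. In practice the abundance p is itself unknown, which is why the article estimates the expression distribution from a pilot study.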
