
Integrating Sequencing Technologies in Personal Genomics: Optimal Low Cost Reconstruction of Structural Variants

The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arra...


Bibliographic Details
Main authors: Du, Jiang; Bjornson, Robert D.; Zhang, Zhengdong D.; Kong, Yong; Snyder, Michael; Gerstein, Mark B.
Format: Text
Language: English
Published: Public Library of Science 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2700963/
https://www.ncbi.nlm.nih.gov/pubmed/19593373
http://dx.doi.org/10.1371/journal.pcbi.1000432
_version_ 1782168665167233024
author Du, Jiang
Bjornson, Robert D.
Zhang, Zhengdong D.
Kong, Yong
Snyder, Michael
Gerstein, Mark B.
author_facet Du, Jiang
Bjornson, Robert D.
Zhang, Zhengdong D.
Kong, Yong
Snyder, Michael
Gerstein, Mark B.
author_sort Du, Jiang
collection PubMed
description The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbleGen), with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs). SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome.) To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mappability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs. Our strategy should facilitate the sequencing of human genomes at maximum accuracy and low cost.
format Text
id pubmed-2700963
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2700963 2009-07-10 Integrating Sequencing Technologies in Personal Genomics: Optimal Low Cost Reconstruction of Structural Variants Du, Jiang Bjornson, Robert D. Zhang, Zhengdong D. Kong, Yong Snyder, Michael Gerstein, Mark B. PLoS Comput Biol Research Article The goal of human genome re-sequencing is obtaining an accurate assembly of an individual's genome. Recently, there has been great excitement in the development of many technologies for this (e.g. medium and short read sequencing from companies such as 454 and SOLiD, and high-density oligo-arrays from Affymetrix and NimbleGen), with even more expected to appear. The costs and sensitivities of these technologies differ considerably from each other. As an important goal of personal genomics is to reduce the cost of re-sequencing to an affordable point, it is worthwhile to consider optimally integrating technologies. Here, we build a simulation toolbox that will help us optimally combine different technologies for genome re-sequencing, especially in reconstructing large structural variants (SVs). SV reconstruction is considered the most challenging step in human genome re-sequencing. (It is sometimes even harder than de novo assembly of small genomes because of the duplications and repetitive sequences in the human genome.) To this end, we formulate canonical problems that are representative of issues in reconstruction and are of small enough scale to be computationally tractable and simulatable. Using semi-realistic simulations, we show how we can combine different technologies to optimally solve the assembly at low cost. With mappability maps, our simulations efficiently handle the inhomogeneous repeat-containing structure of the human genome and the computational complexity of practical assembly algorithms. They quantitatively show how combining different read lengths is more cost-effective than using one length, how an optimal mixed sequencing strategy for reconstructing large novel SVs usually also gives accurate detection of SNPs/indels, how paired-end reads can improve reconstruction efficiency, and how adding in arrays is more efficient than just sequencing for disentangling some complex SVs. Our strategy should facilitate the sequencing of human genomes at maximum accuracy and low cost. Public Library of Science 2009-07-10 /pmc/articles/PMC2700963/ /pubmed/19593373 http://dx.doi.org/10.1371/journal.pcbi.1000432 Text en Du et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Du, Jiang
Bjornson, Robert D.
Zhang, Zhengdong D.
Kong, Yong
Snyder, Michael
Gerstein, Mark B.
Integrating Sequencing Technologies in Personal Genomics: Optimal Low Cost Reconstruction of Structural Variants
title Integrating Sequencing Technologies in Personal Genomics: Optimal Low Cost Reconstruction of Structural Variants
title_full Integrating Sequencing Technologies in Personal Genomics: Optimal Low Cost Reconstruction of Structural Variants
title_fullStr Integrating Sequencing Technologies in Personal Genomics: Optimal Low Cost Reconstruction of Structural Variants
title_full_unstemmed Integrating Sequencing Technologies in Personal Genomics: Optimal Low Cost Reconstruction of Structural Variants
title_short Integrating Sequencing Technologies in Personal Genomics: Optimal Low Cost Reconstruction of Structural Variants
title_sort integrating sequencing technologies in personal genomics: optimal low cost reconstruction of structural variants
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2700963/
https://www.ncbi.nlm.nih.gov/pubmed/19593373
http://dx.doi.org/10.1371/journal.pcbi.1000432
work_keys_str_mv AT dujiang integratingsequencingtechnologiesinpersonalgenomicsoptimallowcostreconstructionofstructuralvariants
AT bjornsonrobertd integratingsequencingtechnologiesinpersonalgenomicsoptimallowcostreconstructionofstructuralvariants
AT zhangzhengdongd integratingsequencingtechnologiesinpersonalgenomicsoptimallowcostreconstructionofstructuralvariants
AT kongyong integratingsequencingtechnologiesinpersonalgenomicsoptimallowcostreconstructionofstructuralvariants
AT snydermichael integratingsequencingtechnologiesinpersonalgenomicsoptimallowcostreconstructionofstructuralvariants
AT gersteinmarkb integratingsequencingtechnologiesinpersonalgenomicsoptimallowcostreconstructionofstructuralvariants