A Computer Simulator for Assessing Different Challenges and Strategies of de Novo Sequence Assembly

This study presents a new computer program for assessing the effects of different factors and sequencing strategies on de novo sequence assembly. The program uses reads from actual sequencing studies or from simulations with a reference genome that may also be real or simulated. The simulated reads...

Bibliographic Details
Main authors: Knudsen, Bjarne, Forsberg, Roald, Miyamoto, Michael M.
Format: Online Article Text
Language: English
Published: MDPI 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3954094/
https://www.ncbi.nlm.nih.gov/pubmed/24710045
http://dx.doi.org/10.3390/genes1020263
author Knudsen, Bjarne
Forsberg, Roald
Miyamoto, Michael M.
collection PubMed
description This study presents a new computer program for assessing the effects of different factors and sequencing strategies on de novo sequence assembly. The program uses reads from actual sequencing studies or from simulations with a reference genome that may also be real or simulated. The simulated reads can be created with our read simulator. They can be of differing length and coverage, consist of paired reads with varying distance, and include sequencing errors such as color space miscalls to imitate SOLiD data. The simulated or real reads are mapped to their reference genome and our assembly simulator is then used to obtain optimal assemblies that are limited only by the distribution of repeats. By way of this mapping, the assembly simulator determines which contigs are theoretically possible, or conversely (and perhaps more importantly), which are not. We illustrate the application and utility of our new simulation tools with several experiments that test the effects of genome complexity (repeats), read length and coverage, word size in De Bruijn graph assembly, and alternative sequencing strategies (e.g., BAC pooling) on sequence assemblies. These experiments highlight just some of the uses of our simulators in the experimental design of sequencing projects and in the further development of assembly algorithms.
format Online
Article
Text
id pubmed-3954094
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-3954094 2014-03-26 A Computer Simulator for Assessing Different Challenges and Strategies of de Novo Sequence Assembly Knudsen, Bjarne Forsberg, Roald Miyamoto, Michael M. Genes (Basel) Article MDPI 2010-09-13 /pmc/articles/PMC3954094/ /pubmed/24710045 http://dx.doi.org/10.3390/genes1020263 Text en © 2010 by the authors; licensee MDPI, Basel, Switzerland http://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A Computer Simulator for Assessing Different Challenges and Strategies of de Novo Sequence Assembly
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3954094/
https://www.ncbi.nlm.nih.gov/pubmed/24710045
http://dx.doi.org/10.3390/genes1020263