
SInC: an accurate and fast error-model based simulator for SNPs, Indels and CNVs coupled with a read generator for short-read sequence data

BACKGROUND: The rapid advancements in the field of genome sequencing are aiding our understanding on many biological systems. In the last five years, computational biologists and bioinformatics specialists have come up with newer, better and more efficient tools towards the discovery, analysis and i...


Bibliographic Details
Main Authors: Pattnaik, Swetansu, Gupta, Saurabh, Rao, Arjun A, Panda, Binay
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3926339/
https://www.ncbi.nlm.nih.gov/pubmed/24495296
http://dx.doi.org/10.1186/1471-2105-15-40
_version_ 1782303962837286912
author Pattnaik, Swetansu
Gupta, Saurabh
Rao, Arjun A
Panda, Binay
collection PubMed
description BACKGROUND: The rapid advancements in the field of genome sequencing are aiding our understanding on many biological systems. In the last five years, computational biologists and bioinformatics specialists have come up with newer, better and more efficient tools towards the discovery, analysis and interpretation of different genomic variants from high-throughput sequencing data. Availability of reliable simulated dataset is essential and is the first step towards testing any newly developed analytical tools for variant discovery. Although there are tools currently available that can simulate variants, none present the possibility of simulating all the three major types of variations (Single Nucleotide Polymorphisms, Insertions and Deletions and Copy Number Variations) and can generate reads taking a realistic error-model into consideration. Therefore, an efficient simulator and read generator is needed that can simulate variants taking the error rates of true biological samples into consideration. RESULTS: We report SInC (Snp, Indel and Cnv) an open-source variant simulator and read generator capable of simulating all the three common types of biological variants taking into account a distribution of base quality score from a most commonly used next-generation sequencing instrument from Illumina. SInC is capable of generating single- and paired-end reads with user-defined insert size and with high efficiency compared to the other existing tools. SInC, due to its multi-threaded capability during read generation, has a low time footprint. SInC is currently optimised to work in limited infrastructure setup and can efficiently exploit the commonly used quad-core desktop architecture to simulate short sequence reads with deep coverage for large genomes. CONCLUSIONS: We have come up with a user-friendly multi-variant simulator and read-generator tools called SInC. SInC can be downloaded from http://sourceforge.net/projects/sincsimulator.
format Online
Article
Text
id pubmed-3926339
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-39263392014-02-18 SInC: an accurate and fast error-model based simulator for SNPs, Indels and CNVs coupled with a read generator for short-read sequence data Pattnaik, Swetansu Gupta, Saurabh Rao, Arjun A Panda, Binay BMC Bioinformatics Software BioMed Central 2014-02-05 /pmc/articles/PMC3926339/ /pubmed/24495296 http://dx.doi.org/10.1186/1471-2105-15-40 Text en Copyright © 2014 Pattnaik et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
title SInC: an accurate and fast error-model based simulator for SNPs, Indels and CNVs coupled with a read generator for short-read sequence data
topic Software