
Generation of Artificial FASTQ Files to Evaluate the Performance of Next-Generation Sequencing Pipelines

Pipelines for the analysis of Next-Generation Sequencing (NGS) data are generally composed of a set of publicly available software tools, configured together to map short reads of a genome and call variants. The fidelity of pipelines is variable. We have developed ArtificialFastqGenerat...


Bibliographic Details
Main authors: Frampton, Matthew; Houlston, Richard
Format: Online Article Text
Language: English
Published: Public Library of Science 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3495771/
https://www.ncbi.nlm.nih.gov/pubmed/23152858
http://dx.doi.org/10.1371/journal.pone.0049110
_version_ 1782249564741304320
author Frampton, Matthew
Houlston, Richard
author_facet Frampton, Matthew
Houlston, Richard
author_sort Frampton, Matthew
collection PubMed
description Pipelines for the analysis of Next-Generation Sequencing (NGS) data are generally composed of a set of publicly available software tools, configured together to map short reads of a genome and call variants. The fidelity of pipelines is variable. We have developed ArtificialFastqGenerator, which takes a reference genome sequence as input and outputs artificial paired-end FASTQ files containing Phred quality scores. Since these artificial FASTQs are derived from the reference genome, it provides a gold standard for read alignment and variant calling, thereby enabling the performance of any NGS pipeline to be evaluated. The user can customise DNA template/read length, the modelling of coverage based on GC content, whether to use real Phred base quality scores taken from existing FASTQ files, and whether to simulate sequencing errors. Detailed coverage and error summary statistics are output. Here we describe ArtificialFastqGenerator and illustrate its implementation in evaluating a typical bespoke NGS analysis pipeline under different experimental conditions. ArtificialFastqGenerator was released in January 2012. Source code, example files and binaries are freely available under the terms of the GNU General Public License v3.0 from https://sourceforge.net/projects/artfastqgen/.
format Online
Article
Text
id pubmed-3495771
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3495771 2012-11-14 Generation of Artificial FASTQ Files to Evaluate the Performance of Next-Generation Sequencing Pipelines Frampton, Matthew Houlston, Richard PLoS One Research Article Public Library of Science 2012-11-12 /pmc/articles/PMC3495771/ /pubmed/23152858 http://dx.doi.org/10.1371/journal.pone.0049110 Text en © 2012 Frampton, Houlston http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Frampton, Matthew
Houlston, Richard
Generation of Artificial FASTQ Files to Evaluate the Performance of Next-Generation Sequencing Pipelines
title Generation of Artificial FASTQ Files to Evaluate the Performance of Next-Generation Sequencing Pipelines
title_full Generation of Artificial FASTQ Files to Evaluate the Performance of Next-Generation Sequencing Pipelines
title_fullStr Generation of Artificial FASTQ Files to Evaluate the Performance of Next-Generation Sequencing Pipelines
title_full_unstemmed Generation of Artificial FASTQ Files to Evaluate the Performance of Next-Generation Sequencing Pipelines
title_short Generation of Artificial FASTQ Files to Evaluate the Performance of Next-Generation Sequencing Pipelines
title_sort generation of artificial fastq files to evaluate the performance of next-generation sequencing pipelines
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3495771/
https://www.ncbi.nlm.nih.gov/pubmed/23152858
http://dx.doi.org/10.1371/journal.pone.0049110
work_keys_str_mv AT framptonmatthew generationofartificialfastqfilestoevaluatetheperformanceofnextgenerationsequencingpipelines
AT houlstonrichard generationofartificialfastqfilestoevaluatetheperformanceofnextgenerationsequencingpipelines
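
Illustrative note: the abstract in this record describes generating artificial paired-end FASTQ reads with Phred quality scores from a reference sequence, optionally with simulated sequencing errors. The Python sketch below shows that general idea only. It is not ArtificialFastqGenerator's code or interface: every function name, parameter, and read-naming scheme here is hypothetical, coverage is uniform rather than modelled on GC content, and all bases share one fixed Phred score.

```python
import random

# Complement table for DNA bases; N stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def error_probability(phred):
    """Phred definition: Q = -10 * log10(p), so p = 10 ** (-Q / 10)."""
    return 10 ** (-phred / 10)


def add_errors(seq, phred, rng):
    """Substitute each base with probability implied by its Phred score."""
    p = error_probability(phred)
    out = []
    for base in seq:
        if rng.random() < p:
            # Pick any base other than the true one.
            base = rng.choice([b for b in "ACGT" if b != base])
        out.append(base)
    return "".join(out)


def paired_end_records(reference, template_len, read_len, step,
                       phred=30, simulate_errors=False, seed=0):
    """Yield (read1, read2) FASTQ records from templates tiled along the reference.

    Hypothetical simplification: templates start every `step` bases and
    every base gets the same quality score (Phred+33 encoding).
    """
    rng = random.Random(seed)
    qual = chr(phred + 33) * read_len
    starts = range(0, len(reference) - template_len + 1, step)
    for i, start in enumerate(starts):
        template = reference[start:start + template_len]
        read1 = template[:read_len]                       # forward strand, 5' end
        read2 = reverse_complement(template)[:read_len]   # reverse strand, 5' end
        if simulate_errors:
            read1 = add_errors(read1, phred, rng)
            read2 = add_errors(read2, phred, rng)
        name = f"@sim_{i}_pos{start + 1}"  # 1-based start position in the name
        yield (f"{name}/1\n{read1}\n+\n{qual}\n",
               f"{name}/2\n{read2}\n+\n{qual}\n")


if __name__ == "__main__":
    for r1, r2 in paired_end_records("ACGTACGTAAGGCCTT", 12, 5, 4):
        print(r1, r2, sep="", end="")
```

Because each read name records the template's true position, the reads aligned by a pipeline can be compared against where they actually came from, which is the sense in which the reference acts as a gold standard.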