The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants

FASTQ has emerged as a common file format for sharing sequencing read data combining both the sequence and an associated per base quality score, despite lacking any formal definition to date, and existing in at least three incompatible variants. This article defines the FASTQ format, covering the or...

Bibliographic Details
Main Authors: Cock, Peter J. A., Fields, Christopher J., Goto, Naohisa, Heuer, Michael L., Rice, Peter M.
Format: Text
Language: English
Published: Oxford University Press 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2847217/
https://www.ncbi.nlm.nih.gov/pubmed/20015970
http://dx.doi.org/10.1093/nar/gkp1137
_version_ 1782179550776524800
author Cock, Peter J. A.
Fields, Christopher J.
Goto, Naohisa
Heuer, Michael L.
Rice, Peter M.
collection PubMed
description FASTQ has emerged as a common file format for sharing sequencing read data combining both the sequence and an associated per base quality score, despite lacking any formal definition to date, and existing in at least three incompatible variants. This article defines the FASTQ format, covering the original Sanger standard, the Solexa/Illumina variants and conversion between them, based on publicly available information such as the MAQ documentation and conventions recently agreed by the Open Bioinformatics Foundation projects Biopython, BioPerl, BioRuby, BioJava and EMBOSS. Being an open access publication, it is hoped that this description, with the example files provided as Supplementary Data, will serve in future as a reference for this important file format.
format Text
id pubmed-2847217
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-2847217 2010-04-01 The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants Cock, Peter J. A. Fields, Christopher J. Goto, Naohisa Heuer, Michael L. Rice, Peter M. Nucleic Acids Res Survey and Summary FASTQ has emerged as a common file format for sharing sequencing read data combining both the sequence and an associated per base quality score, despite lacking any formal definition to date, and existing in at least three incompatible variants. This article defines the FASTQ format, covering the original Sanger standard, the Solexa/Illumina variants and conversion between them, based on publicly available information such as the MAQ documentation and conventions recently agreed by the Open Bioinformatics Foundation projects Biopython, BioPerl, BioRuby, BioJava and EMBOSS. Being an open access publication, it is hoped that this description, with the example files provided as Supplementary Data, will serve in future as a reference for this important file format. Oxford University Press 2010-04 2009-12-16 /pmc/articles/PMC2847217/ /pubmed/20015970 http://dx.doi.org/10.1093/nar/gkp1137 Text en © The Author(s) 2009. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.5 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants
topic Survey and Summary