The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants
FASTQ has emerged as a common file format for sharing sequencing read data combining both the sequence and an associated per base quality score, despite lacking any formal definition to date, and existing in at least three incompatible variants. This article defines the FASTQ format, covering the or...
Main authors: | Cock, Peter J. A.; Fields, Christopher J.; Goto, Naohisa; Heuer, Michael L.; Rice, Peter M. |
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press 2010 |
Subjects: | Survey and Summary |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2847217/ https://www.ncbi.nlm.nih.gov/pubmed/20015970 http://dx.doi.org/10.1093/nar/gkp1137 |
_version_ | 1782179550776524800 |
---|---|
author | Cock, Peter J. A. Fields, Christopher J. Goto, Naohisa Heuer, Michael L. Rice, Peter M. |
author_sort | Cock, Peter J. A. |
collection | PubMed |
description | FASTQ has emerged as a common file format for sharing sequencing read data combining both the sequence and an associated per base quality score, despite lacking any formal definition to date, and existing in at least three incompatible variants. This article defines the FASTQ format, covering the original Sanger standard, the Solexa/Illumina variants and conversion between them, based on publicly available information such as the MAQ documentation and conventions recently agreed by the Open Bioinformatics Foundation projects Biopython, BioPerl, BioRuby, BioJava and EMBOSS. Being an open access publication, it is hoped that this description, with the example files provided as Supplementary Data, will serve in future as a reference for this important file format. |
format | Text |
id | pubmed-2847217 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-28472172010-04-01 The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants Cock, Peter J. A. Fields, Christopher J. Goto, Naohisa Heuer, Michael L. Rice, Peter M. Nucleic Acids Res Survey and Summary FASTQ has emerged as a common file format for sharing sequencing read data combining both the sequence and an associated per base quality score, despite lacking any formal definition to date, and existing in at least three incompatible variants. This article defines the FASTQ format, covering the original Sanger standard, the Solexa/Illumina variants and conversion between them, based on publicly available information such as the MAQ documentation and conventions recently agreed by the Open Bioinformatics Foundation projects Biopython, BioPerl, BioRuby, BioJava and EMBOSS. Being an open access publication, it is hoped that this description, with the example files provided as Supplementary Data, will serve in future as a reference for this important file format. Oxford University Press 2010-04 2009-12-16 /pmc/articles/PMC2847217/ /pubmed/20015970 http://dx.doi.org/10.1093/nar/gkp1137 Text en © The Author(s) 2009. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.5 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants |
topic | Survey and Summary |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2847217/ https://www.ncbi.nlm.nih.gov/pubmed/20015970 http://dx.doi.org/10.1093/nar/gkp1137 |