
HTSeq—a Python framework to work with high-throughput sequencing data

Bibliographic details
Main authors: Anders, Simon; Pyl, Paul Theodor; Huber, Wolfgang
Format: Online, Article, Text
Language: English
Published: Oxford University Press, 2015
Subjects: Original Papers
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4287950/
https://www.ncbi.nlm.nih.gov/pubmed/25260700
http://dx.doi.org/10.1093/bioinformatics/btu638
collection PubMed
description Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed.
Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
Availability and implementation: HTSeq is released as open-source software under the GNU General Public Licence and is available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq.
Contact: sanders@fs.tum.de
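The description says HTSeq provides data structures that can be queried by genomic coordinates, and that htseq-count counts how reads overlap genes. What follows is a minimal, self-contained Python sketch of that idea under stated assumptions. It does not use or reproduce HTSeq's API (HTSeq ships its own classes for this, such as `GenomicInterval` and `GenomicArrayOfSets`); `Interval`, `StepArrayOfSets` and `count_union` are hypothetical names. Features are stored in a step-wise array in which each step holds the set of feature names covering it. Each read is then assigned by a rule modelled on htseq-count's "union" mode: the read counts for a feature only if the union of the sets along the read contains exactly one name. The sketch assumes each read is a single contiguous, unstranded interval with non-negative coordinates. It also skips the alignment parsing, strandedness and quality filtering a real tool would need.

```python
from __future__ import annotations

from bisect import bisect_left, bisect_right
from collections import Counter
from collections.abc import Iterable, Iterator
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical half-open interval [start, end) on one chromosome (strand ignored)."""
    chrom: str
    start: int
    end: int


class StepArrayOfSets:
    """Hypothetical step-wise array: per chromosome, sorted breakpoints `bps` and
    the set of feature names covering each step [bps[i], bps[i + 1])."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[list[int], list[frozenset[str]]]] = {}

    @staticmethod
    def _split(bps: list[int], sets: list[frozenset[str]], pos: int) -> None:
        # Make `pos` a breakpoint; the new step inherits the set of the step it splits.
        # Assumes pos >= 0, so the first breakpoint (0) is always <= pos.
        i = bisect_right(bps, pos) - 1
        if bps[i] != pos:
            bps.insert(i + 1, pos)
            sets.insert(i + 1, sets[i])

    def add(self, iv: Interval, name: str) -> None:
        """Record that feature `name` covers `iv`."""
        bps, sets = self._data.setdefault(iv.chrom, ([0], [frozenset()]))
        self._split(bps, sets, iv.start)
        self._split(bps, sets, iv.end)
        # After splitting, iv.start and iv.end are breakpoints, so whole steps are updated.
        i = bisect_left(bps, iv.start)
        while i < len(bps) and bps[i] < iv.end:
            sets[i] = sets[i] | {name}
            i += 1

    def steps(self, iv: Interval) -> Iterator[frozenset[str]]:
        """Yield the feature set of every step that overlaps `iv`."""
        bps, sets = self._data.get(iv.chrom, ([0], [frozenset()]))
        i = bisect_right(bps, iv.start) - 1
        while i < len(bps) and bps[i] < iv.end:
            yield sets[i]
            i += 1


def count_union(features: StepArrayOfSets, reads: Iterable[Interval]) -> Counter[str]:
    """Assign each read using a union-style rule; special counter names mimic htseq-count."""
    counts: Counter[str] = Counter()
    for read in reads:
        hit: set[str] = set()
        for step_set in features.steps(read):
            hit |= step_set
        if len(hit) == 1:
            counts[next(iter(hit))] += 1
        elif not hit:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1
    return counts


if __name__ == "__main__":
    genes = StepArrayOfSets()
    genes.add(Interval("chr1", 100, 200), "geneA")
    genes.add(Interval("chr1", 150, 300), "geneB")  # overlaps geneA on 150-200
    genes.add(Interval("chr1", 400, 500), "geneA")  # a second exon of geneA
    reads = [
        Interval("chr1", 110, 140),  # geneA only
        Interval("chr1", 160, 190),  # geneA and geneB: ambiguous
        Interval("chr1", 210, 260),  # geneB only
        Interval("chr1", 320, 380),  # between genes: no feature
        Interval("chr1", 420, 480),  # second exon of geneA
    ]
    print(dict(count_union(genes, reads)))
```

Run as a script, the example reports two reads for geneA, one for geneB, one ambiguous read (it overlaps both genes) and one read with no feature. Because the sets are stored per step, each lookup is a binary search followed by a walk over only the steps the read spans. This avoids scanning every feature for every read.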
id pubmed-4287950
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Bioinformatics (Original Papers)
publication dates 2015-01-15 (issue); 2014-09-25 (online)
rights © The Author 2014. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Original Papers