Processing genome scale tabular data with wormtable

BACKGROUND: Modern biological science generates a vast amount of data, the analysis of which presents a major challenge to researchers. Data are commonly represented in tables stored as plain text files and require line-by-line parsing for analysis, which is time consuming and error prone. Furthermo...

Bibliographic Details
Main Authors:	Kelleher, Jerome; Ness, Rob W; Halligan, Daniel L
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Software
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4234461/ https://www.ncbi.nlm.nih.gov/pubmed/24308302 http://dx.doi.org/10.1186/1471-2105-14-356

_version_	1782344864661241856
author	Kelleher, Jerome; Ness, Rob W; Halligan, Daniel L
author_facet	Kelleher, Jerome; Ness, Rob W; Halligan, Daniel L
author_sort	Kelleher, Jerome
collection	PubMed
description	BACKGROUND: Modern biological science generates a vast amount of data, the analysis of which presents a major challenge to researchers. Data are commonly represented in tables stored as plain text files and require line-by-line parsing for analysis, which is time consuming and error prone. Furthermore, there is no simple means of indexing these files so that rows containing particular values can be quickly found. RESULTS: We introduce a new data format and software library called wormtable, which provides efficient access to tabular data in Python. Wormtable stores data in a compact binary format, provides random access to rows, and enables sophisticated indexing on columns within these tables. Files written in existing formats can be easily converted to wormtable format, and we provide conversion utilities for the VCF and GTF formats. CONCLUSIONS: Wormtable’s simple API allows users to process large tables orders of magnitude more quickly than is possible when parsing text. Furthermore, the indexing facilities provide efficient access to subsets of the data along with providing useful methods of summarising columns. Since third-party libraries or custom code are no longer needed to parse complex plain text formats, analysis code can also be substantially simpler as well as being uniform across different data formats. These benefits of reduced code complexity and greatly increased performance allow users much greater freedom to explore their data.
format	Online Article Text
id	pubmed-4234461
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4234461 2014-11-18 Processing genome scale tabular data with wormtable Kelleher, Jerome; Ness, Rob W; Halligan, Daniel L BMC Bioinformatics Software BioMed Central 2013-12-05 /pmc/articles/PMC4234461/ /pubmed/24308302 http://dx.doi.org/10.1186/1471-2105-14-356 Text en Copyright © 2013 Kelleher et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Software Kelleher, Jerome Ness, Rob W Halligan, Daniel L Processing genome scale tabular data with wormtable
title	Processing genome scale tabular data with wormtable
title_full	Processing genome scale tabular data with wormtable
title_fullStr	Processing genome scale tabular data with wormtable
title_full_unstemmed	Processing genome scale tabular data with wormtable
title_short	Processing genome scale tabular data with wormtable
title_sort	processing genome scale tabular data with wormtable
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4234461/ https://www.ncbi.nlm.nih.gov/pubmed/24308302 http://dx.doi.org/10.1186/1471-2105-14-356
work_keys_str_mv	AT kelleherjerome processinggenomescaletabulardatawithwormtable AT nessrobw processinggenomescaletabulardatawithwormtable AT halligandaniell processinggenomescaletabulardatawithwormtable
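
The abstract in this record describes three ideas: a compact binary row format, random access to rows, and indexes on columns. The sketch below illustrates those ideas in general terms using only the Python standard library. It is not wormtable's file format or API; the row layout, the function names (write_table, read_row, build_index, rows_in_range) and the sample data are all hypothetical and chosen only for illustration.

```python
import bisect
import io
import struct

# Hypothetical fixed-width row layout (an assumption, not wormtable's format):
# chromosome id (uint8), position (uint32), quality (float32), little-endian.
ROW = struct.Struct("<BIf")


def write_table(rows, buffer):
    """Pack each (chrom, pos, qual) row into a fixed-width binary record."""
    for row in rows:
        buffer.write(ROW.pack(*row))


def read_row(buffer, row_id):
    """Random access: seek straight to row_id without parsing earlier rows."""
    buffer.seek(row_id * ROW.size)
    return ROW.unpack(buffer.read(ROW.size))


def build_index(buffer, num_rows, column):
    """Return a sorted list of (value, row_id) pairs for one column."""
    return sorted((read_row(buffer, j)[column], j) for j in range(num_rows))


def rows_in_range(buffer, index, low, high):
    """Yield rows whose indexed value v satisfies low <= v < high."""
    start = bisect.bisect_left(index, (low, -1))
    stop = bisect.bisect_left(index, (high, -1))
    for _, row_id in index[start:stop]:
        yield read_row(buffer, row_id)


# Made-up sample data, held in memory instead of in a file on disk.
table = io.BytesIO()
rows = [(1, 500, 30.0), (1, 120, 45.5), (2, 75, 12.25), (1, 900, 60.0)]
write_table(rows, table)

pos_index = build_index(table, len(rows), column=1)
hits = list(rows_in_range(table, pos_index, 100, 600))
```

Because every record has the same size, fetching row n is a single seek, and the sorted index lets a range query on the position column touch only the matching rows rather than scanning the whole table.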

Similar Items