Keemei: cloud-based validation of tabular bioinformatics file formats in Google Sheets

BACKGROUND: Bioinformatics software often requires human-generated tabular text files as input and has specific requirements for how those data are formatted. Users frequently manage these data in spreadsheet programs, which is convenient for researchers who are compiling the requisite information b...

Bibliographic Details
Main Authors: Rideout, Jai Ram, Chase, John H., Bolyen, Evan, Ackermann, Gail, González, Antonio, Knight, Rob, Caporaso, J. Gregory
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4906574/
https://www.ncbi.nlm.nih.gov/pubmed/27296526
http://dx.doi.org/10.1186/s13742-016-0133-6
_version_ 1782437434054672384
author Rideout, Jai Ram
Chase, John H.
Bolyen, Evan
Ackermann, Gail
González, Antonio
Knight, Rob
Caporaso, J. Gregory
author_facet Rideout, Jai Ram
Chase, John H.
Bolyen, Evan
Ackermann, Gail
González, Antonio
Knight, Rob
Caporaso, J. Gregory
author_sort Rideout, Jai Ram
collection PubMed
description BACKGROUND: Bioinformatics software often requires human-generated tabular text files as input and has specific requirements for how those data are formatted. Users frequently manage these data in spreadsheet programs, which is convenient for researchers who are compiling the requisite information because the spreadsheet programs can easily be used on different platforms including laptops and tablets, and because they provide a familiar interface. It is increasingly common for many different researchers to be involved in compiling these data, including study coordinators, clinicians, lab technicians and bioinformaticians. As a result, many research groups are shifting toward using cloud-based spreadsheet programs, such as Google Sheets, which support the concurrent editing of a single spreadsheet by different users working on different platforms. Most of the researchers who enter data are not familiar with the formatting requirements of the bioinformatics programs that will be used, so validating and correcting file formats is often a bottleneck prior to beginning bioinformatics analysis. MAIN TEXT: We present Keemei, a Google Sheets Add-on, for validating tabular files used in bioinformatics analyses. Keemei is available free of charge from Google’s Chrome Web Store. Keemei can be installed and run on any web browser supported by Google Sheets. Keemei currently supports the validation of two widely used tabular bioinformatics formats, the Quantitative Insights into Microbial Ecology (QIIME) sample metadata mapping file format and the Spatially Referenced Genetic Data (SRGD) format, but is designed to easily support the addition of others. CONCLUSIONS: Keemei will save researchers time and frustration by providing a convenient interface for tabular bioinformatics file format validation. 
By allowing everyone involved with data entry for a project to easily validate their data, it will reduce the validation and formatting bottlenecks that are commonly encountered when human-generated data files are first used with a bioinformatics system. Simplifying the validation of essential tabular data files, such as sample metadata, will reduce common errors and thereby improve the quality and reliability of research outcomes. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13742-016-0133-6) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4906574
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-49065742016-06-15 Keemei: cloud-based validation of tabular bioinformatics file formats in Google Sheets Rideout, Jai Ram Chase, John H. Bolyen, Evan Ackermann, Gail González, Antonio Knight, Rob Caporaso, J. Gregory Gigascience Technical Note BACKGROUND: Bioinformatics software often requires human-generated tabular text files as input and has specific requirements for how those data are formatted. Users frequently manage these data in spreadsheet programs, which is convenient for researchers who are compiling the requisite information because the spreadsheet programs can easily be used on different platforms including laptops and tablets, and because they provide a familiar interface. It is increasingly common for many different researchers to be involved in compiling these data, including study coordinators, clinicians, lab technicians and bioinformaticians. As a result, many research groups are shifting toward using cloud-based spreadsheet programs, such as Google Sheets, which support the concurrent editing of a single spreadsheet by different users working on different platforms. Most of the researchers who enter data are not familiar with the formatting requirements of the bioinformatics programs that will be used, so validating and correcting file formats is often a bottleneck prior to beginning bioinformatics analysis. MAIN TEXT: We present Keemei, a Google Sheets Add-on, for validating tabular files used in bioinformatics analyses. Keemei is available free of charge from Google’s Chrome Web Store. Keemei can be installed and run on any web browser supported by Google Sheets. Keemei currently supports the validation of two widely used tabular bioinformatics formats, the Quantitative Insights into Microbial Ecology (QIIME) sample metadata mapping file format and the Spatially Referenced Genetic Data (SRGD) format, but is designed to easily support the addition of others. 
CONCLUSIONS: Keemei will save researchers time and frustration by providing a convenient interface for tabular bioinformatics file format validation. By allowing everyone involved with data entry for a project to easily validate their data, it will reduce the validation and formatting bottlenecks that are commonly encountered when human-generated data files are first used with a bioinformatics system. Simplifying the validation of essential tabular data files, such as sample metadata, will reduce common errors and thereby improve the quality and reliability of research outcomes. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13742-016-0133-6) contains supplementary material, which is available to authorized users. BioMed Central 2016-06-13 /pmc/articles/PMC4906574/ /pubmed/27296526 http://dx.doi.org/10.1186/s13742-016-0133-6 Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Technical Note
Rideout, Jai Ram
Chase, John H.
Bolyen, Evan
Ackermann, Gail
González, Antonio
Knight, Rob
Caporaso, J. Gregory
Keemei: cloud-based validation of tabular bioinformatics file formats in Google Sheets
title Keemei: cloud-based validation of tabular bioinformatics file formats in Google Sheets
title_full Keemei: cloud-based validation of tabular bioinformatics file formats in Google Sheets
title_fullStr Keemei: cloud-based validation of tabular bioinformatics file formats in Google Sheets
title_full_unstemmed Keemei: cloud-based validation of tabular bioinformatics file formats in Google Sheets
title_short Keemei: cloud-based validation of tabular bioinformatics file formats in Google Sheets
title_sort keemei: cloud-based validation of tabular bioinformatics file formats in google sheets
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4906574/
https://www.ncbi.nlm.nih.gov/pubmed/27296526
http://dx.doi.org/10.1186/s13742-016-0133-6
work_keys_str_mv AT rideoutjairam keemeicloudbasedvalidationoftabularbioinformaticsfileformatsingooglesheets
AT chasejohnh keemeicloudbasedvalidationoftabularbioinformaticsfileformatsingooglesheets
AT bolyenevan keemeicloudbasedvalidationoftabularbioinformaticsfileformatsingooglesheets
AT ackermanngail keemeicloudbasedvalidationoftabularbioinformaticsfileformatsingooglesheets
AT gonzalezantonio keemeicloudbasedvalidationoftabularbioinformaticsfileformatsingooglesheets
AT knightrob keemeicloudbasedvalidationoftabularbioinformaticsfileformatsingooglesheets
AT caporasojgregory keemeicloudbasedvalidationoftabularbioinformaticsfileformatsingooglesheets