FastaValidator: an open-source Java library to parse and validate FASTA formatted sequences

BACKGROUND: Advances in sequencing technologies challenge the efficient importing and validation of FASTA formatted sequence data which is still a prerequisite for most bioinformatic tools and pipelines. Comparative analysis of commonly used Bio*-frameworks (BioPerl, BioJava and Biopython) shows tha...

Bibliographic Details
Main authors: Waldmann, Jost, Gerken, Jan, Hankeln, Wolfgang, Schweer, Timmy, Glöckner, Frank Oliver
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4094456/
https://www.ncbi.nlm.nih.gov/pubmed/24929426
http://dx.doi.org/10.1186/1756-0500-7-365
author Waldmann, Jost
Gerken, Jan
Hankeln, Wolfgang
Schweer, Timmy
Glöckner, Frank Oliver
author_sort Waldmann, Jost
collection PubMed
description BACKGROUND: Advances in sequencing technologies challenge the efficient importing and validation of FASTA formatted sequence data, which is still a prerequisite for most bioinformatic tools and pipelines. Comparative analysis of commonly used Bio*-frameworks (BioPerl, BioJava and Biopython) shows that their scalability and accuracy are hampered. FINDINGS: FastaValidator represents a platform-independent, standardized, light-weight software library written in the Java programming language. It targets computer scientists and bioinformaticians writing software which needs to parse large amounts of sequence data quickly and accurately. For end-users, FastaValidator includes an interactive out-of-the-box validation of FASTA formatted files, as well as a non-interactive mode designed for high-throughput validation in software pipelines. CONCLUSIONS: The accuracy and performance of the FastaValidator library qualify it for large data sets such as those commonly produced by massively parallel (NGS) technologies. It offers scientists a fast, accurate and standardized method for parsing and validating FASTA formatted sequence data.
format Online
Article
Text
id pubmed-4094456
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4094456 2014-07-12
FastaValidator: an open-source Java library to parse and validate FASTA formatted sequences
Waldmann, Jost
Gerken, Jan
Hankeln, Wolfgang
Schweer, Timmy
Glöckner, Frank Oliver
BMC Res Notes
Technical Note
BioMed Central 2014-06-14
/pmc/articles/PMC4094456/
/pubmed/24929426
http://dx.doi.org/10.1186/1756-0500-7-365
Text en
Copyright © 2014 Waldmann et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title FastaValidator: an open-source Java library to parse and validate FASTA formatted sequences
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4094456/
https://www.ncbi.nlm.nih.gov/pubmed/24929426
http://dx.doi.org/10.1186/1756-0500-7-365