FastaValidator: an open-source Java library to parse and validate FASTA formatted sequences
BACKGROUND: Advances in sequencing technologies challenge the efficient importing and validation of FASTA formatted sequence data which is still a prerequisite for most bioinformatic tools and pipelines. Comparative analysis of commonly used Bio*-frameworks (BioPerl, BioJava and Biopython) shows tha...
Main authors: | Waldmann, Jost; Gerken, Jan; Hankeln, Wolfgang; Schweer, Timmy; Glöckner, Frank Oliver |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | Technical Note |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4094456/ https://www.ncbi.nlm.nih.gov/pubmed/24929426 http://dx.doi.org/10.1186/1756-0500-7-365 |
_version_ | 1782325830366527488 |
---|---|
author | Waldmann, Jost Gerken, Jan Hankeln, Wolfgang Schweer, Timmy Glöckner, Frank Oliver |
author_sort | Waldmann, Jost |
collection | PubMed |
description | BACKGROUND: Advances in sequencing technologies challenge the efficient importing and validation of FASTA formatted sequence data which is still a prerequisite for most bioinformatic tools and pipelines. Comparative analysis of commonly used Bio*-frameworks (BioPerl, BioJava and Biopython) shows that their scalability and accuracy is hampered. FINDINGS: FastaValidator represents a platform-independent, standardized, light-weight software library written in the Java programming language. It targets computer scientists and bioinformaticians writing software which needs to parse quickly and accurately large amounts of sequence data. For end-users FastaValidator includes an interactive out-of-the-box validation of FASTA formatted files, as well as a non-interactive mode designed for high-throughput validation in software pipelines. CONCLUSIONS: The accuracy and performance of the FastaValidator library qualifies it for large data sets such as those commonly produced by massive parallel (NGS) technologies. It offers scientists a fast, accurate and standardized method for parsing and validating FASTA formatted sequence data. |
format | Online Article Text |
id | pubmed-4094456 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-40944562014-07-12 FastaValidator: an open-source Java library to parse and validate FASTA formatted sequences Waldmann, Jost Gerken, Jan Hankeln, Wolfgang Schweer, Timmy Glöckner, Frank Oliver BMC Res Notes Technical Note BACKGROUND: Advances in sequencing technologies challenge the efficient importing and validation of FASTA formatted sequence data which is still a prerequisite for most bioinformatic tools and pipelines. Comparative analysis of commonly used Bio*-frameworks (BioPerl, BioJava and Biopython) shows that their scalability and accuracy is hampered. FINDINGS: FastaValidator represents a platform-independent, standardized, light-weight software library written in the Java programming language. It targets computer scientists and bioinformaticians writing software which needs to parse quickly and accurately large amounts of sequence data. For end-users FastaValidator includes an interactive out-of-the-box validation of FASTA formatted files, as well as a non-interactive mode designed for high-throughput validation in software pipelines. CONCLUSIONS: The accuracy and performance of the FastaValidator library qualifies it for large data sets such as those commonly produced by massive parallel (NGS) technologies. It offers scientists a fast, accurate and standardized method for parsing and validating FASTA formatted sequence data. BioMed Central 2014-06-14 /pmc/articles/PMC4094456/ /pubmed/24929426 http://dx.doi.org/10.1186/1756-0500-7-365 Text en Copyright © 2014 Waldmann et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. 
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | FastaValidator: an open-source Java library to parse and validate FASTA formatted sequences |
topic | Technical Note |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4094456/ https://www.ncbi.nlm.nih.gov/pubmed/24929426 http://dx.doi.org/10.1186/1756-0500-7-365 |