
Scalability and Validation of Big Data Bioinformatics Software

This review examines two important aspects that are central to modern big data bioinformatics analysis – software scalability and validity. We argue that not only are the issues of scalability and validation common to all big data bioinformatics analyses, they can be tackled by conceptually related...

Bibliographic Details
Main Authors:	Yang, Andrian, Troup, Michael, Ho, Joshua W.K.
Format:	Online Article Text
Language:	English
Published:	Research Network of Computational and Structural Biotechnology 2017
Subjects:	Short Survey
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5537105/ https://www.ncbi.nlm.nih.gov/pubmed/28794828 http://dx.doi.org/10.1016/j.csbj.2017.07.002

_version_	1783254107897724928
author	Yang, Andrian Troup, Michael Ho, Joshua W.K.
author_sort	Yang, Andrian
collection	PubMed
description	This review examines two important aspects that are central to modern big data bioinformatics analysis – software scalability and validity. We argue that not only are the issues of scalability and validation common to all big data bioinformatics analyses, they can be tackled by conceptually related methodological approaches, namely divide-and-conquer (scalability) and multiple executions (validation). Scalability is defined as the ability of a program to scale with its workload. It has always been an important consideration when developing bioinformatics algorithms and programs. Nonetheless, the surge in the volume and variety of biological and biomedical data has posed new challenges. We discuss how modern cloud computing and big data programming frameworks such as MapReduce and Spark are being used to effectively implement divide-and-conquer in a distributed computing environment. Validation of software is another important issue in big data bioinformatics that is often ignored. Software validation is the process of determining whether the program under test fulfils the task for which it was designed. Determining the correctness of the computational output of big data bioinformatics software is especially difficult due to the large input space and complex algorithms involved. We discuss how state-of-the-art software testing techniques that are based on the idea of multiple executions, such as metamorphic testing, can be used to implement an effective bioinformatics quality assurance strategy. We hope this review will raise awareness of these critical issues in bioinformatics.
format	Online Article Text
id	pubmed-5537105
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Research Network of Computational and Structural Biotechnology
record_format	MEDLINE/PubMed
spelling	pubmed-5537105 2017-08-09 Scalability and Validation of Big Data Bioinformatics Software Yang, Andrian Troup, Michael Ho, Joshua W.K. Comput Struct Biotechnol J Short Survey
Research Network of Computational and Structural Biotechnology 2017-07-20 /pmc/articles/PMC5537105/ /pubmed/28794828 http://dx.doi.org/10.1016/j.csbj.2017.07.002 Text en © 2017 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle	Short Survey Yang, Andrian Troup, Michael Ho, Joshua W.K. Scalability and Validation of Big Data Bioinformatics Software
title	Scalability and Validation of Big Data Bioinformatics Software
title_sort	scalability and validation of big data bioinformatics software
topic	Short Survey
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5537105/ https://www.ncbi.nlm.nih.gov/pubmed/28794828 http://dx.doi.org/10.1016/j.csbj.2017.07.002
work_keys_str_mv	AT yangandrian scalabilityandvalidationofbigdatabioinformaticssoftware AT troupmichael scalabilityandvalidationofbigdatabioinformaticssoftware AT hojoshuawk scalabilityandvalidationofbigdatabioinformaticssoftware
