Mutation Operators for Large Scale Data Processing Programs in Spark

This paper proposes a mutation testing approach for big data processing programs that follow a data flow model, such as those implemented on top of Apache Spark. Mutation testing is a fault-based technique that relies on fault simulation by modifying programs, to create faulty versions called mutant...

Bibliographic Details
Main Authors:	de Souza Neto, João Batista; Martins Moreira, Anamaria; Vargas-Solar, Genoveva; Musicante, Martin Alejandro
Format:	Online Article Text
Language:	English
Published:	2020
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7266460/ http://dx.doi.org/10.1007/978-3-030-49435-3_30

author	de Souza Neto, João Batista; Martins Moreira, Anamaria; Vargas-Solar, Genoveva; Musicante, Martin Alejandro
collection	PubMed
description	This paper proposes a mutation testing approach for big data processing programs that follow a data flow model, such as those implemented on top of Apache Spark. Mutation testing is a fault-based technique that simulates faults by modifying programs to create faulty versions called mutants. Mutants are created by operators that simulate specific, well-identified faults. A testing process must be able to detect the faults in mutants and thereby help prevent faulty behaviour in a program. We propose a set of mutation operators designed for Spark programs, which are characterized by a data flow and data processing operations. These operators model changes in the data flow and operations to simulate faults that take Spark program characteristics into account. We performed manual experiments to evaluate the proposed mutation operators in terms of cost and effectiveness. The results show that mutation operators can contribute to the testing process and to the construction of reliable Spark programs.
format	Online Article Text
id	pubmed-7266460
institution	National Center for Biotechnology Information
language	English
publishDate	2020
record_format	MEDLINE/PubMed
spelling	pubmed-7266460 2020-06-03 Mutation Operators for Large Scale Data Processing Programs in Spark. de Souza Neto, João Batista; Martins Moreira, Anamaria; Vargas-Solar, Genoveva; Musicante, Martin Alejandro. Advanced Information Systems Engineering. Article. 2020-05-30. /pmc/articles/PMC7266460/ http://dx.doi.org/10.1007/978-3-030-49435-3_30 Text en © Springer Nature Switzerland AG 2020. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title	Mutation Operators for Large Scale Data Processing Programs in Spark
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7266460/ http://dx.doi.org/10.1007/978-3-030-49435-3_30

Similar Items