Framework for Parallel Preprocessing of Microarray Data Using Hadoop

Microarray technology has become one of the most popular ways to study gene expression and diagnose disease. The National Center for Biotechnology Information (NCBI) hosts public databases containing large volumes of biological data that must be preprocessed, since they carry high levels of noise a...

Bibliographic Details
Main Authors:	Sahlabadi, Amirhossein; Chandren Muniyandi, Ravie; Sahlabadi, Mahdi; Golshanbafghy, Hossein
Format:	Online Article Text
Language:	English
Published:	Hindawi 2018
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5896349/ https://www.ncbi.nlm.nih.gov/pubmed/29796018 http://dx.doi.org/10.1155/2018/9391635

author	Sahlabadi, Amirhossein; Chandren Muniyandi, Ravie; Sahlabadi, Mahdi; Golshanbafghy, Hossein
author_sort	Sahlabadi, Amirhossein
collection	PubMed
description	Microarray technology has become one of the most popular ways to study gene expression and diagnose disease. The National Center for Biotechnology Information (NCBI) hosts public databases containing large volumes of biological data that must be preprocessed because they carry high levels of noise and bias. Robust Multiarray Average (RMA) is one of the standard methods used to preprocess the data and remove noise. Most preprocessing algorithms are time-consuming and cannot handle large numbers of datasets with thousands of experiments. Parallel processing can address these issues. Hadoop is a well-known distributed storage and processing framework that provides a parallel environment in which to run the experiment. In this research, for the first time, the capability of Hadoop and the statistical power of R are leveraged to parallelize the existing RMA preprocessing algorithm so that microarray data can be processed efficiently. The experiment was run on a cluster of 5 nodes, each with 16 cores and 16 GB of memory. It compares the efficiency and performance of RMA parallelized with Hadoop against RMA parallelized with the affyPara package and against sequential RMA. The results show that the speed-up of the proposed approach exceeds that of both the sequential and the affyPara approaches.
format	Online Article Text
id	pubmed-5896349
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Hindawi
record_format	MEDLINE/PubMed
spelling	pubmed-5896349 2018-05-24 Adv Bioinformatics Research Article Hindawi 2018-03-29 /pmc/articles/PMC5896349/ /pubmed/29796018 http://dx.doi.org/10.1155/2018/9391635 Text en Copyright © 2018 Amirhossein Sahlabadi et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Framework for Parallel Preprocessing of Microarray Data Using Hadoop
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5896349/ https://www.ncbi.nlm.nih.gov/pubmed/29796018 http://dx.doi.org/10.1155/2018/9391635

Similar Items