
Design of high-performance parallelized gene predictors in MATLAB

BACKGROUND: This paper proposes a method of implementing parallel gene prediction algorithms in MATLAB. The proposed designs are based on either Goertzel’s algorithm or on FFTs and have been implemented using varying amounts of parallelism on a central processing unit (CPU) and on a graphics process...


Bibliographic Details
Main authors: Rivard, Sylvain Robert, Mailloux, Jean-Gabriel, Beguenane, Rachid, Bui, Hung Tien
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3444342/
https://www.ncbi.nlm.nih.gov/pubmed/22490084
http://dx.doi.org/10.1186/1756-0500-5-183
author Rivard, Sylvain Robert
Mailloux, Jean-Gabriel
Beguenane, Rachid
Bui, Hung Tien
collection PubMed
description BACKGROUND: This paper proposes a method of implementing parallel gene prediction algorithms in MATLAB. The proposed designs are based on either Goertzel’s algorithm or on FFTs and have been implemented using varying amounts of parallelism on a central processing unit (CPU) and on a graphics processing unit (GPU). FINDINGS: Results show that an implementation using a straightforward approach can require over 4.5 h to process 15 million base pairs (bps) whereas a properly designed one could perform the same task in less than five minutes. In the best case, a GPU implementation can yield these results in 57 s. CONCLUSIONS: The present work shows how parallelism can be used in MATLAB for gene prediction in very large DNA sequences to produce results that are over 270 times faster than a conventional approach. This is significant as MATLAB is typically overlooked due to its apparent slow processing time even though it offers a convenient environment for bioinformatics. From a practical standpoint, this work proposes two strategies for accelerating genome data processing which rely on different parallelization mechanisms. Using a CPU, the work shows that direct access to the MEX function increases execution speed and that the PARFOR construct should be used in order to take full advantage of the parallelizable Goertzel implementation. When the target is a GPU, the work shows that data needs to be segmented into manageable sizes within the GFOR construct before processing in order to minimize execution time.
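The abstract names Goertzel's algorithm as one basis for the predictors but does not show it. The following is a minimal Python sketch, not the authors' MATLAB implementation, of how a Goertzel recurrence can measure the period-3 spectral component commonly used as a coding-region indicator. The function names, the binary indicator encoding of the four bases, and the window and step sizes are all assumptions made for illustration.

```python
import math


def goertzel_power(samples, k):
    """Squared magnitude of DFT bin k of `samples`, using Goertzel's recurrence."""
    n = len(samples)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # |X[k]|^2 from the last two states of the recurrence
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def period3_power(window):
    """Sum of the bin N/3 power over the four base indicator sequences (assumed encoding)."""
    n = len(window)
    if n == 0 or n % 3 != 0:
        raise ValueError("window length must be a positive multiple of 3")
    k = n // 3  # DFT bin that corresponds to a period of 3 bases
    total = 0.0
    for base in "ACGT":
        indicator = [1.0 if b == base else 0.0 for b in window]
        total += goertzel_power(indicator, k)
    return total


def sliding_period3(sequence, window, step):
    """Period-3 power for each window; every window is independent of the others."""
    seq = sequence.upper()
    starts = range(0, len(seq) - window + 1, step)
    return [period3_power(seq[i:i + window]) for i in starts]


if __name__ == "__main__":
    # Hypothetical toy sequence: a perfectly 3-periodic repeat
    print(sliding_period3("ATG" * 10, window=12, step=3))
```

Because each window is evaluated independently, the loop in `sliding_period3` is the kind of loop that a construct such as PARFOR could distribute across workers; this sketch runs it serially.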
format Online
Article
Text
id pubmed-3444342
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3444342 2012-09-20 Design of high-performance parallelized gene predictors in MATLAB Rivard, Sylvain Robert Mailloux, Jean-Gabriel Beguenane, Rachid Bui, Hung Tien BMC Res Notes Technical Note BioMed Central 2012-04-10 /pmc/articles/PMC3444342/ /pubmed/22490084 http://dx.doi.org/10.1186/1756-0500-5-183 Text en Copyright ©2012 Rivard et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Design of high-performance parallelized gene predictors in MATLAB
topic Technical Note