
A Scheduling Algorithm for Computational Grids that Minimizes Centralized Processing in Genome Assembly of Next-Generation Sequencing Data


Bibliographic details
Main authors: Lima, Jakelyne; Cerdeira, Louise Teixeira; Bol, Erick; Schneider, Maria Paula Cruz; Silva, Artur; Azevedo, Vasco; Abelém, Antônio Jorge Gomes
Format: Online; Article; Text
Language: English
Published: Frontiers Research Foundation, 2012
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3306921/
https://www.ncbi.nlm.nih.gov/pubmed/22461785
http://dx.doi.org/10.3389/fgene.2012.00038
collection PubMed
description Improvements in genome sequencing techniques have resulted in the generation of huge volumes of data. As a consequence, the genome assembly stage demands ever more computational power, since the incoming sequence files contain large amounts of data. To speed up the process, it is often necessary to distribute the workload among a group of machines. However, this requires hardware and software solutions specially configured for this purpose. Grid computing tries to simplify this process of aggregating resources, but it does not always offer the best possible performance because of the heterogeneity and decentralized management of its resources. Software must therefore take these peculiarities into account. To this end, we developed an algorithm that optimizes the operation of the de novo assembly software ABySS in grids. We ran ABySS with and without our algorithm in the grid simulator SimGrid. Tests showed that the algorithm is viable, flexible, and scalable even in a heterogeneous environment, and that it improved genome assembly time in computational grids without changing the quality of the assembly.
id pubmed-3306921
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: Front Genet (Genetics), published 2012-03-19. Copyright © 2012 Lima, Cerdeira, Bol, Schneider, Silva, Azevedo and Abelém.
This is an open-access article distributed under the terms of the Creative Commons Attribution Non Commercial License (http://www.frontiersin.org/licenseagreement), which permits non-commercial use, distribution, and reproduction in other forums, provided the original authors and source are credited.