Breaking the computational barriers of pairwise genome comparison

BACKGROUND: Conventional pairwise sequence comparison software algorithms are being used to process much larger datasets than they were originally designed for. This can result in processing bottlenecks that limit software capabilities or prevent full use of the available hardware resources. Overcom...

Bibliographic Details
Main Authors:	Torreno, Oscar, Trelles, Oswaldo
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4531504/ https://www.ncbi.nlm.nih.gov/pubmed/26260162 http://dx.doi.org/10.1186/s12859-015-0679-9

author	Torreno, Oscar Trelles, Oswaldo
collection	PubMed
description	BACKGROUND: Conventional pairwise sequence comparison software algorithms are being used to process much larger datasets than they were originally designed for. This can result in processing bottlenecks that limit software capabilities or prevent full use of the available hardware resources. Overcoming the barriers that limit the efficient computational analysis of large biological sequence datasets by retrofitting existing algorithms or by creating new applications represents a major challenge for the bioinformatics community. RESULTS: We have developed C libraries for pairwise sequence comparison within diverse architectures, ranging from commodity systems to high performance and cloud computing environments. Exhaustive tests were performed using different datasets of closely- and distantly-related sequences that span from small viral genomes to large mammalian chromosomes. The tests demonstrated that our solution is capable of generating high-quality results with a linear-time response and controlled memory consumption, and that it is comparable to or faster than the current state-of-the-art methods. CONCLUSIONS: We have addressed the problem of pairwise and all-versus-all comparison of large sequences in general, greatly increasing the limits on input data size. The approach described here is based on a modular out-of-core strategy that uses secondary storage to avoid reaching memory limits during the identification of High-scoring Segment Pairs (HSPs) between the sequences under comparison. Software engineering concepts were applied to avoid intermediate result re-calculation, to minimise the performance impact of input/output (I/O) operations and to modularise the process, thus enhancing application flexibility and extendibility. Our computationally-efficient approach allows tasks such as the massive comparison of complete genomes, evolutionary event detection, the identification of conserved synteny blocks and inter-genome distance calculations to be performed more effectively. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0679-9) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-4531504
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4531504 2015-08-12 Breaking the computational barriers of pairwise genome comparison Torreno, Oscar Trelles, Oswaldo BMC Bioinformatics Research Article BioMed Central 2015-08-11 /pmc/articles/PMC4531504/ /pubmed/26260162 http://dx.doi.org/10.1186/s12859-015-0679-9 Text en © Torreno and Trelles; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Breaking the computational barriers of pairwise genome comparison
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4531504/ https://www.ncbi.nlm.nih.gov/pubmed/26260162 http://dx.doi.org/10.1186/s12859-015-0679-9
