Cargando…

Accelerating String Set Matching in FPGA Hardware for Bioinformatics Research

BACKGROUND: This paper describes techniques for accelerating the performance of the string set matching problem with particular emphasis on applications in computational proteomics. The process of matching peptide sequences against a genome translated in six reading frames is part of a proteogenomic...

Descripción completa

Detalles Bibliográficos
Autores principales:	Dandass, Yoginder S, Burgess, Shane C, Lawrence, Mark, Bridges, Susan M
Formato:	Texto
Lenguaje:	English
Publicado:	BioMed Central 2008
Materias:	Methodology Article
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2374783/ https://www.ncbi.nlm.nih.gov/pubmed/18412963 http://dx.doi.org/10.1186/1471-2105-9-197

_version_	1782154526088757248
author	Dandass, Yoginder S Burgess, Shane C Lawrence, Mark Bridges, Susan M
author_facet	Dandass, Yoginder S Burgess, Shane C Lawrence, Mark Bridges, Susan M
author_sort	Dandass, Yoginder S
collection	PubMed
description	BACKGROUND: This paper describes techniques for accelerating the performance of the string set matching problem with particular emphasis on applications in computational proteomics. The process of matching peptide sequences against a genome translated in six reading frames is part of a proteogenomic mapping pipeline that is used as a case-study. The Aho-Corasick algorithm is adapted for execution in field programmable gate array (FPGA) devices in a manner that optimizes space and performance. In this approach, the traditional Aho-Corasick finite state machine (FSM) is split into smaller FSMs, operating in parallel, each of which matches up to 20 peptides in the input translated genome. Each of the smaller FSMs is further divided into five simpler FSMs such that each simple FSM operates on a single bit position in the input (five bits are sufficient for representing all amino acids and special symbols in protein sequences). RESULTS: This bit-split organization of the Aho-Corasick implementation enables efficient utilization of the limited random access memory (RAM) resources available in typical FPGAs. The use of on-chip RAM as opposed to FPGA logic resources for FSM implementation also enables rapid reconfiguration of the FPGA without the place and routing delays associated with complex digital designs. CONCLUSION: Experimental results show storage efficiencies of over 80% for several data sets. Furthermore, the FPGA implementation executing at 100 MHz is nearly 20 times faster than an implementation of the traditional Aho-Corasick algorithm executing on a 2.67 GHz workstation.
format	Text
id	pubmed-2374783
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-23747832008-05-09 Accelerating String Set Matching in FPGA Hardware for Bioinformatics Research Dandass, Yoginder S Burgess, Shane C Lawrence, Mark Bridges, Susan M BMC Bioinformatics Methodology Article BACKGROUND: This paper describes techniques for accelerating the performance of the string set matching problem with particular emphasis on applications in computational proteomics. The process of matching peptide sequences against a genome translated in six reading frames is part of a proteogenomic mapping pipeline that is used as a case-study. The Aho-Corasick algorithm is adapted for execution in field programmable gate array (FPGA) devices in a manner that optimizes space and performance. In this approach, the traditional Aho-Corasick finite state machine (FSM) is split into smaller FSMs, operating in parallel, each of which matches up to 20 peptides in the input translated genome. Each of the smaller FSMs is further divided into five simpler FSMs such that each simple FSM operates on a single bit position in the input (five bits are sufficient for representing all amino acids and special symbols in protein sequences). RESULTS: This bit-split organization of the Aho-Corasick implementation enables efficient utilization of the limited random access memory (RAM) resources available in typical FPGAs. The use of on-chip RAM as opposed to FPGA logic resources for FSM implementation also enables rapid reconfiguration of the FPGA without the place and routing delays associated with complex digital designs. CONCLUSION: Experimental results show storage efficiencies of over 80% for several data sets. Furthermore, the FPGA implementation executing at 100 MHz is nearly 20 times faster than an implementation of the traditional Aho-Corasick algorithm executing on a 2.67 GHz workstation. BioMed Central 2008-04-15 /pmc/articles/PMC2374783/ /pubmed/18412963 http://dx.doi.org/10.1186/1471-2105-9-197 Text en Copyright © 2008 Dandass et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( (http://creativecommons.org/licenses/by/2.0) ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Methodology Article Dandass, Yoginder S Burgess, Shane C Lawrence, Mark Bridges, Susan M Accelerating String Set Matching in FPGA Hardware for Bioinformatics Research
title	Accelerating String Set Matching in FPGA Hardware for Bioinformatics Research
title_full	Accelerating String Set Matching in FPGA Hardware for Bioinformatics Research
title_fullStr	Accelerating String Set Matching in FPGA Hardware for Bioinformatics Research
title_full_unstemmed	Accelerating String Set Matching in FPGA Hardware for Bioinformatics Research
title_short	Accelerating String Set Matching in FPGA Hardware for Bioinformatics Research
title_sort	accelerating string set matching in fpga hardware for bioinformatics research
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2374783/ https://www.ncbi.nlm.nih.gov/pubmed/18412963 http://dx.doi.org/10.1186/1471-2105-9-197
work_keys_str_mv	AT dandassyoginders acceleratingstringsetmatchinginfpgahardwareforbioinformaticsresearch AT burgessshanec acceleratingstringsetmatchinginfpgahardwareforbioinformaticsresearch AT lawrencemark acceleratingstringsetmatchinginfpgahardwareforbioinformaticsresearch AT bridgessusanm acceleratingstringsetmatchinginfpgahardwareforbioinformaticsresearch

Accelerating String Set Matching in FPGA Hardware for Bioinformatics Research

Ejemplares similares