BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU

This paper reports an integrated solution, called BALSA, for the secondary analysis of next generation sequencing data; it exploits the computational power of GPU and an intricate memory management to give a fast and accurate analysis. From raw reads to variants (including SNPs and Indels), BALSA, u...

Bibliographic Details
Main authors: Luo, Ruibang, Wong, Yiu-Lun, Law, Wai-Chun, Lee, Lap-Kei, Cheung, Jeanno, Liu, Chi-Man, Lam, Tak-Wah
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2014
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4060040/
https://www.ncbi.nlm.nih.gov/pubmed/24949238
http://dx.doi.org/10.7717/peerj.421
author Luo, Ruibang
Wong, Yiu-Lun
Law, Wai-Chun
Lee, Lap-Kei
Cheung, Jeanno
Liu, Chi-Man
Lam, Tak-Wah
collection PubMed
description This paper reports an integrated solution, called BALSA, for the secondary analysis of next-generation sequencing data; it exploits the computational power of the GPU and intricate memory management to deliver fast and accurate analysis. From raw reads to variants (including SNPs and Indels), BALSA, using just a single computing node with a commodity GPU board, takes 5.5 h to process 50-fold whole genome sequencing (∼750 million 100 bp paired-end reads), or just 25 min for 210-fold whole exome sequencing. BALSA’s speed is rooted in parallel algorithms that effectively exploit a GPU to speed up processes such as alignment, realignment and statistical testing. BALSA incorporates a 16-genotype model to support the calling of SNPs and Indels, and achieves competitive variant calling accuracy and sensitivity when compared to the ensemble of six popular variant callers. BALSA also supports efficient identification of somatic SNVs and CNVs; experiments showed that BALSA recovers all the previously validated somatic SNVs and CNVs, and that it is more sensitive for somatic Indel detection. BALSA outputs variants in VCF format. A pileup-like SNAPSHOT format, while maintaining the same fidelity as BAM in variant calling, enables efficient storage and indexing, and facilitates the development of apps for downstream analyses. BALSA is available at: http://sourceforge.net/p/balsa.
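The read count and fold coverage quoted in the description can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not part of BALSA: the function name `fold_coverage` and the genome size of about 3.1 Gbp are assumptions made here. The description does not state whether "750 million" counts individual reads or read pairs, so both interpretations are computed.

```python
def fold_coverage(n_fragments, read_len_bp, genome_size_bp, paired=True):
    """Expected average depth: total sequenced bases divided by genome size."""
    reads_per_fragment = 2 if paired else 1
    return n_fragments * reads_per_fragment * read_len_bp / genome_size_bp


# Assumption: approximate human genome size; not stated in the record.
GENOME_SIZE_BP = 3.1e9

# Interpretation 1: 750 million read pairs, each pair contributing 2 x 100 bp.
depth_if_pairs = fold_coverage(750e6, 100, GENOME_SIZE_BP, paired=True)

# Interpretation 2: 750 million individual 100 bp reads.
depth_if_reads = fold_coverage(750e6, 100, GENOME_SIZE_BP, paired=False)

print(f"750 M pairs -> about {depth_if_pairs:.1f}-fold")
print(f"750 M reads -> about {depth_if_reads:.1f}-fold")
```

Under these assumptions, only the read-pair interpretation lands near the stated 50-fold figure, which suggests the description counts pairs.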
format Online
Article
Text
id pubmed-4060040
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-4060040 2014-06-19 PeerJ Bioinformatics PeerJ Inc. 2014-06-03 /pmc/articles/PMC4060040/ /pubmed/24949238 http://dx.doi.org/10.7717/peerj.421 Text en © 2014 Luo et al.
http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU
topic Bioinformatics