FamSeq: A Variant Calling Program for Family-Based Sequencing Data Using Graphics Processing Units
Various algorithms have been developed for variant calling using next-generation sequencing data, and various methods have been applied to reduce the associated false positive and false negative rates. Few variant calling programs, however, utilize the pedigree information when the family-based sequ...
Main Authors: | Peng, Gang; Fan, Yu; Wang, Wenyi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2014 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4214554/ https://www.ncbi.nlm.nih.gov/pubmed/25357123 http://dx.doi.org/10.1371/journal.pcbi.1003880 |
_version_ | 1782341975319511040 |
---|---|
author | Peng, Gang Fan, Yu Wang, Wenyi |
author_facet | Peng, Gang Fan, Yu Wang, Wenyi |
author_sort | Peng, Gang |
collection | PubMed |
description | Various algorithms have been developed for variant calling using next-generation sequencing data, and various methods have been applied to reduce the associated false positive and false negative rates. Few variant calling programs, however, utilize the pedigree information when the family-based sequencing data are available. Here, we present a program, FamSeq, which reduces both false positive and false negative rates by incorporating the pedigree information from the Mendelian genetic model into variant calling. To accommodate variations in data complexity, FamSeq consists of four distinct implementations of the Mendelian genetic model: the Bayesian network algorithm, a graphics processing unit version of the Bayesian network algorithm, the Elston-Stewart algorithm and the Markov chain Monte Carlo algorithm. To make the software efficient and applicable to large families, we parallelized the Bayesian network algorithm that copes with pedigrees with inbreeding loops without losing calculation precision on an NVIDIA graphics processing unit. In order to compare the difference in the four methods, we applied FamSeq to pedigree sequencing data with family sizes that varied from 7 to 12. When there is no inbreeding loop in the pedigree, the Elston-Stewart algorithm gives analytical results in a short time. If there are inbreeding loops in the pedigree, we recommend the Bayesian network method, which provides exact answers. To improve the computing speed of the Bayesian network method, we parallelized the computation on a graphics processing unit. This allowed the Bayesian network method to process the whole genome sequencing data of a family of 12 individuals within two days, which was a 10-fold time reduction compared to the time required for this computation on a central processing unit. |
format | Online Article Text |
id | pubmed-4214554 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-42145542014-11-05 FamSeq: A Variant Calling Program for Family-Based Sequencing Data Using Graphics Processing Units Peng, Gang Fan, Yu Wang, Wenyi PLoS Comput Biol Research Article Various algorithms have been developed for variant calling using next-generation sequencing data, and various methods have been applied to reduce the associated false positive and false negative rates. Few variant calling programs, however, utilize the pedigree information when the family-based sequencing data are available. Here, we present a program, FamSeq, which reduces both false positive and false negative rates by incorporating the pedigree information from the Mendelian genetic model into variant calling. To accommodate variations in data complexity, FamSeq consists of four distinct implementations of the Mendelian genetic model: the Bayesian network algorithm, a graphics processing unit version of the Bayesian network algorithm, the Elston-Stewart algorithm and the Markov chain Monte Carlo algorithm. To make the software efficient and applicable to large families, we parallelized the Bayesian network algorithm that copes with pedigrees with inbreeding loops without losing calculation precision on an NVIDIA graphics processing unit. In order to compare the difference in the four methods, we applied FamSeq to pedigree sequencing data with family sizes that varied from 7 to 12. When there is no inbreeding loop in the pedigree, the Elston-Stewart algorithm gives analytical results in a short time. If there are inbreeding loops in the pedigree, we recommend the Bayesian network method, which provides exact answers. To improve the computing speed of the Bayesian network method, we parallelized the computation on a graphics processing unit. This allowed the Bayesian network method to process the whole genome sequencing data of a family of 12 individuals within two days, which was a 10-fold time reduction compared to the time required for this computation on a central processing unit. Public Library of Science 2014-10-30 /pmc/articles/PMC4214554/ /pubmed/25357123 http://dx.doi.org/10.1371/journal.pcbi.1003880 Text en © 2014 Peng et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Peng, Gang Fan, Yu Wang, Wenyi FamSeq: A Variant Calling Program for Family-Based Sequencing Data Using Graphics Processing Units |
title | FamSeq: A Variant Calling Program for Family-Based Sequencing Data Using Graphics Processing Units |
title_full | FamSeq: A Variant Calling Program for Family-Based Sequencing Data Using Graphics Processing Units |
title_fullStr | FamSeq: A Variant Calling Program for Family-Based Sequencing Data Using Graphics Processing Units |
title_full_unstemmed | FamSeq: A Variant Calling Program for Family-Based Sequencing Data Using Graphics Processing Units |
title_short | FamSeq: A Variant Calling Program for Family-Based Sequencing Data Using Graphics Processing Units |
title_sort | famseq: a variant calling program for family-based sequencing data using graphics processing units |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4214554/ https://www.ncbi.nlm.nih.gov/pubmed/25357123 http://dx.doi.org/10.1371/journal.pcbi.1003880 |
work_keys_str_mv | AT penggang famseqavariantcallingprogramforfamilybasedsequencingdatausinggraphicsprocessingunits AT fanyu famseqavariantcallingprogramforfamilybasedsequencingdatausinggraphicsprocessingunits AT wangwenyi famseqavariantcallingprogramforfamilybasedsequencingdatausinggraphicsprocessingunits |
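
The description above says FamSeq incorporates pedigree information from the Mendelian genetic model into variant calling. The record does not describe the implementation, so the following is only a minimal sketch of the general idea for a single father-mother-child trio. It uses brute-force enumeration of the 27 joint genotypes, not the Bayesian network, Elston-Stewart, or Markov chain Monte Carlo algorithms named in the description. All function names, the Hardy-Weinberg founder prior, the default allele frequency, and the no-de-novo-mutation assumption are illustrative choices and are not taken from FamSeq.

```python
from itertools import product

# Genotypes are coded as the number of alternate alleles:
# 0 = ref/ref, 1 = heterozygous, 2 = alt/alt.
GENOTYPES = (0, 1, 2)


def founder_prior(g, alt_freq):
    """Hardy-Weinberg prior for a founder genotype (an assumption of this sketch)."""
    p = alt_freq
    return ((1 - p) ** 2, 2 * p * (1 - p), p ** 2)[g]


def transmit_alt(g):
    """Probability that a parent with genotype g passes on the alternate allele."""
    return g / 2.0


def child_given_parents(gc, gf, gm):
    """Mendelian transmission probability P(child | father, mother).

    De novo mutation is ignored here (an assumption of this sketch).
    """
    a, b = transmit_alt(gf), transmit_alt(gm)
    return ((1 - a) * (1 - b), a * (1 - b) + (1 - a) * b, a * b)[gc]


def trio_posteriors(lik_father, lik_mother, lik_child, alt_freq=0.01):
    """Marginal genotype posteriors for each trio member.

    Each lik_* argument is a length-3 sequence of per-genotype likelihoods
    P(reads | genotype) for one individual at one site. The joint
    probability is summed over all 27 genotype configurations.
    """
    post = {"father": [0.0] * 3, "mother": [0.0] * 3, "child": [0.0] * 3}
    total = 0.0
    for gf, gm, gc in product(GENOTYPES, repeat=3):
        joint = (
            founder_prior(gf, alt_freq)
            * founder_prior(gm, alt_freq)
            * child_given_parents(gc, gf, gm)
            * lik_father[gf]
            * lik_mother[gm]
            * lik_child[gc]
        )
        total += joint
        post["father"][gf] += joint
        post["mother"][gm] += joint
        post["child"][gc] += joint
    # Normalize so each individual's posteriors sum to one.
    return {who: [v / total for v in vals] for who, vals in post.items()}
```

Calling `trio_posteriors` with per-genotype likelihoods for each individual returns a dictionary of normalized posteriors. When both parents' reads strongly support ref/ref, an ambiguous child is pulled toward ref/ref as well, which illustrates how family data can reduce false positive calls. Exhaustive enumeration grows as 3^n with family size n, which is one reason larger pedigrees call for more efficient algorithms such as those listed in the description.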