
Shape-IT: new rapid and accurate algorithm for haplotype inference

BACKGROUND: We have developed a new computational algorithm, Shape-IT, to infer haplotypes under the genetic model of coalescence with recombination developed by Stephens et al in Phase v2.1. It runs much faster than Phase v2.1 while exhibiting the same accuracy. The major algorithmic improvements r...

Bibliographic Details
Main authors: Delaneau, Olivier; Coulonges, Cédric; Zagury, Jean-François
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2647951/
https://www.ncbi.nlm.nih.gov/pubmed/19087329
http://dx.doi.org/10.1186/1471-2105-9-540
_version_ 1782164955895693312
author Delaneau, Olivier
Coulonges, Cédric
Zagury, Jean-François
collection PubMed
description BACKGROUND: We have developed a new computational algorithm, Shape-IT, to infer haplotypes under the genetic model of coalescence with recombination developed by Stephens et al. in Phase v2.1. It runs much faster than Phase v2.1 while exhibiting the same accuracy. The major algorithmic improvements rely on the use of binary trees to represent the sets of candidate haplotypes for each individual. These binary tree representations: (1) speed up the computation of the posterior probabilities of the haplotypes by avoiding the redundant operations made in Phase v2.1, and (2) overcome the exponential aspect of the haplotype inference problem by the smart exploration of the most plausible pathways (i.e., haplotypes) in the binary trees. RESULTS: Our results show that Shape-IT is several orders of magnitude faster than Phase v2.1 while being as accurate. For instance, Shape-IT runs 50 times faster than Phase v2.1 to compute the haplotypes of 200 subjects on 6,000 segments of 50 SNPs extracted from a standard Illumina 300K chip (13 days instead of 630 days). We also compared Shape-IT with other widely used software (Gerbil, PL-EM, Fastphase, 2SNP, and Ishape) in various tests: Shape-IT and Phase v2.1 were the most accurate in all cases, followed by Ishape and Fastphase. In terms of speed, Shape-IT was faster than Ishape and Fastphase for datasets smaller than 100 SNPs, but Fastphase became faster, though still less accurate, when inferring haplotypes on larger SNP datasets. CONCLUSION: Shape-IT deserves to be extensively used for regular haplotype inference but also in the context of the new high-throughput genotyping chips, since it makes it possible to fit the genetic model of Phase v2.1 on large datasets. This new algorithm based on tree representations could be used in other HMM-based haplotype inference software and may apply more broadly to other fields using HMMs.
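The binary-tree idea in the description above can be illustrated with a small sketch. This is not the Shape-IT implementation: the genotype coding (0, 1, 2 = number of alternate alleles), the function names, and above all the scoring rule (independent per-site allele frequencies in place of the coalescent-with-recombination model) are assumptions made only for illustration. The sketch shows why the candidate set doubles at each heterozygous site and how a level-by-level walk that keeps only the best-scoring prefixes avoids enumerating every leaf.

```python
from itertools import product


def allowed_alleles(g):
    """Alleles one haplotype may carry at a site with genotype g (0, 1 or 2)."""
    if g == 0:
        return (0,)       # homozygous reference: no branching
    if g == 2:
        return (1,)       # homozygous alternate: no branching
    return (0, 1)         # heterozygous: the tree branches in two


def candidate_haplotypes(genotype):
    """Enumerate every leaf of the tree: 2**h haplotypes for h heterozygous sites."""
    return list(product(*(allowed_alleles(g) for g in genotype)))


def complement(genotype, haplotype):
    """The second haplotype of the pair, fixed once the first one is chosen."""
    return tuple(g - a for g, a in zip(genotype, haplotype))


def explore_pruned(genotype, alt_freq, beam_width):
    """Walk the tree one site at a time, keeping only the best prefixes.

    ASSUMPTION: a prefix is scored by the product of independent per-site
    allele frequencies. This toy score stands in for the real genetic model.
    """
    paths = [((), 1.0)]
    for g, p in zip(genotype, alt_freq):
        extended = [
            (prefix + (a,), score * (p if a == 1 else 1.0 - p))
            for prefix, score in paths
            for a in allowed_alleles(g)
        ]
        extended.sort(key=lambda item: item[1], reverse=True)
        paths = extended[:beam_width]   # prune the less plausible pathways
    return paths


if __name__ == "__main__":
    genotype = (1, 0, 1, 2, 1)                 # three heterozygous sites
    alt_freq = (0.9, 0.1, 0.2, 0.5, 0.6)       # hypothetical frequencies
    print(len(candidate_haplotypes(genotype)), "leaves in the full tree")
    for hap, score in explore_pruned(genotype, alt_freq, beam_width=2):
        print(hap, complement(genotype, hap), score)
```

With a fixed `beam_width`, the work per site stays bounded even though the full tree grows exponentially with the number of heterozygous sites. The pruning rule here is a plain beam search chosen for brevity; the published algorithm's exploration strategy may differ.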
format Text
id pubmed-2647951
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2647951 2009-02-26 Shape-IT: new rapid and accurate algorithm for haplotype inference Delaneau, Olivier Coulonges, Cédric Zagury, Jean-François BMC Bioinformatics Research Article BioMed Central 2008-12-16 /pmc/articles/PMC2647951/ /pubmed/19087329 http://dx.doi.org/10.1186/1471-2105-9-540 Text en Copyright © 2008 Delaneau et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Shape-IT: new rapid and accurate algorithm for haplotype inference
topic Research Article