Whole genome association mapping by incompatibilities and local perfect phylogenies

Bibliographic Details
Main authors:	Mailund, Thomas, Besenbacher, Søren, Schierup, Mikkel H
Format:	Text
Language:	English
Published:	BioMed Central 2006
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1624851/ https://www.ncbi.nlm.nih.gov/pubmed/17042942 http://dx.doi.org/10.1186/1471-2105-7-454

author	Mailund, Thomas Besenbacher, Søren Schierup, Mikkel H
collection	PubMed
description	BACKGROUND: With current technology, vast amounts of data can be produced cheaply and efficiently in association studies. To prevent data analysis from becoming the bottleneck of such studies, fast and efficient analysis methods that scale to these data set sizes must be developed. RESULTS: We present a fast method for accurate localisation of disease-causing variants in high-density case-control association mapping experiments with large numbers of cases and controls. The method searches for significant clustering of case chromosomes in the "perfect" phylogenetic tree defined by the largest region around each marker that is compatible with a single phylogenetic tree. This perfect phylogenetic tree is treated as a decision tree for determining disease status and is scored by its accuracy as a decision tree. The rationale is that the perfect phylogeny near a disease-affecting mutation should provide more information about the affected/unaffected classification than random trees. If regions of compatibility contain few markers, e.g. because of large marker spacing, the algorithm can allow the inclusion of incompatible markers in order to enlarge the regions prior to estimating their phylogeny. Haplotype data and phased genotype data can be analysed. The power and efficiency of the method are investigated on 1) simulated genotype data under different models of disease determination, 2) artificial data sets created from the HapMap resource, and 3) data sets previously used to test other methods, for direct comparison. Our method has the same accuracy as single marker association (SMA) in the simplest case of a single disease-causing mutation and a constant recombination rate.
However, in more complex scenarios, with mutation heterogeneity or more complex haplotype structure such as that found in the HapMap data, our method outperforms SMA as well as other fast data-mining approaches such as HapMiner and Haplotype Pattern Mining (HPM), despite being significantly faster. For unphased genotype data, an initial phase-estimation step only slightly decreases the power of the method. The method also accurately localised the known susceptibility variant in an empirical data set (the ΔF508 mutation for cystic fibrosis) and found significant signals of association between the CYP2D6 gene and poor drug metabolism, although for this data set the highest association score is about 60 kb from the CYP2D6 gene. CONCLUSION: Our method has been implemented in the Blossoc (BLOck aSSOCiation) software. Using Blossoc, genome-wide chip-based surveys of 3 million SNPs in 1000 cases and 1000 controls can be analysed in less than two CPU hours.
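The description outlines two core steps: find the largest marker interval whose alleles fit a single perfect phylogeny, then score that tree as a classifier of case/control status. The Python sketch below illustrates both steps under simplifying assumptions; it is not Blossoc's implementation. It assumes biallelic markers coded 0/1 with no missing data, grows each interval greedily (left first, then right), and omits the optional inclusion of incompatible markers. The scoring function is a hypothetical stand-in for the published score: it treats each distinct haplotype over the interval as a leaf of a full-depth decision tree that predicts its majority status. The function names compatible, compatible_region and leaf_accuracy are illustrative only.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: each haplotype is a sequence of 0/1 alleles, one per marker,
# with no missing data (biallelic SNPs only).
Haplotype = Sequence[int]


def compatible(haps: List[Haplotype], i: int, j: int) -> bool:
    """Four-gamete test: markers i and j can lie on one perfect phylogeny
    unless all four allele pairs 00, 01, 10 and 11 occur in the sample."""
    return len({(h[i], h[j]) for h in haps}) < 4


def compatible_region(haps: List[Haplotype], k: int) -> Tuple[int, int]:
    """Grow an interval [lo, hi] around marker k, one marker at a time,
    while the new marker passes the four-gamete test against every marker
    already in the interval. For binary markers, pairwise compatibility
    means the whole interval fits a single perfect phylogeny. The greedy
    left-then-right growth order is an assumption of this sketch."""
    m = len(haps[0])
    lo = hi = k
    grew = True
    while grew:
        grew = False
        if lo > 0 and all(compatible(haps, lo - 1, c) for c in range(lo, hi + 1)):
            lo -= 1
            grew = True
        if hi < m - 1 and all(compatible(haps, hi + 1, c) for c in range(lo, hi + 1)):
            hi += 1
            grew = True
    return lo, hi


def leaf_accuracy(haps: List[Haplotype], is_case: List[bool], lo: int, hi: int) -> float:
    """Simplified (hypothetical) decision-tree score: chromosomes with
    identical alleles on markers lo..hi share a leaf of the local perfect
    phylogeny; each leaf predicts its majority status (on a tie, either
    prediction classifies the same number correctly), and the score is
    the fraction of chromosomes classified correctly."""
    counts: Dict[Tuple[int, ...], List[int]] = {}  # leaf -> [controls, cases]
    for h, case in zip(haps, is_case):
        leaf = tuple(h[lo:hi + 1])
        counts.setdefault(leaf, [0, 0])[int(case)] += 1
    correct = sum(max(c) for c in counts.values())
    return correct / len(haps)
```

For example, with haplotypes [[0, 0, 0], [0, 1, 1], [1, 1, 0], [1, 1, 1]] and case status [False, False, True, True], markers 0 and 2 fail the four-gamete test, so compatible_region(haps, 0) returns (0, 1) with a score of 1.0, while compatible_region(haps, 2) returns (1, 2) with a score of 0.75. Because a full-depth tree fits regions with many distinct haplotypes more easily, a practical analysis would compare such scores against a null distribution, for example by permuting the case/control labels.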
id	pubmed-1624851
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	BMC Bioinformatics
dates	2006-10-16 (published); 2006-10-26
rights	Copyright © 2006 Mailund et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
