
A novel similarity-measure for the analysis of genetic data in complex phenotypes

BACKGROUND: Recent technological advances in DNA sequencing and genotyping have led to the accumulation of a remarkable quantity of data on genetic polymorphisms. However, the development of new statistical and computational tools for processing these data effectively has not kept pace....


Bibliographic Details
Main Authors:	Lagani, Vincenzo; Montesanto, Alberto; Di Cianni, Fausta; Moreno, Victor; Landi, Stefano; Conforti, Domenico; Rose, Giuseppina; Passarino, Giuseppe
Format:	Text
Language:	English
Published:	BioMed Central 2009
Subjects:	Proceedings
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2697648/ https://www.ncbi.nlm.nih.gov/pubmed/19534750 http://dx.doi.org/10.1186/1471-2105-10-S6-S24

author	Lagani, Vincenzo; Montesanto, Alberto; Di Cianni, Fausta; Moreno, Victor; Landi, Stefano; Conforti, Domenico; Rose, Giuseppina; Passarino, Giuseppe
collection	PubMed
description	BACKGROUND: Recent technological advances in DNA sequencing and genotyping have led to the accumulation of a remarkable quantity of data on genetic polymorphisms. However, the development of new statistical and computational tools for processing these data effectively has not kept pace. In particular, the machine learning literature contains relatively few papers on the development and application of data mining methods for the analysis of genetic variability. Moreover, these papers apply to genetic data procedures that were developed for other kinds of analysis and do not take into account the peculiarities of population genetics. The aim of our study was to define a new similarity measure, specifically conceived for measuring the similarity between the genetic profiles of two groups of subjects (i.e., cases and controls), taking into account that genetic profiles are usually distributed in a population group according to Hardy-Weinberg equilibrium. RESULTS: We set up a new kernel function consisting of a similarity measure between groups of subjects genotyped for numerous genetic loci. This measure weighs different genetic profiles according to the estimates of gene frequencies at Hardy-Weinberg equilibrium in the population. We named this function the "Hardy-Weinberg kernel". The effectiveness of the Hardy-Weinberg kernel was compared with that of the well-established linear kernel. We found that the Hardy-Weinberg kernel significantly outperformed the linear kernel in a number of experiments using either simulated or real data. CONCLUSION: The "Hardy-Weinberg kernel" reported here represents one of the first attempts at incorporating genetic knowledge into the definition of a kernel function designed for the analysis of genetic data. We show that the "Hardy-Weinberg kernel" performs best when rare genotypes have different frequencies in cases and controls. The ability to capture the effect of rare genotypes on phenotypic traits might be a very important and useful feature, as most current statistical tools lose most of their statistical power when rare genotypes are involved in susceptibility to the trait under study.
format	Text
id	pubmed-2697648
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2697648 2009-06-16 A novel similarity-measure for the analysis of genetic data in complex phenotypes. BMC Bioinformatics, Proceedings. BioMed Central 2009-06-16 /pmc/articles/PMC2697648/ /pubmed/19534750 http://dx.doi.org/10.1186/1471-2105-10-S6-S24 Text en Copyright © 2009 Lagani et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	A novel similarity-measure for the analysis of genetic data in complex phenotypes
topic	Proceedings


Similar Items