Phenotype Information Retrieval for Existing GWAS Studies
Main authors: | , , , , |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
American Medical Informatics Association
201
|
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3845737/ https://www.ncbi.nlm.nih.gov/pubmed/24303228 |
Summary: | The database of Genotypes and Phenotypes (dbGaP) archives the results of different genome-wide association studies (GWAS). dbGaP holds a multitude of phenotype variables, but they are not harmonized across studies. We proposed a method to standardize phenotype variables by classifying similar variables based on semantic distances. We first extracted the variable descriptions, enriched them using domain knowledge, and computed the distances among them. We then used clustering techniques to group the most similar variables. Domain experts audited the clusters and annotated them with appropriate labels, and we used re-clustering to build a semantically-driven Genotypes and Phenotypes (sdGaP) ontology using the UMLS Semantic Network and Metathesaurus. The sdGaP ontology allowed us to expand user queries and retrieve information using a semantic metric called density measure (DM). We illustrated the potential improvement of information retrieval using the sdGaP ontology in one search scenario using the variables from the Cleveland Family Study. |
---|