An integrated database-pipeline system for studying single nucleotide polymorphisms and diseases

BACKGROUND: Studies on the relationship between disease and genetic variations such as single nucleotide polymorphisms (SNPs) are important. Genetic variations can cause disease by influencing important biological regulation processes. Despite the need to analyze correlations between SNPs and disease, mos...

Bibliographic Details
Main Authors:	Yang, Jin Ok; Hwang, Sohyun; Oh, Jeongsu; Bhak, Jong; Sohn, Tae-Kwon
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2638159/ https://www.ncbi.nlm.nih.gov/pubmed/19091018 http://dx.doi.org/10.1186/1471-2105-9-S12-S19

_version_	1782164398464303104
author	Yang, Jin Ok; Hwang, Sohyun; Oh, Jeongsu; Bhak, Jong; Sohn, Tae-Kwon
author_sort	Yang, Jin Ok
collection	PubMed
description	BACKGROUND: Studies on the relationship between disease and genetic variations such as single nucleotide polymorphisms (SNPs) are important. Genetic variations can cause disease by influencing important biological regulation processes. Despite the need to analyze correlations between SNPs and disease, most existing databases provide information only on functional variants at specific locations on the genome, or deal with only a few genes associated with disease. There is no combined resource that broadly supports gene-, SNP-, and disease-related information and captures the relationships among such data. Therefore, we developed an integrated database-pipeline system for studying SNPs and diseases. RESULTS: To implement the pipeline system for the integrated database, we first unified complicated and redundant disease terms and gene names using the Unified Medical Language System (UMLS) for classification and noun modification, and the HUGO Gene Nomenclature Committee (HGNC) and NCBI gene databases. Next, we collected and integrated representative databases for three categories of information. For genes and proteins, we examined the NCBI mRNA, UniProt, UCSC Table Track and MitoDat databases. For genetic variants, we used the dbSNP, JSNP, ALFRED, and HGVbase databases. For disease, we employed the OMIM, GAD, and HGMD databases. The database-pipeline system provides a disease thesaurus, including genes and SNPs associated with disease. The search results for these categories are available on the web page, and a genome browser is also available to highlight findings and to permit convenient review of potentially deleterious SNPs among genes strongly associated with specific diseases and clinical phenotypes. CONCLUSION: Our system is designed to capture the relationships between SNPs associated with disease and disease-causing genes. The integrated database-pipeline provides a list of candidate genes and SNP markers for evaluation in both epidemiological and molecular biological approaches to disease-gene association studies. Furthermore, researchers can semi-automatically select the data set for association studies while considering the relationships between genetic variation and diseases. The database can also make disease-association studies more economical and facilitate an understanding of the processes that cause disease. Currently, the database contains 14,674 SNP records and 109,715 gene records associated with human diseases, and it is updated at regular intervals.
format	Text
id	pubmed-2638159
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2638159 2009-02-11 An integrated database-pipeline system for studying single nucleotide polymorphisms and diseases Yang, Jin Ok; Hwang, Sohyun; Oh, Jeongsu; Bhak, Jong; Sohn, Tae-Kwon BMC Bioinformatics Research BioMed Central 2008-12-12 /pmc/articles/PMC2638159/ /pubmed/19091018 http://dx.doi.org/10.1186/1471-2105-9-S12-S19 Text en Copyright © 2008 Yang et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	An integrated database-pipeline system for studying single nucleotide polymorphisms and diseases
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2638159/ https://www.ncbi.nlm.nih.gov/pubmed/19091018 http://dx.doi.org/10.1186/1471-2105-9-S12-S19

Similar Items