SNPranker 2.0: a gene-centric data mining tool for diseases associated SNP prioritization in GWAS
BACKGROUND: The capability of correlating specific genotypes with human diseases is a complex issue in spite of all advantages arisen from high-throughput technologies, such as Genome Wide Association Studies (GWAS). New tools for genetic variants interpretation and for Single Nucleotide Polymorphis...
Main authors: | Merelli, Ivan; Calabria, Andrea; Cozzi, Paolo; Viti, Federica; Mosca, Ettore; Milanesi, Luciano |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2013 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3548692/ https://www.ncbi.nlm.nih.gov/pubmed/23369106 http://dx.doi.org/10.1186/1471-2105-14-S1-S9 |
_version_ | 1782256348176580608 |
---|---|
author | Merelli, Ivan Calabria, Andrea Cozzi, Paolo Viti, Federica Mosca, Ettore Milanesi, Luciano |
author_sort | Merelli, Ivan |
collection | PubMed |
description | BACKGROUND: The capability of correlating specific genotypes with human diseases is a complex issue in spite of all advantages arisen from high-throughput technologies, such as Genome Wide Association Studies (GWAS). New tools for genetic variants interpretation and for Single Nucleotide Polymorphisms (SNPs) prioritization are actually needed. Given a list of the most relevant SNPs statistically associated to a specific pathology as result of a genotype study, a critical issue is the identification of genes that are effectively related to the disease by re-scoring the importance of the identified genetic variations. Vice versa, given a list of genes, it can be of great importance to predict which SNPs can be involved in the onset of a particular disease, in order to focus the research on their effects. RESULTS: We propose a new bioinformatics approach to support biological data mining in the analysis and interpretation of SNPs associated to pathologies. This system can be employed to design custom genotyping chips for disease-oriented studies and to re-score GWAS results. The proposed method relies (1) on the data integration of public resources using a gene-centric database design, (2) on the evaluation of a set of static biomolecular annotations, defined as features, and (3) on the SNP scoring function, which computes SNP scores using parameters and weights set by users. We employed a machine learning classifier to set default feature weights and an ontological annotation layer to enable the enrichment of the input gene set. We implemented our method as a web tool called SNPranker 2.0 (http://www.itb.cnr.it/snpranker), improving our first published release of this system. A user-friendly interface allows the input of a list of genes, SNPs or a biological process, and to customize the features set with relative weights. As result, SNPranker 2.0 returns a list of SNPs, localized within input and ontologically enriched genes, combined with their prioritization scores. CONCLUSIONS: Different databases and resources are already available for SNPs annotation, but they do not prioritize or re-score SNPs relying on a-priori biomolecular knowledge. SNPranker 2.0 attempts to fill this gap through a user-friendly integrated web resource. End users, such as researchers in medical genetics and epidemiology, may find in SNPranker 2.0 a new tool for data mining and interpretation able to support SNPs analysis. Possible scenarios are GWAS data re-scoring, SNPs selection for custom genotyping arrays and SNPs/diseases association studies. |
format | Online Article Text |
id | pubmed-3548692 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3548692 2013-02-04 SNPranker 2.0: a gene-centric data mining tool for diseases associated SNP prioritization in GWAS Merelli, Ivan Calabria, Andrea Cozzi, Paolo Viti, Federica Mosca, Ettore Milanesi, Luciano BMC Bioinformatics Research BioMed Central 2013-01-14 /pmc/articles/PMC3548692/ /pubmed/23369106 http://dx.doi.org/10.1186/1471-2105-14-S1-S9 Text en Copyright ©2013 Merelli et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | SNPranker 2.0: a gene-centric data mining tool for diseases associated SNP prioritization in GWAS |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3548692/ https://www.ncbi.nlm.nih.gov/pubmed/23369106 http://dx.doi.org/10.1186/1471-2105-14-S1-S9 |