PhenoRank: reducing study bias in gene prioritization through simulation
MOTIVATION: Genome-wide association studies have identified thousands of loci associated with human disease, but identifying the causal genes at these loci is often difficult. Several methods prioritize genes most likely to be disease causing through the integration of biological data, including pro...
Main authors: | Cornish, Alex J; David, Alessia; Sternberg, Michael J E |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2018 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5949213/ https://www.ncbi.nlm.nih.gov/pubmed/29360927 http://dx.doi.org/10.1093/bioinformatics/bty028 |
_version_ | 1783322703197896704 |
---|---|
author | Cornish, Alex J; David, Alessia; Sternberg, Michael J E |
author_facet | Cornish, Alex J; David, Alessia; Sternberg, Michael J E |
author_sort | Cornish, Alex J |
collection | PubMed |
description | MOTIVATION: Genome-wide association studies have identified thousands of loci associated with human disease, but identifying the causal genes at these loci is often difficult. Several methods prioritize genes most likely to be disease causing through the integration of biological data, including protein–protein interaction and phenotypic data. Data availability is not the same for all genes however, potentially influencing the performance of these methods. RESULTS: We demonstrate that whilst disease genes tend to be associated with greater numbers of data, this may be at least partially a result of them being better studied. With this observation we develop PhenoRank, which prioritizes disease genes whilst avoiding being biased towards genes with more available data. Bias is avoided by comparing gene scores generated for the query disease against gene scores generated using simulated sets of phenotype terms, which ensures that differences in data availability do not affect the ranking of genes. We demonstrate that whilst existing prioritization methods are biased by data availability, PhenoRank is not similarly biased. Avoiding this bias allows PhenoRank to effectively prioritize genes with fewer available data and improves its overall performance. PhenoRank outperforms three available prioritization methods in cross-validation (PhenoRank area under receiver operating characteristic curve [AUC] = 0.89, DADA AUC = 0.87, EXOMISER AUC = 0.71, PRINCE AUC = 0.83, P < 2.2 × 10⁻¹⁶). AVAILABILITY AND IMPLEMENTATION: PhenoRank is freely available for download at https://github.com/alexjcornish/PhenoRank. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-5949213 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-5949213 2018-06-20 PhenoRank: reducing study bias in gene prioritization through simulation Cornish, Alex J; David, Alessia; Sternberg, Michael J E Bioinformatics Original Papers Oxford University Press 2018-06-15 2018-01-17 /pmc/articles/PMC5949213/ /pubmed/29360927 http://dx.doi.org/10.1093/bioinformatics/bty028 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | PhenoRank: reducing study bias in gene prioritization through simulation |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5949213/ https://www.ncbi.nlm.nih.gov/pubmed/29360927 http://dx.doi.org/10.1093/bioinformatics/bty028 |
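
The abstract in this record describes avoiding bias by comparing each gene's score for the query disease against scores generated from simulated sets of phenotype terms. The general idea can be sketched as an empirical p-value computed per gene. This is a minimal illustration only and not PhenoRank's actual implementation: the function names, the add-one correction, the toy gene names, and the toy scores are all assumptions made for the example.

```python
def empirical_p_values(query_scores, simulated_scores):
    """Compare each gene's query score with that gene's own simulated scores.

    query_scores: dict mapping gene -> score for the query disease.
    simulated_scores: dict mapping gene -> list of scores obtained from
        simulated phenotype-term sets (hypothetical inputs for this sketch).
    Returns a dict mapping gene -> empirical p-value.
    """
    p_values = {}
    for gene, score in query_scores.items():
        sims = simulated_scores[gene]
        # Count simulations scoring at least as high as the query.
        at_least = sum(1 for s in sims if s >= score)
        # Add-one correction (an assumption here) keeps p above zero.
        p_values[gene] = (at_least + 1) / (len(sims) + 1)
    return p_values


def rank_genes(p_values):
    """Order genes from smallest to largest empirical p-value."""
    return sorted(p_values, key=lambda gene: (p_values[gene], gene))


# Toy data: GENE_A stands in for a well-studied gene that scores highly
# for almost any phenotype set; GENE_B stands in for a gene with little data.
query = {"GENE_A": 0.9, "GENE_B": 0.3}
simulated = {
    "GENE_A": [0.8, 0.95, 0.85, 0.9],
    "GENE_B": [0.1, 0.05, 0.2, 0.15],
}
ranking = rank_genes(empirical_p_values(query, simulated))
```

With these toy numbers, ranking by raw score would place GENE_A first, whereas the simulation-based comparison places GENE_B first, because GENE_A's high score is also typical of its simulated scores.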