
PhenoRank: reducing study bias in gene prioritization through simulation

MOTIVATION: Genome-wide association studies have identified thousands of loci associated with human disease, but identifying the causal genes at these loci is often difficult. Several methods prioritize genes most likely to be disease causing through the integration of biological data, including pro...


Bibliographic Details
Main authors: Cornish, Alex J; David, Alessia; Sternberg, Michael J E
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5949213/
https://www.ncbi.nlm.nih.gov/pubmed/29360927
http://dx.doi.org/10.1093/bioinformatics/bty028
author Cornish, Alex J
David, Alessia
Sternberg, Michael J E
collection PubMed
description MOTIVATION: Genome-wide association studies have identified thousands of loci associated with human disease, but identifying the causal genes at these loci is often difficult. Several methods prioritize genes most likely to be disease causing through the integration of biological data, including protein–protein interaction and phenotypic data. Data availability is not the same for all genes however, potentially influencing the performance of these methods. RESULTS: We demonstrate that whilst disease genes tend to be associated with greater numbers of data, this may be at least partially a result of them being better studied. With this observation we develop PhenoRank, which prioritizes disease genes whilst avoiding being biased towards genes with more available data. Bias is avoided by comparing gene scores generated for the query disease against gene scores generated using simulated sets of phenotype terms, which ensures that differences in data availability do not affect the ranking of genes. We demonstrate that whilst existing prioritization methods are biased by data availability, PhenoRank is not similarly biased. Avoiding this bias allows PhenoRank to effectively prioritize genes with fewer available data and improves its overall performance. PhenoRank outperforms three available prioritization methods in cross-validation (PhenoRank area under receiver operating characteristic curve [AUC] = 0.89, DADA AUC = 0.87, EXOMISER AUC = 0.71, PRINCE AUC = 0.83, P < 2.2 × 10^−16). AVAILABILITY AND IMPLEMENTATION: PhenoRank is freely available for download at https://github.com/alexjcornish/PhenoRank. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-5949213
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5949213 2018-06-20 PhenoRank: reducing study bias in gene prioritization through simulation Cornish, Alex J David, Alessia Sternberg, Michael J E Bioinformatics Original Papers
Oxford University Press 2018-06-15 2018-01-17 /pmc/articles/PMC5949213/ /pubmed/29360927 http://dx.doi.org/10.1093/bioinformatics/bty028 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title PhenoRank: reducing study bias in gene prioritization through simulation
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5949213/
https://www.ncbi.nlm.nih.gov/pubmed/29360927
http://dx.doi.org/10.1093/bioinformatics/bty028