
GenePy - a score for estimating gene pathogenicity in individuals using next-generation sequencing data

BACKGROUND: Next-generation sequencing is revolutionising diagnosis and treatment of rare diseases; however, its application to understanding common disease aetiology is limited. Rare disease applications binarily attribute genetic change(s) at a single locus to a specific phenotype. In common diseas...


Bibliographic Details
Main authors:	Mossotto, E., Ashton, J. J., O’Gorman, L., Pengelly, R. J., Beattie, R. M., MacArthur, B. D., Ennis, S.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2019
Subjects:	Software
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6524327/ https://www.ncbi.nlm.nih.gov/pubmed/31096927 http://dx.doi.org/10.1186/s12859-019-2877-3

_version_	1783419537132093440
author	Mossotto, E. Ashton, J. J. O’Gorman, L. Pengelly, R. J. Beattie, R. M. MacArthur, B. D. Ennis, S.
author_sort	Mossotto, E.
collection	PubMed
description	BACKGROUND: Next-generation sequencing is revolutionising diagnosis and treatment of rare diseases; however, its application to understanding common disease aetiology is limited. Rare disease applications binarily attribute genetic change(s) at a single locus to a specific phenotype. In common diseases, where multiple genetic variants within and across genes contribute to disease, binary modelling cannot capture the burden of pathogenicity harboured by an individual across a given gene/pathway. We present GenePy, a novel gene-level scoring system for integration and analysis of next-generation sequencing data on a per-individual basis that transforms NGS data interpretation from variant-level to gene-level. This simple and flexible scoring system is intuitive and amenable to integration for machine learning, network and topological approaches, facilitating the investigation of complex phenotypes. RESULTS: Whole-exome sequencing data from 508 individuals were used to generate GenePy scores. For each variant a score is calculated incorporating: i) population allele frequency estimates; ii) individual zygosity, determined through standard variant calling pipelines; and iii) any user-defined deleteriousness metric to inform on functional impact. GenePy then combines scores generated for all variants observed into a single gene score for each individual. We generated a matrix of ~14,000 GenePy scores for all individuals for each of sixteen popular deleteriousness metrics. All per-gene scores are corrected for gene length. The majority of genes generate GenePy scores < 0.01, although individuals harbouring multiple rare highly deleterious mutations can accumulate extremely high GenePy scores. In the absence of a comparator metric, we examine GenePy performance in discriminating genes known to be associated with three common, complex diseases. A Mann-Whitney U test conducted on GenePy scores for this positive control gene in cases versus controls demonstrates markedly more significant results (p = 1.37 × 10⁻⁴) compared to the most commonly applied association tool that combines common and rare variation (p = 0.003). CONCLUSIONS: Per-gene per-individual GenePy scores are intuitive when assessing genetic variation in individual patients or comparing scores between groups. GenePy outperforms the currently accepted best practice tools for combining common and rare variation. GenePy scores are suitable for downstream data integration with transcriptomic and proteomic data that also report at the gene level. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2877-3) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-6524327
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6524327 2019-05-24 GenePy - a score for estimating gene pathogenicity in individuals using next-generation sequencing data BMC Bioinformatics Software BioMed Central 2019-05-16 /pmc/articles/PMC6524327/ /pubmed/31096927 http://dx.doi.org/10.1186/s12859-019-2877-3 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	GenePy - a score for estimating gene pathogenicity in individuals using next-generation sequencing data
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6524327/ https://www.ncbi.nlm.nih.gov/pubmed/31096927 http://dx.doi.org/10.1186/s12859-019-2877-3
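
The scoring steps summarised in the description field (allele frequency, zygosity and a deleteriousness metric per variant, summed per gene and corrected for gene length) can be illustrated with a short Python sketch. This record does not reproduce the formula, so the form used here is an assumption: each variant contributes its deleteriousness multiplied by the negative log10 of the product of the population frequencies of the two alleles the individual carries. The names `Variant`, `variant_score` and `gene_score`, the [0, 1] scaling of the deleteriousness metric, and the per-kilobase length correction are hypothetical choices made for illustration, not the published implementation.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Variant:
    """One variant observed in one individual (hypothetical container)."""
    deleteriousness: float  # user-chosen metric, assumed scaled to [0, 1]
    alt_freq: float         # population frequency of the alternate allele
    alt_copies: int         # zygosity: 1 = heterozygous, 2 = homozygous alternate


def variant_score(v: Variant) -> float:
    """Assumed per-variant score: -D * log10(f_a * f_b).

    f_a and f_b are the population frequencies of the two alleles carried
    at this site, so rarer and more deleterious variants score higher.
    """
    if not 0.0 < v.alt_freq < 1.0:
        raise ValueError("alt_freq must lie strictly between 0 and 1")
    if v.alt_copies not in (1, 2):
        raise ValueError("alt_copies must be 1 or 2")
    ref_freq = 1.0 - v.alt_freq
    # A heterozygote carries one alternate and one reference allele;
    # a homozygote carries two alternate alleles.
    second_allele_freq = v.alt_freq if v.alt_copies == 2 else ref_freq
    return -v.deleteriousness * math.log10(v.alt_freq * second_allele_freq)


def gene_score(variants: Iterable[Variant],
               gene_length_kb: Optional[float] = None) -> float:
    """Sum the variant scores for one gene in one individual.

    The optional division by gene length is an assumed, simplified
    stand-in for the length correction mentioned in the abstract.
    """
    total = sum(variant_score(v) for v in variants)
    if gene_length_kb is not None:
        if gene_length_kb <= 0:
            raise ValueError("gene_length_kb must be positive")
        total /= gene_length_kb
    return total


if __name__ == "__main__":
    observed = [
        Variant(deleteriousness=0.9, alt_freq=0.001, alt_copies=1),  # rare, damaging
        Variant(deleteriousness=0.1, alt_freq=0.30, alt_copies=2),   # common, benign
    ]
    print(gene_score(observed, gene_length_kb=3.2))
```

Under these assumptions a gene with no observed variants scores zero, and a single rare, highly deleterious variant outweighs several common, low-impact ones, which matches the behaviour the abstract describes.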
