Genetic variant pathogenicity prediction trained using disease-specific clinical sequencing data sets

Recent advances in DNA sequencing have expanded our understanding of the molecular basis of genetic disorders and increased the utilization of clinical genomic tests. Given the paucity of evidence to accurately classify each variant and the difficulty of experimentally evaluating its clinical signif...

Bibliographic Details
Main authors: Evans, Perry; Wu, Chao; Lindy, Amanda; McKnight, Dianalee A.; Lebo, Matthew; Sarmady, Mahdi; Abou Tayoun, Ahmad N.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6633260/
https://www.ncbi.nlm.nih.gov/pubmed/31235655
http://dx.doi.org/10.1101/gr.240994.118
author Evans, Perry
Wu, Chao
Lindy, Amanda
McKnight, Dianalee A.
Lebo, Matthew
Sarmady, Mahdi
Abou Tayoun, Ahmad N.
author_sort Evans, Perry
collection PubMed
description Recent advances in DNA sequencing have expanded our understanding of the molecular basis of genetic disorders and increased the utilization of clinical genomic tests. Given the paucity of evidence to accurately classify each variant and the difficulty of experimentally evaluating its clinical significance, a large number of variants generated by clinical tests are reported as variants of unknown clinical significance. Population-scale variant databases can improve clinical interpretation. Specifically, pathogenicity prediction for novel missense variants can use features describing regional variant constraint. Constrained genomic regions are those that have an unusually low variant count in the general population. Computational methods have been introduced to capture these regions and incorporate them into pathogenicity classifiers, but these methods have yet to be compared on an independent clinical variant data set. Here, we introduce one variant data set derived from clinical sequencing panels and use it to compare the ability of different genomic constraint metrics to determine missense variant pathogenicity. This data set is compiled from 17,071 patients surveyed with clinical genomic sequencing for cardiomyopathy, epilepsy, or RASopathies. We further use this data set to demonstrate the necessity of disease-specific classifiers and to train PathoPredictor, a disease-specific ensemble classifier of pathogenicity based on regional constraint and variant-level features. PathoPredictor achieves an average precision >90% for variants from all 99 tested disease genes while approaching 100% accuracy for some genes. The accumulation of larger clinical variant training data sets can significantly enhance their performance in a disease- and gene-specific manner.
format Online
Article
Text
id pubmed-6633260
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-6633260 2020-01-01 Genome Res Method Cold Spring Harbor Laboratory Press 2019-07 /pmc/articles/PMC6633260/ /pubmed/31235655 http://dx.doi.org/10.1101/gr.240994.118 Text en © 2019 Evans et al.; Published by Cold Spring Harbor Laboratory Press http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
title Genetic variant pathogenicity prediction trained using disease-specific clinical sequencing data sets
topic Method
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6633260/
https://www.ncbi.nlm.nih.gov/pubmed/31235655
http://dx.doi.org/10.1101/gr.240994.118