LEAP: Using machine learning to support variant classification in a clinical setting

Advances in genome sequencing have led to a tremendous increase in the discovery of novel missense variants, but evidence for determining clinical significance can be limited or conflicting. Here, we present Learning from Evidence to Assess Pathogenicity (LEAP), a machine learning model that utilize...

Bibliographic Details
Main authors: Lai, Carmen; Zimmer, Anjali D.; O'Connor, Robert; Kim, Serra; Chan, Ray; van den Akker, Jeroen; Zhou, Alicia Y.; Topper, Scott; Mishne, Gilad
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7317941/
https://www.ncbi.nlm.nih.gov/pubmed/32176384
http://dx.doi.org/10.1002/humu.24011
_version_ 1783550743613014016
author Lai, Carmen
Zimmer, Anjali D.
O'Connor, Robert
Kim, Serra
Chan, Ray
van den Akker, Jeroen
Zhou, Alicia Y.
Topper, Scott
Mishne, Gilad
author_sort Lai, Carmen
collection PubMed
description Advances in genome sequencing have led to a tremendous increase in the discovery of novel missense variants, but evidence for determining clinical significance can be limited or conflicting. Here, we present Learning from Evidence to Assess Pathogenicity (LEAP), a machine learning model that utilizes a variety of feature categories to classify variants, and achieves high performance in multiple genes and different health conditions. Feature categories include functional predictions, splice predictions, population frequencies, conservation scores, protein domain data, and clinical observation data such as personal and family history and covariant information. L2‐regularized logistic regression and random forest classification models were trained on missense variants detected and classified during the course of routine clinical testing at Color Genomics (14,226 variants from 24 cancer‐related genes and 5,398 variants from 30 cardiovascular‐related genes). Using 10‐fold cross‐validated predictions, the logistic regression model achieved an area under the receiver operating characteristic curve (AUROC) of 97.8% (cancer) and 98.8% (cardiovascular), while the random forest model achieved 98.3% (cancer) and 98.6% (cardiovascular). We demonstrate generalizability to different genes by validating predictions on genes withheld from training (96.8% AUROC). High accuracy and broad applicability make LEAP effective in the clinical setting as a high‐throughput quality control layer.
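The AUROC figures quoted in the description can be read as the probability that a randomly chosen pathogenic variant receives a higher score than a randomly chosen benign one. The following is a minimal sketch of that metric, together with a gene-withheld split like the one the description mentions. It is not the authors' code: the function names `auroc` and `split_by_gene`, the placeholder gene names, and the toy labels and scores are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Set, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def split_by_gene(genes: Sequence[str], held_out: Set[str]) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices) so held-out genes never appear in training."""
    train = [i for i, g in enumerate(genes) if g not in held_out]
    test = [i for i, g in enumerate(genes) if g in held_out]
    return train, test


# Hypothetical toy data: 1 = pathogenic, 0 = benign (not taken from the study).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
genes = ["GENE_A", "GENE_B", "GENE_A", "GENE_C"]

print(auroc(labels, scores))
print(split_by_gene(genes, {"GENE_C"}))
```

The pairwise loop is quadratic in the number of variants, which is fine for a toy example; a rank-based computation would be the usual choice at the scale of thousands of variants.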
format Online
Article
Text
id pubmed-7317941
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-7317941 2020-06-29 LEAP: Using machine learning to support variant classification in a clinical setting. Hum Mutat. Special Article. John Wiley and Sons Inc. 2020-04-01 2020-06 /pmc/articles/PMC7317941/ /pubmed/32176384 http://dx.doi.org/10.1002/humu.24011 Text en © 2020 The Authors. Human Mutation published by Wiley Periodicals, Inc.
This is an open access article under the terms of the Creative Commons license at http://creativecommons.org/licenses/by-nc-nd/4.0/, which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial and no modifications or adaptations are made.
title LEAP: Using machine learning to support variant classification in a clinical setting
topic Special Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7317941/
https://www.ncbi.nlm.nih.gov/pubmed/32176384
http://dx.doi.org/10.1002/humu.24011