LYRUS: a machine learning model for predicting the pathogenicity of missense variants

SUMMARY: Single amino acid variations (SAVs) are a primary contributor to variations in the human genome. Identifying pathogenic SAVs can provide insights into the genetic architecture of complex diseases. Most approaches for predicting the functional effects or pathogenicity of SAVs rely on either se...

Bibliographic Details
Main authors: Lai, Jiaying; Yang, Jordan; Gamsiz Uzun, Ece D; Rubenstein, Brenda M; Sarkar, Indra Neil
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8754197/
https://www.ncbi.nlm.nih.gov/pubmed/35036922
http://dx.doi.org/10.1093/bioadv/vbab045
author Lai, Jiaying
Yang, Jordan
Gamsiz Uzun, Ece D
Rubenstein, Brenda M
Sarkar, Indra Neil
collection PubMed
description SUMMARY: Single amino acid variations (SAVs) are a primary contributor to variations in the human genome. Identifying pathogenic SAVs can provide insights into the genetic architecture of complex diseases. Most approaches for predicting the functional effects or pathogenicity of SAVs rely on either sequence or structural information. This study presents 〈Lai Yang Rubenstein Uzun Sarkar〉 (LYRUS), a machine learning method that uses an XGBoost classifier to predict the pathogenicity of SAVs. LYRUS incorporates five sequence-based, six structure-based and four dynamics-based features. Uniquely, LYRUS includes a newly proposed sequence co-evolution feature called the variation number. LYRUS was trained using a dataset that contains 4363 protein structures corresponding to 22 639 SAVs from the ClinVar database, and tested using the VariBench testing dataset. Performance analysis showed that LYRUS achieved comparable performance to current variant effect predictors. LYRUS’s performance was also benchmarked against six Deep Mutational Scanning datasets for PTEN and TP53. AVAILABILITY AND IMPLEMENTATION: LYRUS is freely available and the source code can be found at https://github.com/jiaying2508/LYRUS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
format Online
Article
Text
id pubmed-8754197
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8754197 2022-01-13 LYRUS: a machine learning model for predicting the pathogenicity of missense variants Lai, Jiaying; Yang, Jordan; Gamsiz Uzun, Ece D; Rubenstein, Brenda M; Sarkar, Indra Neil. Bioinform Adv. Original Paper. Oxford University Press 2021-12-25 /pmc/articles/PMC8754197/ /pubmed/35036922 http://dx.doi.org/10.1093/bioadv/vbab045 Text en © The Author(s) 2021. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title LYRUS: a machine learning model for predicting the pathogenicity of missense variants
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8754197/
https://www.ncbi.nlm.nih.gov/pubmed/35036922
http://dx.doi.org/10.1093/bioadv/vbab045