LYRUS: a machine learning model for predicting the pathogenicity of missense variants
SUMMARY: Single amino acid variations (SAVs) are a primary contributor to variations in the human genome. Identifying pathogenic SAVs can provide insights to the genetic architecture of complex diseases. Most approaches for predicting the functional effects or pathogenicity of SAVs rely on either se...
Main authors: | Lai, Jiaying; Yang, Jordan; Gamsiz Uzun, Ece D; Rubenstein, Brenda M; Sarkar, Indra Neil |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2021 |
Subjects: | Original Paper |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8754197/ https://www.ncbi.nlm.nih.gov/pubmed/35036922 http://dx.doi.org/10.1093/bioadv/vbab045 |
_version_ | 1784632224290701312 |
---|---|
author | Lai, Jiaying Yang, Jordan Gamsiz Uzun, Ece D Rubenstein, Brenda M Sarkar, Indra Neil |
author_facet | Lai, Jiaying Yang, Jordan Gamsiz Uzun, Ece D Rubenstein, Brenda M Sarkar, Indra Neil |
author_sort | Lai, Jiaying |
collection | PubMed |
description | SUMMARY: Single amino acid variations (SAVs) are a primary contributor to variations in the human genome. Identifying pathogenic SAVs can provide insights to the genetic architecture of complex diseases. Most approaches for predicting the functional effects or pathogenicity of SAVs rely on either sequence or structural information. This study presents 〈Lai Yang Rubenstein Uzun Sarkar〉 (LYRUS), a machine learning method that uses an XGBoost classifier to predict the pathogenicity of SAVs. LYRUS incorporates five sequence-based, six structure-based and four dynamics-based features. Uniquely, LYRUS includes a newly proposed sequence co-evolution feature called the variation number. LYRUS was trained using a dataset that contains 4363 protein structures corresponding to 22 639 SAVs from the ClinVar database, and tested using the VariBench testing dataset. Performance analysis showed that LYRUS achieved comparable performance to current variant effect predictors. LYRUS’s performance was also benchmarked against six Deep Mutational Scanning datasets for PTEN and TP53. AVAILABILITY AND IMPLEMENTATION: LYRUS is freely available and the source code can be found at https://github.com/jiaying2508/LYRUS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online. |
format | Online Article Text |
id | pubmed-8754197 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8754197 2022-01-13 LYRUS: a machine learning model for predicting the pathogenicity of missense variants Lai, Jiaying Yang, Jordan Gamsiz Uzun, Ece D Rubenstein, Brenda M Sarkar, Indra Neil Bioinform Adv Original Paper SUMMARY: Single amino acid variations (SAVs) are a primary contributor to variations in the human genome. Identifying pathogenic SAVs can provide insights to the genetic architecture of complex diseases. Most approaches for predicting the functional effects or pathogenicity of SAVs rely on either sequence or structural information. This study presents 〈Lai Yang Rubenstein Uzun Sarkar〉 (LYRUS), a machine learning method that uses an XGBoost classifier to predict the pathogenicity of SAVs. LYRUS incorporates five sequence-based, six structure-based and four dynamics-based features. Uniquely, LYRUS includes a newly proposed sequence co-evolution feature called the variation number. LYRUS was trained using a dataset that contains 4363 protein structures corresponding to 22 639 SAVs from the ClinVar database, and tested using the VariBench testing dataset. Performance analysis showed that LYRUS achieved comparable performance to current variant effect predictors. LYRUS’s performance was also benchmarked against six Deep Mutational Scanning datasets for PTEN and TP53. AVAILABILITY AND IMPLEMENTATION: LYRUS is freely available and the source code can be found at https://github.com/jiaying2508/LYRUS. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online. Oxford University Press 2021-12-25 /pmc/articles/PMC8754197/ /pubmed/35036922 http://dx.doi.org/10.1093/bioadv/vbab045 Text en © The Author(s) 2021. Published by Oxford University Press.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Paper Lai, Jiaying Yang, Jordan Gamsiz Uzun, Ece D Rubenstein, Brenda M Sarkar, Indra Neil LYRUS: a machine learning model for predicting the pathogenicity of missense variants |
title | LYRUS: a machine learning model for predicting the pathogenicity of missense variants |
title_full | LYRUS: a machine learning model for predicting the pathogenicity of missense variants |
title_fullStr | LYRUS: a machine learning model for predicting the pathogenicity of missense variants |
title_full_unstemmed | LYRUS: a machine learning model for predicting the pathogenicity of missense variants |
title_short | LYRUS: a machine learning model for predicting the pathogenicity of missense variants |
title_sort | lyrus: a machine learning model for predicting the pathogenicity of missense variants |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8754197/ https://www.ncbi.nlm.nih.gov/pubmed/35036922 http://dx.doi.org/10.1093/bioadv/vbab045 |
work_keys_str_mv | AT laijiaying lyrusamachinelearningmodelforpredictingthepathogenicityofmissensevariants AT yangjordan lyrusamachinelearningmodelforpredictingthepathogenicityofmissensevariants AT gamsizuzuneced lyrusamachinelearningmodelforpredictingthepathogenicityofmissensevariants AT rubensteinbrendam lyrusamachinelearningmodelforpredictingthepathogenicityofmissensevariants AT sarkarindraneil lyrusamachinelearningmodelforpredictingthepathogenicityofmissensevariants |