Genome-wide prediction of disease variant effects with a deep protein language model
Predicting the effects of coding variants is a major challenge. While recent deep-learning models have improved variant effect prediction accuracy, they cannot analyze all coding variants due to dependency on close homologs or software limitations. Here we developed a workflow using ESM1b, a 650-million-parameter protein language model, to predict all ~450 million possible missense variant effects in the human genome, and made all predictions available on a web portal. ESM1b outperformed existing methods in classifying ~150,000 ClinVar/HGMD missense variants as pathogenic or benign and predicting measurements across 28 deep mutational scan datasets. We further annotated ~2 million variants as damaging only in specific protein isoforms, demonstrating the importance of considering all isoforms when predicting variant effects. Our approach also generalizes to more complex coding variants such as in-frame indels and stop-gains. Together, these results establish protein language models as an effective, accurate and general approach to predicting variant effects.
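The scoring idea behind ESM1b-style workflows (not spelled out in this record, but standard for masked protein language models) is to rate a missense variant by the log-likelihood ratio the model assigns to the mutant versus the wild-type amino acid at the masked position. A minimal toy sketch of that arithmetic, with a made-up probability table standing in for the model's output:

```python
import math

def variant_score(probs: dict[str, float], wt: str, mut: str) -> float:
    """Log-likelihood ratio log P(mut) - log P(wt) at one masked position.

    More negative scores indicate the model finds the mutant residue less
    plausible, i.e. the variant is predicted to be more damaging.
    """
    return math.log(probs[mut]) - math.log(probs[wt])

# Hypothetical model output at a single masked position: a probability
# distribution over amino acids (illustrative values only).
position_probs = {"A": 0.62, "V": 0.25, "G": 0.08, "W": 0.001}

print(variant_score(position_probs, wt="A", mut="V"))  # conservative substitution
print(variant_score(position_probs, wt="A", mut="W"))  # strongly depleted residue
```

In the actual workflow the probabilities would come from the masked-language-model head of ESM1b evaluated over the full protein (isoform) sequence; the stand-in table here only illustrates how the per-variant score is computed.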
Main authors: | Brandes, Nadav; Goldman, Grant; Wang, Charlotte H.; Ye, Chun Jimmie; Ntranos, Vasilis |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group US, 2023 |
Subjects: | Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10484790/ https://www.ncbi.nlm.nih.gov/pubmed/37563329 http://dx.doi.org/10.1038/s41588-023-01465-0 |
_version_ | 1785102661199069184 |
---|---|
author | Brandes, Nadav Goldman, Grant Wang, Charlotte H. Ye, Chun Jimmie Ntranos, Vasilis |
author_facet | Brandes, Nadav Goldman, Grant Wang, Charlotte H. Ye, Chun Jimmie Ntranos, Vasilis |
author_sort | Brandes, Nadav |
collection | PubMed |
description | Predicting the effects of coding variants is a major challenge. While recent deep-learning models have improved variant effect prediction accuracy, they cannot analyze all coding variants due to dependency on close homologs or software limitations. Here we developed a workflow using ESM1b, a 650-million-parameter protein language model, to predict all ~450 million possible missense variant effects in the human genome, and made all predictions available on a web portal. ESM1b outperformed existing methods in classifying ~150,000 ClinVar/HGMD missense variants as pathogenic or benign and predicting measurements across 28 deep mutational scan datasets. We further annotated ~2 million variants as damaging only in specific protein isoforms, demonstrating the importance of considering all isoforms when predicting variant effects. Our approach also generalizes to more complex coding variants such as in-frame indels and stop-gains. Together, these results establish protein language models as an effective, accurate and general approach to predicting variant effects. |
format | Online Article Text |
id | pubmed-10484790 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group US |
record_format | MEDLINE/PubMed |
spelling | pubmed-10484790 2023-09-09 Genome-wide prediction of disease variant effects with a deep protein language model Brandes, Nadav Goldman, Grant Wang, Charlotte H. Ye, Chun Jimmie Ntranos, Vasilis Nat Genet Article Predicting the effects of coding variants is a major challenge. While recent deep-learning models have improved variant effect prediction accuracy, they cannot analyze all coding variants due to dependency on close homologs or software limitations. Here we developed a workflow using ESM1b, a 650-million-parameter protein language model, to predict all ~450 million possible missense variant effects in the human genome, and made all predictions available on a web portal. ESM1b outperformed existing methods in classifying ~150,000 ClinVar/HGMD missense variants as pathogenic or benign and predicting measurements across 28 deep mutational scan datasets. We further annotated ~2 million variants as damaging only in specific protein isoforms, demonstrating the importance of considering all isoforms when predicting variant effects. Our approach also generalizes to more complex coding variants such as in-frame indels and stop-gains. Together, these results establish protein language models as an effective, accurate and general approach to predicting variant effects. Nature Publishing Group US 2023-08-10 2023 /pmc/articles/PMC10484790/ /pubmed/37563329 http://dx.doi.org/10.1038/s41588-023-01465-0 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . |
spellingShingle | Article Brandes, Nadav Goldman, Grant Wang, Charlotte H. Ye, Chun Jimmie Ntranos, Vasilis Genome-wide prediction of disease variant effects with a deep protein language model |
title | Genome-wide prediction of disease variant effects with a deep protein language model |
title_full | Genome-wide prediction of disease variant effects with a deep protein language model |
title_fullStr | Genome-wide prediction of disease variant effects with a deep protein language model |
title_full_unstemmed | Genome-wide prediction of disease variant effects with a deep protein language model |
title_short | Genome-wide prediction of disease variant effects with a deep protein language model |
title_sort | genome-wide prediction of disease variant effects with a deep protein language model |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10484790/ https://www.ncbi.nlm.nih.gov/pubmed/37563329 http://dx.doi.org/10.1038/s41588-023-01465-0 |
work_keys_str_mv | AT brandesnadav genomewidepredictionofdiseasevarianteffectswithadeepproteinlanguagemodel AT goldmangrant genomewidepredictionofdiseasevarianteffectswithadeepproteinlanguagemodel AT wangcharlotteh genomewidepredictionofdiseasevarianteffectswithadeepproteinlanguagemodel AT yechunjimmie genomewidepredictionofdiseasevarianteffectswithadeepproteinlanguagemodel AT ntranosvasilis genomewidepredictionofdiseasevarianteffectswithadeepproteinlanguagemodel |