GPN-MSA: an alignment-based DNA language model for genome-wide variant effect prediction
Whereas protein language models have demonstrated remarkable efficacy in predicting the effects of missense variants, DNA counterparts have not yet achieved a similar competitive edge for genome-wide variant effect predictions, especially in complex genomes such as that of humans. To address this ch...
Main authors: | Benegas, Gonzalo; Albors, Carlos; Aw, Alan J.; Ye, Chengzhong; Song, Yun S. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10592768/ https://www.ncbi.nlm.nih.gov/pubmed/37873118 http://dx.doi.org/10.1101/2023.10.10.561776 |
_version_ | 1785124340727021568 |
---|---|
author | Benegas, Gonzalo; Albors, Carlos; Aw, Alan J.; Ye, Chengzhong; Song, Yun S. |
author_facet | Benegas, Gonzalo; Albors, Carlos; Aw, Alan J.; Ye, Chengzhong; Song, Yun S. |
author_sort | Benegas, Gonzalo |
collection | PubMed |
description | Whereas protein language models have demonstrated remarkable efficacy in predicting the effects of missense variants, DNA counterparts have not yet achieved a similar competitive edge for genome-wide variant effect predictions, especially in complex genomes such as that of humans. To address this challenge, we here introduce GPN-MSA, a novel framework for DNA language models that leverages whole-genome sequence alignments across multiple species and takes only a few hours to train. Across several benchmarks on clinical databases (ClinVar, COSMIC, and OMIM) and population genomic data (gnomAD), our model for the human genome achieves outstanding performance on deleteriousness prediction for both coding and non-coding variants. |
format | Online Article Text |
id | pubmed-10592768 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-10592768 2023-10-24 GPN-MSA: an alignment-based DNA language model for genome-wide variant effect prediction Benegas, Gonzalo; Albors, Carlos; Aw, Alan J.; Ye, Chengzhong; Song, Yun S. bioRxiv Article Whereas protein language models have demonstrated remarkable efficacy in predicting the effects of missense variants, DNA counterparts have not yet achieved a similar competitive edge for genome-wide variant effect predictions, especially in complex genomes such as that of humans. To address this challenge, we here introduce GPN-MSA, a novel framework for DNA language models that leverages whole-genome sequence alignments across multiple species and takes only a few hours to train. Across several benchmarks on clinical databases (ClinVar, COSMIC, and OMIM) and population genomic data (gnomAD), our model for the human genome achieves outstanding performance on deleteriousness prediction for both coding and non-coding variants. Cold Spring Harbor Laboratory 2023-10-11 /pmc/articles/PMC10592768/ /pubmed/37873118 http://dx.doi.org/10.1101/2023.10.10.561776 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator. |
title | GPN-MSA: an alignment-based DNA language model for genome-wide variant effect prediction |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10592768/ https://www.ncbi.nlm.nih.gov/pubmed/37873118 http://dx.doi.org/10.1101/2023.10.10.561776 |