A method to build extended sequence context models of point mutations and indels
The mutation rate of a specific position in the human genome depends on the sequence context surrounding it. Modeling the mutation rate by estimating a rate for each possible k-mer, however, only works for small values of k since the data becomes too sparse for larger values of k. Here we propose a...
Main authors: | Bethune, Jörn; Kleppe, April; Besenbacher, Søren |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9780256/ https://www.ncbi.nlm.nih.gov/pubmed/36550134 http://dx.doi.org/10.1038/s41467-022-35596-5 |
_version_ | 1784856796573204480 |
---|---|
author | Bethune, Jörn Kleppe, April Besenbacher, Søren |
author_facet | Bethune, Jörn Kleppe, April Besenbacher, Søren |
author_sort | Bethune, Jörn |
collection | PubMed |
description | The mutation rate of a specific position in the human genome depends on the sequence context surrounding it. Modeling the mutation rate by estimating a rate for each possible k-mer, however, only works for small values of k since the data becomes too sparse for larger values of k. Here we propose a new method that solves this problem by grouping similar k-mers. We refer to the method as k-mer pattern partition and have implemented it in a software package called kmerPaPa. We use a large set of human de novo mutations to show that this new method leads to improved prediction of mutation rates and makes it possible to create models using wider sequence contexts than previous studies. As the first method of its kind, it does not only predict rates for point mutations but also insertions and deletions. We have additionally created a software package called Genovo that, given a k-mer pattern partition model, predicts the expected number of synonymous, missense, and other functional mutation types for each gene. Using this software, we show that the created mutation rate models increase the statistical power to detect genes containing disease-causing variants and to identify genes under strong selective constraint. |
format | Online Article Text |
id | pubmed-9780256 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9780256 2022-12-24 A method to build extended sequence context models of point mutations and indels Bethune, Jörn Kleppe, April Besenbacher, Søren Nat Commun Article The mutation rate of a specific position in the human genome depends on the sequence context surrounding it. Modeling the mutation rate by estimating a rate for each possible k-mer, however, only works for small values of k since the data becomes too sparse for larger values of k. Here we propose a new method that solves this problem by grouping similar k-mers. We refer to the method as k-mer pattern partition and have implemented it in a software package called kmerPaPa. We use a large set of human de novo mutations to show that this new method leads to improved prediction of mutation rates and makes it possible to create models using wider sequence contexts than previous studies. As the first method of its kind, it does not only predict rates for point mutations but also insertions and deletions. We have additionally created a software package called Genovo that, given a k-mer pattern partition model, predicts the expected number of synonymous, missense, and other functional mutation types for each gene. Using this software, we show that the created mutation rate models increase the statistical power to detect genes containing disease-causing variants and to identify genes under strong selective constraint. Nature Publishing Group UK 2022-12-22 /pmc/articles/PMC9780256/ /pubmed/36550134 http://dx.doi.org/10.1038/s41467-022-35596-5 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Bethune, Jörn Kleppe, April Besenbacher, Søren A method to build extended sequence context models of point mutations and indels |
title | A method to build extended sequence context models of point mutations and indels |
title_full | A method to build extended sequence context models of point mutations and indels |
title_fullStr | A method to build extended sequence context models of point mutations and indels |
title_full_unstemmed | A method to build extended sequence context models of point mutations and indels |
title_short | A method to build extended sequence context models of point mutations and indels |
title_sort | method to build extended sequence context models of point mutations and indels |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9780256/ https://www.ncbi.nlm.nih.gov/pubmed/36550134 http://dx.doi.org/10.1038/s41467-022-35596-5 |
work_keys_str_mv | AT bethunejorn amethodtobuildextendedsequencecontextmodelsofpointmutationsandindels AT kleppeapril amethodtobuildextendedsequencecontextmodelsofpointmutationsandindels AT besenbachersøren amethodtobuildextendedsequencecontextmodelsofpointmutationsandindels AT bethunejorn methodtobuildextendedsequencecontextmodelsofpointmutationsandindels AT kleppeapril methodtobuildextendedsequencecontextmodelsofpointmutationsandindels AT besenbachersøren methodtobuildextendedsequencecontextmodelsofpointmutationsandindels |
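
The description in this record notes that estimating a separate rate for each k-mer only works for small k because the data become too sparse. The short sketch below illustrates that arithmetic only; it is not taken from kmerPaPa or Genovo. The function names and the mutation count of 1,000,000 are hypothetical, and the count of k-mers is simplified to 4^k (it ignores strand symmetry and mutation type).

```python
def kmer_count(k: int) -> int:
    """Number of distinct DNA k-mers over the alphabet A, C, G, T."""
    return 4 ** k


def mean_observations_per_kmer(n_mutations: int, k: int) -> float:
    """Average number of observed mutations available per k-mer
    if the mutations were spread evenly over all k-mers."""
    return n_mutations / kmer_count(k)


if __name__ == "__main__":
    # Hypothetical data set size, chosen only for illustration.
    n_mutations = 1_000_000
    for k in (3, 5, 7, 9, 11):
        mean_obs = mean_observations_per_kmer(n_mutations, k)
        print(f"k={k:2d}  k-mers={kmer_count(k):>9,d}  "
              f"mean mutations per k-mer={mean_obs:.3f}")
```

Under these assumptions the average number of mutations per k-mer drops below one once 4^k exceeds the number of observed mutations, which is the sparsity that grouping similar k-mers is meant to address.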