A method to build extended sequence context models of point mutations and indels

The mutation rate of a specific position in the human genome depends on the sequence context surrounding it. Modeling the mutation rate by estimating a rate for each possible k-mer, however, only works for small values of k since the data becomes too sparse for larger values of k. Here we propose a...

Bibliographic Details
Main authors: Bethune, Jörn, Kleppe, April, Besenbacher, Søren
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9780256/
https://www.ncbi.nlm.nih.gov/pubmed/36550134
http://dx.doi.org/10.1038/s41467-022-35596-5
_version_ 1784856796573204480
author Bethune, Jörn
Kleppe, April
Besenbacher, Søren
author_facet Bethune, Jörn
Kleppe, April
Besenbacher, Søren
author_sort Bethune, Jörn
collection PubMed
description The mutation rate of a specific position in the human genome depends on the sequence context surrounding it. Modeling the mutation rate by estimating a rate for each possible k-mer, however, only works for small values of k since the data becomes too sparse for larger values of k. Here we propose a new method that solves this problem by grouping similar k-mers. We refer to the method as k-mer pattern partition and have implemented it in a software package called kmerPaPa. We use a large set of human de novo mutations to show that this new method leads to improved prediction of mutation rates and makes it possible to create models using wider sequence contexts than previous studies. As the first method of its kind, it does not only predict rates for point mutations but also insertions and deletions. We have additionally created a software package called Genovo that, given a k-mer pattern partition model, predicts the expected number of synonymous, missense, and other functional mutation types for each gene. Using this software, we show that the created mutation rate models increase the statistical power to detect genes containing disease-causing variants and to identify genes under strong selective constraint.
format Online
Article
Text
id pubmed-9780256
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-97802562022-12-24 A method to build extended sequence context models of point mutations and indels Bethune, Jörn Kleppe, April Besenbacher, Søren Nat Commun Article The mutation rate of a specific position in the human genome depends on the sequence context surrounding it. Modeling the mutation rate by estimating a rate for each possible k-mer, however, only works for small values of k since the data becomes too sparse for larger values of k. Here we propose a new method that solves this problem by grouping similar k-mers. We refer to the method as k-mer pattern partition and have implemented it in a software package called kmerPaPa. We use a large set of human de novo mutations to show that this new method leads to improved prediction of mutation rates and makes it possible to create models using wider sequence contexts than previous studies. As the first method of its kind, it does not only predict rates for point mutations but also insertions and deletions. We have additionally created a software package called Genovo that, given a k-mer pattern partition model, predicts the expected number of synonymous, missense, and other functional mutation types for each gene. Using this software, we show that the created mutation rate models increase the statistical power to detect genes containing disease-causing variants and to identify genes under strong selective constraint. Nature Publishing Group UK 2022-12-22 /pmc/articles/PMC9780256/ /pubmed/36550134 http://dx.doi.org/10.1038/s41467-022-35596-5 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. 
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/.
spellingShingle Article
Bethune, Jörn
Kleppe, April
Besenbacher, Søren
A method to build extended sequence context models of point mutations and indels
title A method to build extended sequence context models of point mutations and indels
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9780256/
https://www.ncbi.nlm.nih.gov/pubmed/36550134
http://dx.doi.org/10.1038/s41467-022-35596-5