
A site specific model and analysis of the neutral somatic mutation rate in whole-genome cancer data

BACKGROUND: Detailed modelling of the neutral mutational process in cancer cells is crucial for identifying driver mutations and understanding the mutational mechanisms that act during cancer development. The neutral mutational process is very complex: whole-genome analyses have revealed that the mu...


Bibliographic Details
Main authors: Bertl, Johanna; Guo, Qianyun; Juul, Malene; Besenbacher, Søren; Nielsen, Morten Muhlig; Hornshøj, Henrik; Pedersen, Jakob Skou; Hobolth, Asger
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5909259/
https://www.ncbi.nlm.nih.gov/pubmed/29673314
http://dx.doi.org/10.1186/s12859-018-2141-2
_version_ 1783315864273027072
author Bertl, Johanna
Guo, Qianyun
Juul, Malene
Besenbacher, Søren
Nielsen, Morten Muhlig
Hornshøj, Henrik
Pedersen, Jakob Skou
Hobolth, Asger
author_sort Bertl, Johanna
collection PubMed
description BACKGROUND: Detailed modelling of the neutral mutational process in cancer cells is crucial for identifying driver mutations and understanding the mutational mechanisms that act during cancer development. The neutral mutational process is very complex: whole-genome analyses have revealed that the mutation rate differs between cancer types, between patients and along the genome depending on the genetic and epigenetic context. Therefore, methods that predict the number of different types of mutations in regions or specific genomic elements must consider local genomic explanatory variables. A major drawback of most methods is the need to average the explanatory variables across the entire region or genomic element. This procedure is particularly problematic if the explanatory variable varies dramatically in the element under consideration. RESULTS: To take into account the fine scale of the explanatory variables, we model the probabilities of different types of mutations for each position in the genome by multinomial logistic regression. We analyse 505 cancer genomes from 14 different cancer types and compare the performance in predicting mutation rate for both regional based models and site-specific models. We show that for 1000 randomly selected genomic positions, the site-specific model predicts the mutation rate much better than regional based models. We use a forward selection procedure to identify the most important explanatory variables. The procedure identifies site-specific conservation (phyloP), replication timing, and expression level as the best predictors for the mutation rate. Finally, our model confirms and quantifies certain well-known mutational signatures. CONCLUSION: We find that our site-specific multinomial regression model outperforms the regional based models. The possibility of including genomic variables on different scales and patient specific variables makes it a versatile framework for studying different mutational mechanisms. 
Our model can serve as the neutral null model for the mutational process; regions that deviate from the null model are candidates for elements that drive cancer development. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2141-2) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5909259
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5909259 2018-04-30 A site specific model and analysis of the neutral somatic mutation rate in whole-genome cancer data Bertl, Johanna Guo, Qianyun Juul, Malene Besenbacher, Søren Nielsen, Morten Muhlig Hornshøj, Henrik Pedersen, Jakob Skou Hobolth, Asger BMC Bioinformatics Research Article BioMed Central 2018-04-19 /pmc/articles/PMC5909259/ /pubmed/29673314 http://dx.doi.org/10.1186/s12859-018-2141-2 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A site specific model and analysis of the neutral somatic mutation rate in whole-genome cancer data
title_sort site specific model and analysis of the neutral somatic mutation rate in whole-genome cancer data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5909259/
https://www.ncbi.nlm.nih.gov/pubmed/29673314
http://dx.doi.org/10.1186/s12859-018-2141-2