
Predicting the recurrence of noncoding regulatory mutations in cancer


Bibliographic Details
Main Authors: Yang, Woojin; Bang, Hyoeun; Jang, Kiwon; Sung, Min Kyung; Choi, Jung Kyoon
Format: Online, Article, Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5135808/
https://www.ncbi.nlm.nih.gov/pubmed/27912731
http://dx.doi.org/10.1186/s12859-016-1385-y
author Yang, Woojin
Bang, Hyoeun
Jang, Kiwon
Sung, Min Kyung
Choi, Jung Kyoon
collection PubMed
description BACKGROUND: One of the greatest challenges in cancer genomics is to distinguish driver mutations from passenger mutations. Whereas recurrence is a hallmark of driver mutations, it is difficult to observe recurring noncoding mutations owing to the limited number of whole-genome sequenced samples. Hence, a method is needed to predict potentially recurrent mutations. RESULTS: In this work, we developed a random forest classifier that predicts regulatory mutations that may recur based on the features of the mutations repeatedly appearing in a given cohort. With breast cancer as a model, we profiled 35 quantitative features describing genetic and epigenetic signals at the mutation site, transcription factors whose binding motif was disrupted by the mutation, and genes targeted by long-range chromatin interactions. A true set of mutations for machine learning was generated by interrogating publicly available pan-cancer genomes based on our statistical model of mutation recurrence. The performance of our random forest classifier was evaluated by cross-validation. The variable importance of each feature in the classification of mutations was investigated. Our statistical recurrence model for the random forest classifier showed an area under the curve (AUC) of ~0.78 in predicting recurrent mutations. Chromatin accessibility at the mutation sites, the distance from the mutations to known cancer risk loci, and the role of the target genes in the regulatory or protein interaction network were among the most important variables. CONCLUSIONS: Our methods make it possible to characterize recurrent regulatory mutations using a limited number of whole-genome samples and, based on that characterization, to predict potential driver mutations whose recurrence is not observed in the given samples but is likely to be observed with additional samples.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1385-y) contains supplementary material, which is available to authorized users.
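The abstract reports classifier performance as an area under the ROC curve (AUC) estimated by cross-validation. The Python sketch below is not the authors' code and does not reproduce their random forest; it only illustrates, under stated assumptions, how such a figure is computed. `roc_auc` uses the pairwise (Mann-Whitney) definition, in which AUC is the fraction of positive/negative pairs where the positive (here a hypothetical "recurrent" mutation) receives the higher score, with ties counted as one half, and `k_fold_indices` produces plain contiguous folds. Both function names and the example labels and scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as 0.5. This pairwise
    formulation is O(n_pos * n_neg), which is fine for illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds and return (train, test) pairs.

    Hypothetical helper: no shuffling or class stratification, both of which
    a real evaluation would usually add.
    """
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    folds = []
    start = 0
    for i in range(k):
        # The first n % k folds get one extra index so every index is used once.
        size = n // k + (1 if i < n % k else 0)
        test = list(range(start, start + size))
        train = [j for j in range(n) if j < start or j >= start + size]
        folds.append((train, test))
        start += size
    return folds


if __name__ == "__main__":
    # Hypothetical labels (1 = recurrent mutation) and classifier scores,
    # not data from the paper.
    labels = [1, 1, 0, 1, 0, 0]
    scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
    # 8 of the 9 positive/negative pairs are ranked correctly, so AUC = 8/9.
    print(roc_auc(labels, scores))
    for train, test in k_fold_indices(10, 3):
        print(len(train), test)
```

In a cross-validation loop, a model would be fit on each `train` index set, scored on the matching `test` set, and the per-fold AUCs averaged. An AUC of 0.5 corresponds to a random ranking and 1.0 to perfect separation of the two classes.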
format Online
Article
Text
id pubmed-5135808
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5135808 2016-12-15 Predicting the recurrence of noncoding regulatory mutations in cancer Yang, Woojin Bang, Hyoeun Jang, Kiwon Sung, Min Kyung Choi, Jung Kyoon BMC Bioinformatics Research Article
BioMed Central 2016-12-03 /pmc/articles/PMC5135808/ /pubmed/27912731 http://dx.doi.org/10.1186/s12859-016-1385-y Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Predicting the recurrence of noncoding regulatory mutations in cancer
topic Research Article