Predicting the recurrence of noncoding regulatory mutations in cancer
BACKGROUND: One of the greatest challenges in cancer genomics is to distinguish driver mutations from passenger mutations. Whereas recurrence is a hallmark of driver mutations, it is difficult to observe recurring noncoding mutations owing to a limited amount of whole-genome sequenced samples. Hence...
Main authors: | Yang, Woojin; Bang, Hyoeun; Jang, Kiwon; Sung, Min Kyung; Choi, Jung Kyoon |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5135808/ https://www.ncbi.nlm.nih.gov/pubmed/27912731 http://dx.doi.org/10.1186/s12859-016-1385-y |
_version_ | 1782471612528852992 |
---|---|
author | Yang, Woojin; Bang, Hyoeun; Jang, Kiwon; Sung, Min Kyung; Choi, Jung Kyoon |
author_facet | Yang, Woojin; Bang, Hyoeun; Jang, Kiwon; Sung, Min Kyung; Choi, Jung Kyoon |
author_sort | Yang, Woojin |
collection | PubMed |
description | BACKGROUND: One of the greatest challenges in cancer genomics is to distinguish driver mutations from passenger mutations. Whereas recurrence is a hallmark of driver mutations, it is difficult to observe recurring noncoding mutations owing to a limited amount of whole-genome sequenced samples. Hence, a method is needed to predict potentially recurrent mutations. RESULTS: In this work, we developed a random forest classifier that predicts regulatory mutations that may recur based on the features of the mutations repeatedly appearing in a given cohort. With breast cancer as a model, we profiled 35 quantitative features describing genetic and epigenetic signals at the mutation site, transcription factors whose binding motif was disrupted by the mutation, and genes targeted by long-range chromatin interactions. A true set of mutations for machine learning was generated by interrogating publicly available pan-cancer genomes based on our statistical model of mutation recurrence. The performance of our random forest classifier was evaluated by cross-validation. The variable importance of each feature in the classification of mutations was investigated. Our statistical recurrence model for the random forest classifier showed an area under the curve (AUC) of ~0.78 in predicting recurrent mutations. Chromatin accessibility at the mutation sites, the distance from the mutations to known cancer risk loci, and the role of the target genes in the regulatory or protein interaction network were among the most important variables. CONCLUSIONS: Our methods make it possible to characterize recurrent regulatory mutations using a limited number of whole-genome samples and, based on that characterization, to predict potential driver mutations whose recurrence is not found in the given samples but is likely to be observed with additional samples. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1385-y) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5135808 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5135808 2016-12-15 Predicting the recurrence of noncoding regulatory mutations in cancer Yang, Woojin Bang, Hyoeun Jang, Kiwon Sung, Min Kyung Choi, Jung Kyoon BMC Bioinformatics Research Article BACKGROUND: One of the greatest challenges in cancer genomics is to distinguish driver mutations from passenger mutations. Whereas recurrence is a hallmark of driver mutations, it is difficult to observe recurring noncoding mutations owing to a limited amount of whole-genome sequenced samples. Hence, a method is needed to predict potentially recurrent mutations. RESULTS: In this work, we developed a random forest classifier that predicts regulatory mutations that may recur based on the features of the mutations repeatedly appearing in a given cohort. With breast cancer as a model, we profiled 35 quantitative features describing genetic and epigenetic signals at the mutation site, transcription factors whose binding motif was disrupted by the mutation, and genes targeted by long-range chromatin interactions. A true set of mutations for machine learning was generated by interrogating publicly available pan-cancer genomes based on our statistical model of mutation recurrence. The performance of our random forest classifier was evaluated by cross-validation. The variable importance of each feature in the classification of mutations was investigated. Our statistical recurrence model for the random forest classifier showed an area under the curve (AUC) of ~0.78 in predicting recurrent mutations. Chromatin accessibility at the mutation sites, the distance from the mutations to known cancer risk loci, and the role of the target genes in the regulatory or protein interaction network were among the most important variables. 
CONCLUSIONS: Our methods make it possible to characterize recurrent regulatory mutations using a limited number of whole-genome samples and, based on that characterization, to predict potential driver mutations whose recurrence is not found in the given samples but is likely to be observed with additional samples. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1385-y) contains supplementary material, which is available to authorized users. BioMed Central 2016-12-03 /pmc/articles/PMC5135808/ /pubmed/27912731 http://dx.doi.org/10.1186/s12859-016-1385-y Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Yang, Woojin Bang, Hyoeun Jang, Kiwon Sung, Min Kyung Choi, Jung Kyoon Predicting the recurrence of noncoding regulatory mutations in cancer |
title | Predicting the recurrence of noncoding regulatory mutations in cancer |
title_full | Predicting the recurrence of noncoding regulatory mutations in cancer |
title_fullStr | Predicting the recurrence of noncoding regulatory mutations in cancer |
title_full_unstemmed | Predicting the recurrence of noncoding regulatory mutations in cancer |
title_short | Predicting the recurrence of noncoding regulatory mutations in cancer |
title_sort | predicting the recurrence of noncoding regulatory mutations in cancer |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5135808/ https://www.ncbi.nlm.nih.gov/pubmed/27912731 http://dx.doi.org/10.1186/s12859-016-1385-y |
work_keys_str_mv | AT yangwoojin predictingtherecurrenceofnoncodingregulatorymutationsincancer AT banghyoeun predictingtherecurrenceofnoncodingregulatorymutationsincancer AT jangkiwon predictingtherecurrenceofnoncodingregulatorymutationsincancer AT sungminkyung predictingtherecurrenceofnoncodingregulatorymutationsincancer AT choijungkyoon predictingtherecurrenceofnoncodingregulatorymutationsincancer |
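
The abstract in this record reports classifier performance as an area under the curve (AUC) of ~0.78. As a reading aid, the following minimal sketch shows what that metric measures: the probability that a randomly chosen positive example (here, a recurrent mutation) receives a higher score than a randomly chosen negative one, with ties counted as half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the article, and the sketch does not reproduce the authors' random forest or their cross-validation setup.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: 1 for positive examples, 0 for negative examples.
    scores: classifier scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 positives and 4 negatives (not from the article).
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a rough scale for interpreting the ~0.78 quoted above.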