Identification of High-Impact cis-Regulatory Mutations Using Transcription Factor Specific Random Forest Models

Cancer genomes contain vast amounts of somatic mutations, many of which are passenger mutations not involved in oncogenesis. Whereas driver mutations in protein-coding genes can be distinguished from passenger mutations based on their recurrence, non-coding mutations are usually not recurrent at the...

Bibliographic Details
Main authors: Svetlichnyy, Dmitry; Imrichova, Hana; Fiers, Mark; Kalender Atak, Zeynep; Aerts, Stein
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4642938/
https://www.ncbi.nlm.nih.gov/pubmed/26562774
http://dx.doi.org/10.1371/journal.pcbi.1004590
_version_ 1782400436496498688
author Svetlichnyy, Dmitry
Imrichova, Hana
Fiers, Mark
Kalender Atak, Zeynep
Aerts, Stein
author_sort Svetlichnyy, Dmitry
collection PubMed
description Cancer genomes contain vast amounts of somatic mutations, many of which are passenger mutations not involved in oncogenesis. Whereas driver mutations in protein-coding genes can be distinguished from passenger mutations based on their recurrence, non-coding mutations are usually not recurrent at the same position. Therefore, it is still unclear how to identify cis-regulatory driver mutations, particularly when chromatin data from the same patient is not available, thus relying only on sequence and expression information. Here we use machine-learning methods to predict functional regulatory regions using sequence information alone, and compare the predicted activity of the mutated region with the reference sequence. This way we define the Predicted Regulatory Impact of a Mutation in an Enhancer (PRIME). We find that the recently identified driver mutation in the TAL1 enhancer has a high PRIME score, representing a “gain-of-target” for MYB, whereas the highly recurrent TERT promoter mutation has a surprisingly low PRIME score. We trained Random Forest models for 45 cancer-related transcription factors, and used these to score variations in the HeLa genome and somatic mutations across more than five hundred cancer genomes. Each model predicts only a small fraction of non-coding mutations with a potential impact on the function of the encompassing regulatory region. Nevertheless, as these few candidate driver mutations are often linked to gains in chromatin activity and gene expression, they may contribute to the oncogenic program by altering the expression levels of specific oncogenes and tumor suppressor genes.
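The comparison described in the abstract (predicted activity of the mutated region versus the reference sequence) can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration: `toy_activity` is a hypothetical stand-in for a trained transcription-factor-specific Random Forest model, the motif `AACGG` is arbitrary, and taking a simple difference of predictions is one plausible reading of the abstract, not the authors' published PRIME formula.

```python
from typing import Callable


def toy_activity(seq: str, motif: str = "AACGG") -> float:
    """Hypothetical stand-in for a trained model: maps motif hit count to [0, 1)."""
    width = len(motif)
    hits = sum(seq[i:i + width] == motif for i in range(len(seq) - width + 1))
    return hits / (1 + hits)


def apply_snv(ref: str, pos: int, alt: str) -> str:
    """Return the reference sequence with a single base substituted at 0-based pos."""
    if not 0 <= pos < len(ref) or len(alt) != 1:
        raise ValueError("pos must index ref and alt must be a single base")
    return ref[:pos] + alt + ref[pos + 1:]


def prime_like_score(
    ref: str,
    pos: int,
    alt: str,
    predict: Callable[[str], float] = toy_activity,
) -> float:
    """Difference in predicted activity: mutated sequence minus reference sequence."""
    return predict(apply_snv(ref, pos, alt)) - predict(ref)
```

Under these assumptions, `prime_like_score("TTAACAGGTT", 5, "G")` returns 0.5, because the substitution creates one motif match that the reference lacks (a gain), while the reverse substitution returns -0.5 (a loss). Passing a different `predict` callable, for example a wrapper around a real classifier's probability output, leaves the comparison logic unchanged.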
format Online
Article
Text
id pubmed-4642938
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4642938 2015-11-18 Identification of High-Impact cis-Regulatory Mutations Using Transcription Factor Specific Random Forest Models Svetlichnyy, Dmitry Imrichova, Hana Fiers, Mark Kalender Atak, Zeynep Aerts, Stein PLoS Comput Biol Research Article
Public Library of Science 2015-11-12 /pmc/articles/PMC4642938/ /pubmed/26562774 http://dx.doi.org/10.1371/journal.pcbi.1004590 Text en © 2015 Svetlichnyy et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Identification of High-Impact cis-Regulatory Mutations Using Transcription Factor Specific Random Forest Models
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4642938/
https://www.ncbi.nlm.nih.gov/pubmed/26562774
http://dx.doi.org/10.1371/journal.pcbi.1004590