A Comparative Study of Supervised Machine Learning Algorithms for the Prediction of Long-Range Chromatin Interactions

The role of three-dimensional genome organization as a critical regulator of gene expression has become increasingly clear over the last decade. Most of our understanding of this association comes from the study of long-range chromatin interaction maps provided by Chromatin Conformation Capture-base...

Bibliographic Details
Main authors: Vanhaeren, Thomas; Divina, Federico; García-Torres, Miguel; Gómez-Vela, Francisco; Vanhoof, Wim; Martínez-García, Pedro Manuel
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7563616/
https://www.ncbi.nlm.nih.gov/pubmed/32847102
http://dx.doi.org/10.3390/genes11090985
author Vanhaeren, Thomas
Divina, Federico
García-Torres, Miguel
Gómez-Vela, Francisco
Vanhoof, Wim
Martínez-García, Pedro Manuel
collection PubMed
description The role of three-dimensional genome organization as a critical regulator of gene expression has become increasingly clear over the last decade. Most of our understanding of this association comes from the study of long-range chromatin interaction maps provided by Chromatin Conformation Capture-based techniques, which have greatly improved in recent years. Since these procedures are experimentally laborious and expensive, in silico prediction has emerged as an alternative strategy to generate virtual maps in cell types and conditions for which experimental chromatin interaction data are not available. Several methods have been based on predictive models trained on one-dimensional (1D) sequencing features, yielding promising results. However, different approaches vary both in the way they model chromatin interactions and in the machine learning-based strategy they rely on, making it challenging to compare the performance of existing methods. In this study, we use publicly available 1D sequencing signals to model cohesin-mediated chromatin interactions in two human cell lines and evaluate the prediction performance of six popular machine learning algorithms: decision trees, random forests, gradient boosting, support vector machines, multi-layer perceptron and deep learning. Our approach accurately predicts long-range interactions and reveals that gradient boosting significantly outperforms the other five methods, yielding accuracies of about 95%. We show that chromatin features in close genomic proximity to the anchors cover most of the predictive information, as has been previously reported. Moreover, we demonstrate that gradient boosting models trained with different subsets of chromatin features, unlike the other methods tested, are able to produce accurate predictions. In this regard, and besides architectural proteins, transcription factors are shown to be highly informative. Our study provides a framework for the systematic prediction of long-range chromatin interactions, identifies gradient boosting as the best-suited algorithm for this task and highlights cell-type-specific binding of transcription factors at the anchors as important determinants of chromatin wiring mediated by cohesin.
format Online
Article
Text
id pubmed-7563616
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7563616 2020-10-27 A Comparative Study of Supervised Machine Learning Algorithms for the Prediction of Long-Range Chromatin Interactions Vanhaeren, Thomas Divina, Federico García-Torres, Miguel Gómez-Vela, Francisco Vanhoof, Wim Martínez-García, Pedro Manuel Genes (Basel) Article MDPI 2020-08-24 /pmc/articles/PMC7563616/ /pubmed/32847102 http://dx.doi.org/10.3390/genes11090985 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title A Comparative Study of Supervised Machine Learning Algorithms for the Prediction of Long-Range Chromatin Interactions
topic Article