
A new computational method to predict transcriptional activity of a DNA sequence from diverse datasets of massively parallel reporter assays

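In recent years, the dramatic increase in the number of applications for massively parallel reporter assay (MPRA) technology has produced a large body of data for various purposes. However, a computational model that can be applied to decipher regulatory codes for diverse MPRAs does not exist yet. Here, we propose a new computational method to predict the transcriptional activity of MPRAs, as well as luciferase reporter assays, based on the TRANScription FACtor database. We employed regression trees and multivariate adaptive regression splines to obtain these predictions and considered a feature redundancy-dependent formula for conventional regression trees to enable adaptation to diverse data. The developed method was applicable to various MPRAs despite the use of different types of transfected cells, sequence lengths, construct numbers and sequence types. We demonstrate that this method can predict the transcriptional activity of promoters in HEK293 cells through predictive functions that were estimated by independent assays in eight tumor cell lines. The prediction was generally good (Pearson's r = 0.68), which suggested that common active transcription factor binding sites across different cell types make greater contributions to transcriptional activity and that known promoter activity could confer transcriptional activity of unknown promoters in some instances, regardless of cell type.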
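The abstract names regression trees and Pearson's r but gives no further detail. As a rough illustration only, the Python sketch below reduces the idea to its simplest form: each sequence becomes a vector of transcription factor binding-site counts, a one-split regression tree (a stump) is fit to measured activities, and predictions are scored with Pearson's r. This is not the authors' method. The motif names and consensus strings in the hypothetical MOTIFS table stand in for TRANSFAC matrices, exact string matching replaces matrix scoring, and neither multivariate adaptive regression splines nor the feature redundancy-dependent formula is implemented.

```python
from math import sqrt
from typing import List, Sequence, Tuple

# HYPOTHETICAL motif set: consensus strings standing in for TRANSFAC matrices.
# Exact matching is a simplifying assumption, not how matrix scanning works.
MOTIFS = {"SP1": "GGGCGG", "TATA": "TATAAA", "CCAAT": "CCAAT"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_sites(seq: str, motif: str) -> int:
    """Count (possibly overlapping) exact motif matches on both strands."""
    k = len(motif)
    return sum(
        1
        for strand in (seq, revcomp(seq))
        for i in range(len(strand) - k + 1)
        if strand[i:i + k] == motif
    )


def featurize(seq: str) -> List[int]:
    """One binding-site count per motif, in MOTIFS order."""
    seq = seq.upper()
    return [count_sites(seq, m) for m in MOTIFS.values()]


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[float]) -> Tuple:
    """Depth-1 regression tree: pick the (feature, threshold) split that
    minimises the summed squared error around the two leaf means.
    Returns (feature_index, threshold, left_mean, right_mean)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # a split must leave data on both sides
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # every feature is constant: fall back to the global mean
        m = sum(y) / len(y)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump: Tuple, row: Sequence[float]) -> float:
    """Route a feature vector to the left or right leaf mean."""
    j, t, lm, rm = stump
    return lm if row[j] <= t else rm


def pearson_r(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    if va == 0 or vb == 0:
        raise ValueError("Pearson's r is undefined for a constant input")
    return cov / sqrt(va * vb)


if __name__ == "__main__":
    # Invented toy sequences and activities, for illustration only.
    seqs = ["GGGCGGTATAAA", "TTTTTTTTTTTT", "GGGCGGGGGCGG", "CCAATTTTTTTT"]
    activity = [5.0, 1.0, 6.0, 2.0]
    X = [featurize(s) for s in seqs]
    stump = fit_stump(X, activity)
    preds = [predict_stump(stump, row) for row in X]
    print(stump, round(pearson_r(preds, activity), 3))
```

On the invented data the stump splits on the first feature (the SP1 count) at threshold 0 and the in-sample r rounds to 0.97. That figure comes from fitting and scoring the same four made-up sequences, so it says nothing about real MPRA data and is not comparable to the r = 0.68 reported above, which was obtained on independent assays.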

Bibliographic Details
Main authors: Liu, Ying; Irie, Takuma; Yada, Tetsushi; Suzuki, Yutaka
Format: Online Article Text
Language: English
Published: Oxford University Press, 2017
Subjects: Methods Online
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5737609/
https://www.ncbi.nlm.nih.gov/pubmed/28531296
http://dx.doi.org/10.1093/nar/gkx396
author Liu, Ying
Irie, Takuma
Yada, Tetsushi
Suzuki, Yutaka
collection PubMed
description In recent years, the dramatic increase in the number of applications for massively parallel reporter assay (MPRA) technology has produced a large body of data for various purposes. However, a computational model that can be applied to decipher regulatory codes for diverse MPRAs does not exist yet. Here, we propose a new computational method to predict the transcriptional activity of MPRAs, as well as luciferase reporter assays, based on the TRANScription FACtor database. We employed regression trees and multivariate adaptive regression splines to obtain these predictions and considered a feature redundancy-dependent formula for conventional regression trees to enable adaptation to diverse data. The developed method was applicable to various MPRAs despite the use of different types of transfected cells, sequence lengths, construct numbers and sequence types. We demonstrate that this method can predict the transcriptional activity of promoters in HEK293 cells through predictive functions that were estimated by independent assays in eight tumor cell lines. The prediction was generally good (Pearson's r = 0.68), which suggested that common active transcription factor binding sites across different cell types make greater contributions to transcriptional activity and that known promoter activity could confer transcriptional activity of unknown promoters in some instances, regardless of cell type.
format Online
Article
Text
id pubmed-5737609
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
source pubmed-5737609 (2018-01-04). Nucleic Acids Res, Methods Online. Oxford University Press, 2017-07-27; 2017-05-22
rights © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title A new computational method to predict transcriptional activity of a DNA sequence from diverse datasets of massively parallel reporter assays
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5737609/
https://www.ncbi.nlm.nih.gov/pubmed/28531296
http://dx.doi.org/10.1093/nar/gkx396