A new computational method to predict transcriptional activity of a DNA sequence from diverse datasets of massively parallel reporter assays
Main authors: | Liu, Ying; Irie, Takuma; Yada, Tetsushi; Suzuki, Yutaka |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2017 |
Subjects: | Methods Online |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5737609/ https://www.ncbi.nlm.nih.gov/pubmed/28531296 http://dx.doi.org/10.1093/nar/gkx396 |
Field | Value |
---|---|
author | Liu, Ying; Irie, Takuma; Yada, Tetsushi; Suzuki, Yutaka |
collection | PubMed |
description | In recent years, the dramatic increase in applications of massively parallel reporter assay (MPRA) technology has produced a large body of data for various purposes. However, no computational model yet exists that can be applied to decipher regulatory codes across diverse MPRAs. Here, we propose a new computational method, based on the TRANScription FACtor (TRANSFAC) database, that predicts the transcriptional activity measured by MPRAs as well as by luciferase reporter assays. We employed regression trees and multivariate adaptive regression splines to obtain these predictions, and we applied a feature redundancy-dependent formula to conventional regression trees so that they could adapt to diverse data. The method was applicable to various MPRAs despite differences in transfected cell types, sequence lengths, construct numbers and sequence types. We demonstrate that it can predict the transcriptional activity of promoters in HEK293 cells using predictive functions estimated from independent assays in eight tumor cell lines. The prediction was generally good (Pearson's r = 0.68). This suggests two things. First, transcription factor binding sites that are active in common across different cell types contribute more to transcriptional activity. Second, in some instances, known promoter activity could be used to infer the transcriptional activity of unknown promoters, regardless of cell type. |
id | pubmed-5737609 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | Nucleic Acids Res (Methods Online) |
dates | 2018-01-04; 2017-07-27; 2017-05-22 |
rights | © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
topic | Methods Online |