Integrating sequence and gene expression information predicts genome-wide DNA-binding proteins and suggests a cooperative mechanism
DNA-binding proteins (DBPs) perform diverse biological functions ranging from transcription to pathogen sensing. Machine learning methods can not only identify DBPs de novo but also provide insights into their DNA-recognition dynamics. However, it remains unclear whether available methods that can a...
Main authors: | Ahmad, Shandar; Prathipati, Philip; Tripathi, Lokesh P; Chen, Yi-An; Arya, Ajay; Murakami, Yoichi; Mizuguchi, Kenji |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2018 |
Subjects: | Computational Biology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5758906/ https://www.ncbi.nlm.nih.gov/pubmed/29186632 http://dx.doi.org/10.1093/nar/gkx1166 |
_version_ | 1783291089083432960 |
---|---|
author | Ahmad, Shandar Prathipati, Philip Tripathi, Lokesh P Chen, Yi-An Arya, Ajay Murakami, Yoichi Mizuguchi, Kenji |
author_facet | Ahmad, Shandar Prathipati, Philip Tripathi, Lokesh P Chen, Yi-An Arya, Ajay Murakami, Yoichi Mizuguchi, Kenji |
author_sort | Ahmad, Shandar |
collection | PubMed |
description | DNA-binding proteins (DBPs) perform diverse biological functions ranging from transcription to pathogen sensing. Machine learning methods can not only identify DBPs de novo but also provide insights into their DNA-recognition dynamics. However, it remains unclear whether available methods that can accurately predict DNA-binding sites in known DBPs can also identify novel DBPs. Moreover, sequence information is blind to the cellular- and disease-specific contexts of DBP activities, whereas the under-utilized knowledge from public gene expression data offers great promise. To address these issues, we have developed novel methods for predicting DBPs by integrating sequence and gene expression-derived features and applied them to explore human, mouse and Arabidopsis proteomes. While our sequence-based models outperformed the gene expression-based ones, some proteins with weaker DBP-like sequence features were correctly predicted by gene expression-based features, suggesting that these proteins acquire a tangible DBP functionality in a conducive gene expression environment. Analysis of motif enrichment among the co-expressed genes of top 100 candidates DBPs from hitherto unannotated genes provides further avenues to explore their functional associations. |
format | Online Article Text |
id | pubmed-5758906 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-57589062018-01-16 Integrating sequence and gene expression information predicts genome-wide DNA-binding proteins and suggests a cooperative mechanism Ahmad, Shandar Prathipati, Philip Tripathi, Lokesh P Chen, Yi-An Arya, Ajay Murakami, Yoichi Mizuguchi, Kenji Nucleic Acids Res Computational Biology DNA-binding proteins (DBPs) perform diverse biological functions ranging from transcription to pathogen sensing. Machine learning methods can not only identify DBPs de novo but also provide insights into their DNA-recognition dynamics. However, it remains unclear whether available methods that can accurately predict DNA-binding sites in known DBPs can also identify novel DBPs. Moreover, sequence information is blind to the cellular- and disease-specific contexts of DBP activities, whereas the under-utilized knowledge from public gene expression data offers great promise. To address these issues, we have developed novel methods for predicting DBPs by integrating sequence and gene expression-derived features and applied them to explore human, mouse and Arabidopsis proteomes. While our sequence-based models outperformed the gene expression-based ones, some proteins with weaker DBP-like sequence features were correctly predicted by gene expression-based features, suggesting that these proteins acquire a tangible DBP functionality in a conducive gene expression environment. Analysis of motif enrichment among the co-expressed genes of top 100 candidates DBPs from hitherto unannotated genes provides further avenues to explore their functional associations. Oxford University Press 2018-01-09 2017-11-25 /pmc/articles/PMC5758906/ /pubmed/29186632 http://dx.doi.org/10.1093/nar/gkx1166 Text en © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research. 
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Computational Biology Ahmad, Shandar Prathipati, Philip Tripathi, Lokesh P Chen, Yi-An Arya, Ajay Murakami, Yoichi Mizuguchi, Kenji Integrating sequence and gene expression information predicts genome-wide DNA-binding proteins and suggests a cooperative mechanism |
title | Integrating sequence and gene expression information predicts genome-wide DNA-binding proteins and suggests a cooperative mechanism |
title_full | Integrating sequence and gene expression information predicts genome-wide DNA-binding proteins and suggests a cooperative mechanism |
title_fullStr | Integrating sequence and gene expression information predicts genome-wide DNA-binding proteins and suggests a cooperative mechanism |
title_full_unstemmed | Integrating sequence and gene expression information predicts genome-wide DNA-binding proteins and suggests a cooperative mechanism |
title_short | Integrating sequence and gene expression information predicts genome-wide DNA-binding proteins and suggests a cooperative mechanism |
title_sort | integrating sequence and gene expression information predicts genome-wide dna-binding proteins and suggests a cooperative mechanism |
topic | Computational Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5758906/ https://www.ncbi.nlm.nih.gov/pubmed/29186632 http://dx.doi.org/10.1093/nar/gkx1166 |
work_keys_str_mv | AT ahmadshandar integratingsequenceandgeneexpressioninformationpredictsgenomewidednabindingproteinsandsuggestsacooperativemechanism AT prathipatiphilip integratingsequenceandgeneexpressioninformationpredictsgenomewidednabindingproteinsandsuggestsacooperativemechanism AT tripathilokeshp integratingsequenceandgeneexpressioninformationpredictsgenomewidednabindingproteinsandsuggestsacooperativemechanism AT chenyian integratingsequenceandgeneexpressioninformationpredictsgenomewidednabindingproteinsandsuggestsacooperativemechanism AT aryaajay integratingsequenceandgeneexpressioninformationpredictsgenomewidednabindingproteinsandsuggestsacooperativemechanism AT murakamiyoichi integratingsequenceandgeneexpressioninformationpredictsgenomewidednabindingproteinsandsuggestsacooperativemechanism AT mizuguchikenji integratingsequenceandgeneexpressioninformationpredictsgenomewidednabindingproteinsandsuggestsacooperativemechanism |