Towards enhanced and interpretable clustering/classification in integrative genomics
High-throughput technologies have led to large collections of different types of biological data that provide unprecedented opportunities to unravel molecular heterogeneity of biological processes. Nevertheless, how to jointly explore data from multiple sources into a holistic, biologically meaningf...
Main authors: | Lu, Yang Young; Lv, Jinchi; Fuhrman, Jed A.; Sun, Fengzhu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2017 |
Subjects: | Methods Online |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5714251/ https://www.ncbi.nlm.nih.gov/pubmed/28977511 http://dx.doi.org/10.1093/nar/gkx767 |
_version_ | 1783283554321432576 |
---|---|
author | Lu, Yang Young Lv, Jinchi Fuhrman, Jed A. Sun, Fengzhu |
author_sort | Lu, Yang Young |
collection | PubMed |
description | High-throughput technologies have led to large collections of different types of biological data that provide unprecedented opportunities to unravel molecular heterogeneity of biological processes. Nevertheless, how to jointly explore data from multiple sources into a holistic, biologically meaningful interpretation remains challenging. In this work, we propose a scalable and tuning-free preprocessing framework, Heterogeneity Rescaling Pursuit (Hetero-RP), which weighs important features more highly than less important ones in accord with implicitly existing auxiliary knowledge. Finally, we demonstrate effectiveness of Hetero-RP in diverse clustering and classification applications. More importantly, Hetero-RP offers an interpretation of feature importance, shedding light on the driving forces of the underlying biology. In metagenomic contig binning, Hetero-RP automatically weighs abundance and composition profiles according to the varying number of samples, resulting in markedly improved performance of contig binning. In RNA-binding protein (RBP) binding site prediction, Hetero-RP not only improves the prediction performance measured by the area under the receiver operating characteristic curves (AUC), but also uncovers the evidence supported by independent studies, including the distribution of the binding sites of IGF2BP and PUM2, the binding competition between hnRNPC and U2AF2, and the intron–exon boundary of U2AF2 [availability: https://github.com/younglululu/Hetero-RP]. |
format | Online Article Text |
id | pubmed-5714251 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-5714251 2017-12-08 Nucleic Acids Res Methods Online Oxford University Press 2017-11-16 2017-08-30 /pmc/articles/PMC5714251/ /pubmed/28977511 http://dx.doi.org/10.1093/nar/gkx767 Text en © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Methods Online Lu, Yang Young Lv, Jinchi Fuhrman, Jed A. Sun, Fengzhu Towards enhanced and interpretable clustering/classification in integrative genomics |
title | Towards enhanced and interpretable clustering/classification in integrative genomics |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5714251/ https://www.ncbi.nlm.nih.gov/pubmed/28977511 http://dx.doi.org/10.1093/nar/gkx767 |