Ensemble Modeling Approach Targeting Heterogeneous RNA-Seq data: Application to Melanoma Pseudogenes

We studied the transcriptome landscape of skin cutaneous melanoma (SKCM) using 103 primary tumor samples from TCGA, and measured the expression levels of both protein-coding genes and non-coding RNAs (ncRNAs). In particular, we emphasized pseudogenes potentially relevant to this cancer. While catalo...

Bibliographic Details
Main authors: Capobianco, Enrico; Valdes, Camilo; Sarti, Samanta; Jiang, Zhijie; Poliseno, Laura; Tsinoremas, Nicolas F.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5725464/
https://www.ncbi.nlm.nih.gov/pubmed/29229974
http://dx.doi.org/10.1038/s41598-017-17337-7
author Capobianco, Enrico
Valdes, Camilo
Sarti, Samanta
Jiang, Zhijie
Poliseno, Laura
Tsinoremas, Nicolas F.
author_sort Capobianco, Enrico
collection PubMed
description We studied the transcriptome landscape of skin cutaneous melanoma (SKCM) using 103 primary tumor samples from TCGA, and measured the expression levels of both protein-coding genes and non-coding RNAs (ncRNAs). In particular, we emphasized pseudogenes potentially relevant to this cancer. While cataloguing the profiles based on the known biotypes, all the employed RNA-Seq methods generated just a small consensus of significant biotypes. We thus designed an approach to reconcile the profiles from all methods following a simple strategy: we selected genes that were confirmed as differentially expressed by the ensemble predictions obtained in a regression model. The main advantages of this approach are: 1) Selection of a high-confidence gene set identifying relevant pathways; 2) Use of a regression model whose covariates embed all method-driven outcomes to predict an averaged profile; 3) Method-specific assessment of prediction power and significance. Furthermore, the approach can be generalized to any biological system for which noisy RNA-Seq profiles are computed. As our analyses concerned bio-annotations of both high-quality protein-coding genes and ncRNAs, we considered the associations between pseudogenes and parental genes (targets). Among the candidate targets that were validated, we identified PINK1, which is studied in patients with Parkinson's disease and cancer (especially melanoma).
format Online
Article
Text
id pubmed-5725464
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-5725464 2017-12-13 Ensemble Modeling Approach Targeting Heterogeneous RNA-Seq data: Application to Melanoma Pseudogenes Capobianco, Enrico Valdes, Camilo Sarti, Samanta Jiang, Zhijie Poliseno, Laura Tsinoremas, Nicolas F. Sci Rep Article
Nature Publishing Group UK 2017-12-11 /pmc/articles/PMC5725464/ /pubmed/29229974 http://dx.doi.org/10.1038/s41598-017-17337-7 Text en © The Author(s) 2017 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Ensemble Modeling Approach Targeting Heterogeneous RNA-Seq data: Application to Melanoma Pseudogenes
topic Article