
New functionalities in the TCGAbiolinks package for the study and integration of cancer data from GDC and GTEx

The advent of Next-Generation Sequencing (NGS) technologies has opened new perspectives in deciphering the genetic mechanisms underlying complex diseases. Genomic data are now produced at massive scale, and substantial effort and new tools are required to extract the information hidden in them…


Bibliographic Details
Main authors: Mounir, Mohamed, Lucchetta, Marta, Silva, Tiago C., Olsen, Catharina, Bontempi, Gianluca, Chen, Xi, Noushmehr, Houtan, Colaprico, Antonio, Papaleo, Elena
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6420023/
https://www.ncbi.nlm.nih.gov/pubmed/30835723
http://dx.doi.org/10.1371/journal.pcbi.1006701
author Mounir, Mohamed
Lucchetta, Marta
Silva, Tiago C.
Olsen, Catharina
Bontempi, Gianluca
Chen, Xi
Noushmehr, Houtan
Colaprico, Antonio
Papaleo, Elena
collection PubMed
description The advent of Next-Generation Sequencing (NGS) technologies has opened new perspectives in deciphering the genetic mechanisms underlying complex diseases. Genomic data are now produced at massive scale, and substantial effort and new tools are required to extract the information hidden in them. The Genomic Data Commons (GDC) Data Portal is a platform that hosts multiple genomic studies, including those from The Cancer Genome Atlas (TCGA) and the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiatives, covering more than 40 tumor types from nearly 30,000 patients. Such platforms, although very attractive, must ensure that the stored data are easily accessible and adequately harmonized. Moreover, they focus primarily on storing data in a single place and do not provide a comprehensive toolkit for analyzing and interpreting it. To fill this urgent need, computational methods for integrative analysis of genomic data are required that are comprehensive and easily accessible yet do not sacrifice a robust statistical and theoretical framework. In this context, the R/Bioconductor package TCGAbiolinks was developed, offering a variety of bioinformatics functionalities. Here we introduce new features and enhancements of TCGAbiolinks in terms of i) more accurate and flexible pipelines for differential expression analysis, ii) different methods for tumor purity estimation and filtering, iii) integration of normal samples from other platforms, and iv) support for other genomics datasets, exemplified here by the TARGET data. Evidence has shown that accounting for tumor purity is essential in the study of tumorigenesis, as variation in tumor purity can confound differential expression analysis. With this in mind, we implemented purity-based filtering procedures in TCGAbiolinks.
Moreover, a limitation of some TCGA datasets is the unavailability or paucity of corresponding normal samples. We therefore added to TCGAbiolinks the option to use normal samples from the Genotype-Tissue Expression (GTEx) project, another large-scale repository cataloging gene expression in healthy individuals. The new functionalities are available in TCGAbiolinks version 2.8 and later, released with Bioconductor 3.7.
format Online
Article
Text
id pubmed-6420023
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6420023 2019-04-01 PLoS Comput Biol Research Article. Public Library of Science 2019-03-05. Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication.
title New functionalities in the TCGAbiolinks package for the study and integration of cancer data from GDC and GTEx
topic Research Article