A genome-wide approach to link genotype to clinical outcome by utilizing next generation sequencing and gene chip data of 6,697 breast cancer patients

Bibliographic Details
Main Authors: Pongor, Lőrinc; Kormos, Máté; Hatzis, Christos; Pusztai, Lajos; Szabó, András; Győrffy, Balázs
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4609150/
https://www.ncbi.nlm.nih.gov/pubmed/26474971
http://dx.doi.org/10.1186/s13073-015-0228-1
_version_ 1782395778854027264
author Pongor, Lőrinc
Kormos, Máté
Hatzis, Christos
Pusztai, Lajos
Szabó, András
Győrffy, Balázs
collection PubMed
description BACKGROUND: The use of somatic mutations for predicting clinical outcome is difficult because a mutation can indirectly influence the function of many genes, and also because clinical follow-up is sparse in the relatively young next generation sequencing (NGS) databanks. Here we approach this problem by linking sequence databanks to well-annotated gene-chip datasets, using a multigene transcriptomic fingerprint as a link between gene mutations and gene expression in breast cancer patients.
METHODS: The database consists of 763 NGS samples containing mutational status for 22,938 genes and RNA-seq data for 10,987 genes. The gene chip database contains 5,934 patients with 10,987 genes plus clinical characteristics. For the prediction, mutations present in a sample are first translated into a ‘transcriptomic fingerprint’ by running ROC analysis on mutation and RNA-seq data. Then correlation to survival is assessed by computing Cox regression for both up- and downregulated signatures.
RESULTS: According to this approach, the top driver oncogenes having a mutation prevalence over 5% included AKT1, TRANK1, TRAPPC10, RPGR, COL6A2, RAPGEF4, ATG2B, CNTRL, NAA38, OSBPL10, POTEF, SCLT1, SUN1, VWDE, MTUS2, and PIK3CA, and the top tumor suppressor genes included PHEX, TP53, GGA3, RGS22, PXDNL, ARFGEF1, BRCA2, CHD8, GCC2, and ARMC4. The system was validated by computing correlation between RNA-seq and microarray data (r² = 0.73, P < 1E-16). Cross-validation using 20 genes with a prevalence of approximately 5% confirmed analysis reproducibility.
CONCLUSIONS: We established a pipeline enabling rapid clinical validation of a discovered mutation in a large breast cancer cohort. An online interface is available for evaluating any human gene mutation or combinations of up to three such genes (http://www.g-2-o.com).
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13073-015-0228-1) contains supplementary material, which is available to authorized users.
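The METHODS summary above describes translating a mutation into a transcriptomic fingerprint through ROC analysis. The following is a minimal sketch of that idea only, not the authors' pipeline: the function names, the AUC cutoff of 0.7, the toy gene names, and the mean-expression signature score are all assumptions made for illustration. The survival step (Cox regression on the up- and downregulated signatures) is not shown; it would need a survival-analysis library.

```python
from statistics import mean


def auc(mutant_values, wildtype_values):
    """ROC AUC for one gene: the probability that a random mutant sample has
    higher expression than a random wild-type sample (ties count one half)."""
    wins = 0.0
    for m in mutant_values:
        for w in wildtype_values:
            if m > w:
                wins += 1.0
            elif m == w:
                wins += 0.5
    return wins / (len(mutant_values) * len(wildtype_values))


def fingerprint(expression, mutated, cutoff=0.7):
    """Split genes into up- and downregulated sets for one mutation.

    expression: {gene: [expression value per sample]}
    mutated:    [True if the sample carries the mutation, per sample]
    cutoff:     hypothetical AUC threshold (an assumption, not from the paper)
    """
    up, down = [], []
    for gene, values in expression.items():
        mut = [v for v, is_mut in zip(values, mutated) if is_mut]
        wt = [v for v, is_mut in zip(values, mutated) if not is_mut]
        a = auc(mut, wt)
        if a >= cutoff:
            up.append(gene)
        elif a <= 1.0 - cutoff:
            down.append(gene)
    return up, down


def signature_score(patient_expression, genes):
    """Mean expression of the signature genes in one gene-chip patient."""
    return mean(patient_expression[g] for g in genes)


# Toy data with hypothetical gene names: two mutated, three wild-type samples.
mutated = [True, True, False, False, False]
expression = {
    "GENE_A": [9.0, 8.0, 3.0, 4.0, 2.0],
    "GENE_B": [1.0, 2.0, 6.0, 7.0, 5.0],
    "GENE_C": [5.0, 2.0, 4.0, 3.0, 6.0],
}
up_genes, down_genes = fingerprint(expression, mutated)
```

In this sketch, each gene-chip patient would get one score per signature from `signature_score`, and those scores would then serve as the covariate in the survival model.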
format Online
Article
Text
id pubmed-4609150
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4609150 2015-10-18 Genome Med Research BioMed Central 2015-10-16 /pmc/articles/PMC4609150/ /pubmed/26474971 http://dx.doi.org/10.1186/s13073-015-0228-1 Text en © Pongor et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A genome-wide approach to link genotype to clinical outcome by utilizing next generation sequencing and gene chip data of 6,697 breast cancer patients
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4609150/
https://www.ncbi.nlm.nih.gov/pubmed/26474971
http://dx.doi.org/10.1186/s13073-015-0228-1