
Bioinformatic pipelines for whole transcriptome sequencing data exploitation in leukemia patients with complex structural variants

BACKGROUND: Extensive genome rearrangements, known as chromothripsis, have been recently identified in several cancer types. Chromothripsis leads to complex structural variants (cSVs) causing aberrant gene expression and the formation of de novo fusion genes, which can trigger cancer development, or...


Bibliographic Details
Main authors:	Hynst, Jakub, Plevova, Karla, Radova, Lenka, Bystry, Vojtech, Pal, Karol, Pospisilova, Sarka
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2019
Subjects:	Bioinformatics
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6571010/ https://www.ncbi.nlm.nih.gov/pubmed/31223530 http://dx.doi.org/10.7717/peerj.7071

author	Hynst, Jakub Plevova, Karla Radova, Lenka Bystry, Vojtech Pal, Karol Pospisilova, Sarka
collection	PubMed
description	BACKGROUND: Extensive genome rearrangements, known as chromothripsis, have been recently identified in several cancer types. Chromothripsis leads to complex structural variants (cSVs) causing aberrant gene expression and the formation of de novo fusion genes, which can trigger cancer development, or worsen its clinical course. The functional impact of cSVs can be studied at the RNA level using whole transcriptome sequencing (total RNA-Seq). It represents a powerful tool for discovering, profiling, and quantifying changes of gene expression in the overall genomic context. However, bioinformatic analysis of transcriptomic data, especially in cases with cSVs, is a complex and challenging task, and the development of proper bioinformatic tools for transcriptome studies is necessary.
METHODS: We designed a bioinformatic workflow for the analysis of total RNA-Seq data consisting of two separate parts (pipelines): The first pipeline incorporates a statistical solution for differential gene expression analysis in a biologically heterogeneous sample set. We utilized results from transcriptomic arrays which were carried out in parallel to increase the precision of the analysis. The second pipeline is used for the identification of de novo fusion genes. Special attention was given to the filtering of false positives (FPs), which was achieved through consensus fusion calling with several fusion gene callers. We applied the workflow to the data obtained from ten patients with chronic lymphocytic leukemia (CLL) to describe the consequences of their cSVs in detail. The fusion genes identified by our pipeline were correlated with genomic breakpoints detected by genomic arrays.
RESULTS: We set up a novel solution for differential gene expression analysis of individual samples and de novo fusion gene detection from total RNA-Seq data. The results of the differential gene expression analysis were concordant with results obtained by transcriptomic arrays, which demonstrates the analytical capabilities of our method. We also showed that the consensus fusion gene detection approach was able to identify true positives (TPs) efficiently. Detected coordinates of fusion gene junctions were in concordance with genomic breakpoints assessed using genomic arrays.
DISCUSSION: By applying our methods to real clinical samples, we proved that our approach for total RNA-Seq data analysis generates results consistent with other genomic analytical techniques. The data obtained by our analyses provided clues for the study of the biological consequences of cSVs with far-reaching implications for clinical outcome and management of cancer patients. The bioinformatic workflow is also widely applicable for addressing other research questions in different contexts, for which transcriptomic data are generated.
format	Online Article Text
id	pubmed-6571010
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-6571010 2019-06-20 PeerJ Inc. 2019-06-12 /pmc/articles/PMC6571010/ /pubmed/31223530 http://dx.doi.org/10.7717/peerj.7071 Text en ©2019 Hynst et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title	Bioinformatic pipelines for whole transcriptome sequencing data exploitation in leukemia patients with complex structural variants
topic	Bioinformatics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6571010/ https://www.ncbi.nlm.nih.gov/pubmed/31223530 http://dx.doi.org/10.7717/peerj.7071

