TAP: a targeted clinical genomics pipeline for detecting transcript variants using RNA-seq data

BACKGROUND: RNA-seq is a powerful and cost-effective technology for molecular diagnostics of cancer and other diseases, and it can reach its full potential when coupled with validated clinical-grade informatics tools. Despite recent advances in long-read sequencing, transcriptome assembly of short r...

Bibliographic Details
Main Authors: Chiu, Readman; Nip, Ka Ming; Chu, Justin; Birol, Inanc
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6131862/
https://www.ncbi.nlm.nih.gov/pubmed/30200994
http://dx.doi.org/10.1186/s12920-018-0402-6
author Chiu, Readman
Nip, Ka Ming
Chu, Justin
Birol, Inanc
collection PubMed
description BACKGROUND: RNA-seq is a powerful and cost-effective technology for molecular diagnostics of cancer and other diseases, and it can reach its full potential when coupled with validated clinical-grade informatics tools. Despite recent advances in long-read sequencing, transcriptome assembly of short reads remains a useful and cost-effective methodology for unveiling transcript-level rearrangements and novel isoforms. One of the major concerns for adopting the proven de novo assembly approach for RNA-seq data in clinical settings has been the analysis turnaround time. To address this concern, we have developed a targeted approach to expedite assembly and analysis of RNA-seq data. RESULTS: Here we present our Targeted Assembly Pipeline (TAP), which consists of four stages: 1) alignment-free gene-level classification of RNA-seq reads using BioBloomTools, 2) de novo assembly of individual targets using Trans-ABySS, 3) alignment of assembled contigs to the reference genome and transcriptome with GMAP and BWA, and 4) structural and splicing variant detection using PAVFinder. We show that PAVFinder is a robust gene fusion detection tool when compared to established methods such as Tophat-Fusion and deFuse on simulated data of 448 events. Using the Leucegene acute myeloid leukemia (AML) RNA-seq data and a set of 580 COSMIC target genes, TAP identified a wide range of hallmark molecular anomalies, including gene fusions, tandem duplications, insertions and deletions, in agreement with published results. In the same dataset, TAP also captured AML-specific splicing variants, such as skipped exons and novel splice sites, that have been reported in other studies. The running time of TAP on 100–150 million read pairs and a 580-gene set is 1 to 2 hours on a 48-core machine. CONCLUSIONS: We demonstrated that TAP is a fast and robust RNA-seq variant detection pipeline that is potentially amenable to clinical applications. TAP is available at http://www.bcgsc.ca/platform/bioinfo/software/pavfinder ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12920-018-0402-6) contains supplementary material, which is available to authorized users.
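The first stage named in the description, alignment-free gene-level classification of reads, can be illustrated with a small sketch. The abstract gives no implementation details, so the code below is only an assumption-laden toy: it builds one Bloom filter of k-mers per target gene and assigns a read to the gene whose filter matches the largest fraction of the read's k-mers. The class and function names, the filter size, the hash scheme, the `min_fraction` threshold, and the toy sequences are all hypothetical, and the scoring is not claimed to match what BioBloomTools actually does. Reverse-complement k-mers are ignored for brevity.

```python
import hashlib
from typing import Dict, Iterator, Optional


class BloomFilter:
    """Fixed-size Bloom filter over strings (illustrative only)."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive several bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # May return a false positive, never a false negative.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping substring of length k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_filters(targets: Dict[str, str], k: int) -> Dict[str, BloomFilter]:
    """Build one k-mer Bloom filter per target gene sequence."""
    filters = {}
    for gene, seq in targets.items():
        bf = BloomFilter()
        for kmer in kmers(seq.upper(), k):
            bf.add(kmer)
        filters[gene] = bf
    return filters


def classify_read(
    read: str, filters: Dict[str, BloomFilter], k: int, min_fraction: float = 0.5
) -> Optional[str]:
    """Return the best-matching gene, or None if no gene passes the threshold."""
    read_kmers = list(kmers(read.upper(), k))
    if not read_kmers:
        return None  # read is shorter than k
    best_gene, best_score = None, 0.0
    for gene, bf in filters.items():
        score = sum(km in bf for km in read_kmers) / len(read_kmers)
        if score > best_score:
            best_gene, best_score = gene, score
    return best_gene if best_score >= min_fraction else None


# Toy data: made-up sequences standing in for target genes.
TOY_TARGETS = {
    "GENE_A": "ATGGCGTACGTTAGCCTAGGATCCGATTACGCAT",
    "GENE_B": "TTGACCGGTAAACCCGGGTTTAAACGCGTATATAGC",
}
TOY_FILTERS = build_filters(TOY_TARGETS, k=8)
print(classify_read(TOY_TARGETS["GENE_A"][5:25], TOY_FILTERS, k=8))
```

Binning reads per gene in this way is what would let a later assembly step run on each target independently, which is the kind of targeting the description credits for the shorter turnaround time.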
format Online
Article
Text
id pubmed-6131862
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6131862 2018-09-13 TAP: a targeted clinical genomics pipeline for detecting transcript variants using RNA-seq data Chiu, Readman Nip, Ka Ming Chu, Justin Birol, Inanc BMC Med Genomics Software BACKGROUND: RNA-seq is a powerful and cost-effective technology for molecular diagnostics of cancer and other diseases, and it can reach its full potential when coupled with validated clinical-grade informatics tools. Despite recent advances in long-read sequencing, transcriptome assembly of short reads remains a useful and cost-effective methodology for unveiling transcript-level rearrangements and novel isoforms. One of the major concerns for adopting the proven de novo assembly approach for RNA-seq data in clinical settings has been the analysis turnaround time. To address this concern, we have developed a targeted approach to expedite assembly and analysis of RNA-seq data. RESULTS: Here we present our Targeted Assembly Pipeline (TAP), which consists of four stages: 1) alignment-free gene-level classification of RNA-seq reads using BioBloomTools, 2) de novo assembly of individual targets using Trans-ABySS, 3) alignment of assembled contigs to the reference genome and transcriptome with GMAP and BWA, and 4) structural and splicing variant detection using PAVFinder. We show that PAVFinder is a robust gene fusion detection tool when compared to established methods such as Tophat-Fusion and deFuse on simulated data of 448 events. Using the Leucegene acute myeloid leukemia (AML) RNA-seq data and a set of 580 COSMIC target genes, TAP identified a wide range of hallmark molecular anomalies, including gene fusions, tandem duplications, insertions and deletions, in agreement with published results. In the same dataset, TAP also captured AML-specific splicing variants, such as skipped exons and novel splice sites, that have been reported in other studies. The running time of TAP on 100–150 million read pairs and a 580-gene set is 1 to 2 hours on a 48-core machine. CONCLUSIONS: We demonstrated that TAP is a fast and robust RNA-seq variant detection pipeline that is potentially amenable to clinical applications. TAP is available at http://www.bcgsc.ca/platform/bioinfo/software/pavfinder ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12920-018-0402-6) contains supplementary material, which is available to authorized users. BioMed Central 2018-09-10 /pmc/articles/PMC6131862/ /pubmed/30200994 http://dx.doi.org/10.1186/s12920-018-0402-6 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title TAP: a targeted clinical genomics pipeline for detecting transcript variants using RNA-seq data
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6131862/
https://www.ncbi.nlm.nih.gov/pubmed/30200994
http://dx.doi.org/10.1186/s12920-018-0402-6