LongGF: computational algorithm and software tool for fast and accurate detection of gene fusions by long-read transcriptome sequencing

BACKGROUND: Long-read RNA-seq techniques can generate reads that encompass a large proportion of, or the entire, mRNA/cDNA molecule, so they are expected to address inherent limitations of short-read RNA-seq techniques, which typically generate reads of < 150 bp. However, there is a general lack of softw...

Bibliographic Details
Main authors: Liu, Qian; Hu, Yu; Stucky, Andres; Fang, Li; Zhong, Jiang F.; Wang, Kai
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7771079/
https://www.ncbi.nlm.nih.gov/pubmed/33372596
http://dx.doi.org/10.1186/s12864-020-07207-4
author Liu, Qian
Hu, Yu
Stucky, Andres
Fang, Li
Zhong, Jiang F.
Wang, Kai
collection PubMed
description BACKGROUND: Long-read RNA-seq techniques can generate reads that encompass a large proportion of, or the entire, mRNA/cDNA molecule, so they are expected to address inherent limitations of short-read RNA-seq techniques, which typically generate reads of < 150 bp. However, there is a general lack of software tools for gene fusion detection from long-read RNA-seq data that take into account the high basecalling error rates and the presence of alignment errors. RESULTS: In this study, we developed a fast computational tool, LongGF, to efficiently detect candidate gene fusions from long-read RNA-seq data, including cDNA sequencing data and direct mRNA sequencing data. We evaluated LongGF on tens of simulated long-read RNA-seq datasets and demonstrated its superior performance in gene fusion detection. We also tested LongGF on a Nanopore direct mRNA sequencing dataset and a PacBio sequencing dataset generated on a mixture of 10 cancer cell lines, and found that LongGF detected known gene fusions better than existing computational tools. Furthermore, we tested LongGF on a Nanopore cDNA sequencing dataset on acute myeloid leukemia and pinpointed, at base resolution, the exact location of a translocation previously known at cytogenetic resolution; this location was further validated by Sanger sequencing. CONCLUSIONS: In summary, LongGF will greatly facilitate the discovery of candidate gene fusion events from long-read RNA-seq data, especially in cancer samples. LongGF is implemented in C++ and is available at https://github.com/WGLab/LongGF.
format Online
Article
Text
id pubmed-7771079
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7771079 2020-12-30 LongGF: computational algorithm and software tool for fast and accurate detection of gene fusions by long-read transcriptome sequencing Liu, Qian; Hu, Yu; Stucky, Andres; Fang, Li; Zhong, Jiang F.; Wang, Kai BMC Genomics Research BACKGROUND: Long-read RNA-seq techniques can generate reads that encompass a large proportion of, or the entire, mRNA/cDNA molecule, so they are expected to address inherent limitations of short-read RNA-seq techniques, which typically generate reads of < 150 bp. However, there is a general lack of software tools for gene fusion detection from long-read RNA-seq data that take into account the high basecalling error rates and the presence of alignment errors. RESULTS: In this study, we developed a fast computational tool, LongGF, to efficiently detect candidate gene fusions from long-read RNA-seq data, including cDNA sequencing data and direct mRNA sequencing data. We evaluated LongGF on tens of simulated long-read RNA-seq datasets and demonstrated its superior performance in gene fusion detection. We also tested LongGF on a Nanopore direct mRNA sequencing dataset and a PacBio sequencing dataset generated on a mixture of 10 cancer cell lines, and found that LongGF detected known gene fusions better than existing computational tools. Furthermore, we tested LongGF on a Nanopore cDNA sequencing dataset on acute myeloid leukemia and pinpointed, at base resolution, the exact location of a translocation previously known at cytogenetic resolution; this location was further validated by Sanger sequencing. CONCLUSIONS: In summary, LongGF will greatly facilitate the discovery of candidate gene fusion events from long-read RNA-seq data, especially in cancer samples. LongGF is implemented in C++ and is available at https://github.com/WGLab/LongGF.
BioMed Central 2020-12-29 /pmc/articles/PMC7771079/ /pubmed/33372596 http://dx.doi.org/10.1186/s12864-020-07207-4 Text en © The Author(s) 2020. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title LongGF: computational algorithm and software tool for fast and accurate detection of gene fusions by long-read transcriptome sequencing
topic Research