LongGF: computational algorithm and software tool for fast and accurate detection of gene fusions by long-read transcriptome sequencing
BACKGROUND: Long-read RNA-Seq techniques can generate reads that encompass a large proportion or the entire mRNA/cDNA molecules, so they are expected to address inherited limitations of short-read RNA-Seq techniques that typically generate < 150 bp reads. However, there is a general lack of softw...
Main authors: | Liu, Qian; Hu, Yu; Stucky, Andres; Fang, Li; Zhong, Jiang F.; Wang, Kai |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2020 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7771079/ https://www.ncbi.nlm.nih.gov/pubmed/33372596 http://dx.doi.org/10.1186/s12864-020-07207-4 |
_version_ | 1783629642624663552 |
---|---|
author | Liu, Qian; Hu, Yu; Stucky, Andres; Fang, Li; Zhong, Jiang F.; Wang, Kai |
author_sort | Liu, Qian |
collection | PubMed |
description | BACKGROUND: Long-read RNA-Seq techniques can generate reads that encompass a large proportion or the entire mRNA/cDNA molecules, so they are expected to address inherited limitations of short-read RNA-Seq techniques that typically generate < 150 bp reads. However, there is a general lack of software tools for gene fusion detection from long-read RNA-seq data, which takes into account the high basecalling error rates and the presence of alignment errors. RESULTS: In this study, we developed a fast computational tool, LongGF, to efficiently detect candidate gene fusions from long-read RNA-seq data, including cDNA sequencing data and direct mRNA sequencing data. We evaluated LongGF on tens of simulated long-read RNA-seq datasets, and demonstrated its superior performance in gene fusion detection. We also tested LongGF on a Nanopore direct mRNA sequencing dataset and a PacBio sequencing dataset generated on a mixture of 10 cancer cell lines, and found that LongGF achieved better performance to detect known gene fusions over existing computational tools. Furthermore, we tested LongGF on a Nanopore cDNA sequencing dataset on acute myeloid leukemia, and pinpointed the exact location of a translocation (previously known in cytogenetic resolution) in base resolution, which was further validated by Sanger sequencing. CONCLUSIONS: In summary, LongGF will greatly facilitate the discovery of candidate gene fusion events from long-read RNA-Seq data, especially in cancer samples. LongGF is implemented in C++ and is available at https://github.com/WGLab/LongGF. |
format | Online Article Text |
id | pubmed-7771079 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-77710792020-12-30 LongGF: computational algorithm and software tool for fast and accurate detection of gene fusions by long-read transcriptome sequencing Liu, Qian Hu, Yu Stucky, Andres Fang, Li Zhong, Jiang F. Wang, Kai BMC Genomics Research BACKGROUND: Long-read RNA-Seq techniques can generate reads that encompass a large proportion or the entire mRNA/cDNA molecules, so they are expected to address inherited limitations of short-read RNA-Seq techniques that typically generate < 150 bp reads. However, there is a general lack of software tools for gene fusion detection from long-read RNA-seq data, which takes into account the high basecalling error rates and the presence of alignment errors. RESULTS: In this study, we developed a fast computational tool, LongGF, to efficiently detect candidate gene fusions from long-read RNA-seq data, including cDNA sequencing data and direct mRNA sequencing data. We evaluated LongGF on tens of simulated long-read RNA-seq datasets, and demonstrated its superior performance in gene fusion detection. We also tested LongGF on a Nanopore direct mRNA sequencing dataset and a PacBio sequencing dataset generated on a mixture of 10 cancer cell lines, and found that LongGF achieved better performance to detect known gene fusions over existing computational tools. Furthermore, we tested LongGF on a Nanopore cDNA sequencing dataset on acute myeloid leukemia, and pinpointed the exact location of a translocation (previously known in cytogenetic resolution) in base resolution, which was further validated by Sanger sequencing. CONCLUSIONS: In summary, LongGF will greatly facilitate the discovery of candidate gene fusion events from long-read RNA-Seq data, especially in cancer samples. LongGF is implemented in C++ and is available at https://github.com/WGLab/LongGF. 
BioMed Central 2020-12-29 /pmc/articles/PMC7771079/ /pubmed/33372596 http://dx.doi.org/10.1186/s12864-020-07207-4 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Liu, Qian Hu, Yu Stucky, Andres Fang, Li Zhong, Jiang F. Wang, Kai LongGF: computational algorithm and software tool for fast and accurate detection of gene fusions by long-read transcriptome sequencing |
title | LongGF: computational algorithm and software tool for fast and accurate detection of gene fusions by long-read transcriptome sequencing |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7771079/ https://www.ncbi.nlm.nih.gov/pubmed/33372596 http://dx.doi.org/10.1186/s12864-020-07207-4 |