
ChiTaH: a fast and accurate tool for identifying known human chimeric sequences from high-throughput sequencing data

Fusion genes or chimeras typically comprise sequences from two different genes. The chimeric RNAs of such joined sequences often serve as cancer drivers. Identifying such driver fusions in a given cancer or complex disease is important for diagnosis and treatment. The advent of next-generation seque...


Bibliographic Details
Main Authors: Detroja, Rajesh; Gorohovski, Alessandro; Giwa, Olawumi; Baum, Gideon; Frenkel-Morgenstern, Milana
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8633610/
https://www.ncbi.nlm.nih.gov/pubmed/34859212
http://dx.doi.org/10.1093/nargab/lqab112
author Detroja, Rajesh
Gorohovski, Alessandro
Giwa, Olawumi
Baum, Gideon
Frenkel-Morgenstern, Milana
collection PubMed
description Fusion genes or chimeras typically comprise sequences from two different genes. The chimeric RNAs of such joined sequences often serve as cancer drivers. Identifying such driver fusions in a given cancer or complex disease is important for diagnosis and treatment. The advent of next-generation sequencing technologies, such as DNA-Seq or RNA-Seq, together with the development of suitable computational tools, has made the global identification of chimeras in tumors possible. However, testing of more than 20 computational methods showed them to be limited in terms of chimera prediction sensitivity, specificity, and accurate quantification of junction reads. These shortcomings motivated us to develop the first ‘reference-based’ approach, termed ChiTaH (Chimeric Transcripts from High-throughput sequencing data). ChiTaH uses 43,466 non-redundant known human chimeras as a reference database to map sequencing reads and to accurately identify chimeric reads. We benchmarked ChiTaH and four other methods to identify human chimeras, leveraging both simulated and real sequencing datasets. ChiTaH was found to be the most accurate and fastest method for identifying known human chimeras from simulated and real sequencing datasets. Moreover, ChiTaH uncovered heterogeneity of the BCR-ABL1 chimera in both bulk and single cells of the K-562 cell line, which was confirmed experimentally.
format Online
Article
Text
id pubmed-8633610
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8633610 2021-12-01 NAR Genom Bioinform Methods Article Oxford University Press 2021-11-26 /pmc/articles/PMC8633610/ /pubmed/34859212 http://dx.doi.org/10.1093/nargab/lqab112 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title ChiTaH: a fast and accurate tool for identifying known human chimeric sequences from high-throughput sequencing data
topic Methods Article