ChiTaH: a fast and accurate tool for identifying known human chimeric sequences from high-throughput sequencing data
Fusion genes or chimeras typically comprise sequences from two different genes. The chimeric RNAs of such joined sequences often serve as cancer drivers. Identifying such driver fusions in a given cancer or complex disease is important for diagnosis and treatment. The advent of next-generation seque...
Main authors: | Detroja, Rajesh; Gorohovski, Alessandro; Giwa, Olawumi; Baum, Gideon; Frenkel-Morgenstern, Milana |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8633610/ https://www.ncbi.nlm.nih.gov/pubmed/34859212 http://dx.doi.org/10.1093/nargab/lqab112 |
_version_ | 1784607965734502400 |
---|---|
author | Detroja, Rajesh Gorohovski, Alessandro Giwa, Olawumi Baum, Gideon Frenkel-Morgenstern, Milana |
author_facet | Detroja, Rajesh Gorohovski, Alessandro Giwa, Olawumi Baum, Gideon Frenkel-Morgenstern, Milana |
author_sort | Detroja, Rajesh |
collection | PubMed |
description | Fusion genes or chimeras typically comprise sequences from two different genes. The chimeric RNAs of such joined sequences often serve as cancer drivers. Identifying such driver fusions in a given cancer or complex disease is important for diagnosis and treatment. The advent of next-generation sequencing technologies, such as DNA-Seq or RNA-Seq, together with the development of suitable computational tools, has made the global identification of chimeras in tumors possible. However, tests of more than 20 computational methods showed them to be limited in chimera prediction sensitivity, specificity, and accurate quantification of junction reads. These shortcomings motivated us to develop the first ‘reference-based’ approach, termed ChiTaH (Chimeric Transcripts from High-throughput sequencing data). ChiTaH uses 43,466 non-redundant known human chimeras as a reference database to map sequencing reads and to accurately identify chimeric reads. We benchmarked ChiTaH and four other methods for identifying human chimeras, using both simulated and real sequencing datasets. ChiTaH was found to be the most accurate and fastest method for identifying known human chimeras from simulated and real sequencing datasets. Moreover, ChiTaH uncovered heterogeneity of the BCR-ABL1 chimera in both bulk and single cells of the K-562 cell line, which was confirmed experimentally. |
format | Online Article Text |
id | pubmed-8633610 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8633610 2021-12-01 ChiTaH: a fast and accurate tool for identifying known human chimeric sequences from high-throughput sequencing data Detroja, Rajesh Gorohovski, Alessandro Giwa, Olawumi Baum, Gideon Frenkel-Morgenstern, Milana NAR Genom Bioinform Methods Article Fusion genes or chimeras typically comprise sequences from two different genes. The chimeric RNAs of such joined sequences often serve as cancer drivers. Identifying such driver fusions in a given cancer or complex disease is important for diagnosis and treatment. The advent of next-generation sequencing technologies, such as DNA-Seq or RNA-Seq, together with the development of suitable computational tools, has made the global identification of chimeras in tumors possible. However, tests of more than 20 computational methods showed them to be limited in chimera prediction sensitivity, specificity, and accurate quantification of junction reads. These shortcomings motivated us to develop the first ‘reference-based’ approach, termed ChiTaH (Chimeric Transcripts from High-throughput sequencing data). ChiTaH uses 43,466 non-redundant known human chimeras as a reference database to map sequencing reads and to accurately identify chimeric reads. We benchmarked ChiTaH and four other methods for identifying human chimeras, using both simulated and real sequencing datasets. ChiTaH was found to be the most accurate and fastest method for identifying known human chimeras from simulated and real sequencing datasets. Moreover, ChiTaH uncovered heterogeneity of the BCR-ABL1 chimera in both bulk and single cells of the K-562 cell line, which was confirmed experimentally. Oxford University Press 2021-11-26 /pmc/articles/PMC8633610/ /pubmed/34859212 http://dx.doi.org/10.1093/nargab/lqab112 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Methods Article Detroja, Rajesh Gorohovski, Alessandro Giwa, Olawumi Baum, Gideon Frenkel-Morgenstern, Milana ChiTaH: a fast and accurate tool for identifying known human chimeric sequences from high-throughput sequencing data |
title | ChiTaH: a fast and accurate tool for identifying known human chimeric sequences from high-throughput sequencing data |
title_sort | chitah: a fast and accurate tool for identifying known human chimeric sequences from high-throughput sequencing data |
topic | Methods Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8633610/ https://www.ncbi.nlm.nih.gov/pubmed/34859212 http://dx.doi.org/10.1093/nargab/lqab112 |
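
The description above outlines a reference-based idea: map reads to a database of known chimera sequences, then keep the reads that support a chimera. The following is a minimal sketch of one way such junction-read counting could look, not ChiTaH's actual implementation, which this record does not describe. The `Alignment` type, the `count_junction_reads` function, the `min_anchor` threshold, and the junction coordinates are all hypothetical names and values chosen for illustration; alignments are assumed to have been produced already by some read mapper.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class Alignment(NamedTuple):
    """One read aligned to one reference chimera (hypothetical record type)."""
    read_id: str
    chimera_id: str
    start: int  # 0-based, inclusive position on the chimera sequence
    end: int    # 0-based, exclusive position on the chimera sequence


def count_junction_reads(
    alignments: Iterable[Alignment],
    junctions: Dict[str, int],
    min_anchor: int = 10,
) -> Dict[str, int]:
    """Count reads that span each chimera's junction.

    `junctions` maps a chimera ID to the index of the first base that
    comes from the second gene. A read counts only if it covers at least
    `min_anchor` bases on both sides of that index (an assumed rule).
    """
    counts: Counter = Counter()
    for aln in alignments:
        junction = junctions.get(aln.chimera_id)
        if junction is None:
            continue  # chimera not in the reference table
        left_anchor = junction - aln.start   # bases from the first gene
        right_anchor = aln.end - junction    # bases from the second gene
        if left_anchor >= min_anchor and right_anchor >= min_anchor:
            counts[aln.chimera_id] += 1
    return dict(counts)


# Toy data: the junction position 105 is invented for illustration.
example_junctions = {"BCR-ABL1": 105}
example_alignments = [
    Alignment("r1", "BCR-ABL1", 90, 140),   # spans the junction
    Alignment("r2", "BCR-ABL1", 100, 150),  # only 5 bases on the left side
    Alignment("r3", "BCR-ABL1", 0, 50),     # entirely within the first gene
    Alignment("r4", "UNKNOWN", 0, 50),      # no junction entry
]
example_counts = count_junction_reads(example_alignments, example_junctions)
```

Requiring an anchor on both sides of the junction is a common way to avoid counting reads that match only one parent gene; the threshold of 10 bases here is arbitrary.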