Detecting transcriptomic structural variants in heterogeneous contexts via the Multiple Compatible Arrangements Problem
Main authors: | , , , |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7227063/ https://www.ncbi.nlm.nih.gov/pubmed/32467720 http://dx.doi.org/10.1186/s13015-020-00170-5 |
Summary: | BACKGROUND: Transcriptomic structural variants (TSVs), large-scale changes in transcriptome sequence caused by structural variation, are common in cancer. Detecting TSVs from high-throughput sequencing data is computationally challenging. Among the confounding factors, sample heterogeneity, where each sample contains multiple distinct alleles, poses a critical obstacle to accurate TSV prediction. RESULTS: To improve TSV detection in heterogeneous RNA-seq samples, we introduce the Multiple Compatible Arrangements Problem (MCAP), which seeks k genome arrangements that maximize the number of reads concordant with at least one arrangement. This models a heterogeneous or diploid sample. We prove that MCAP is NP-complete and provide a [Formula: see text]-approximation algorithm for [Formula: see text] and a [Formula: see text]-approximation algorithm for the diploid case ([Formula: see text]), assuming an oracle for [Formula: see text]. Combining these, we obtain a [Formula: see text]-approximation algorithm for MCAP when [Formula: see text] (without an oracle). We also present an integer linear programming formulation for general k. We characterize the conflict structures in the graph that require [Formula: see text] alleles to satisfy read concordancy and show that such structures are prevalent. CONCLUSIONS: We show that the solution to MCAP accurately addresses sample heterogeneity during TSV detection. Our algorithms perform better on TCGA cancer samples and cancer cell line samples than SQUID, an existing TSV calling tool. The software is available at https://github.com/Kingsford-Group/diploidsquid. |
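The MCAP objective described in the summary (choose k arrangements so that as many reads as possible are concordant with at least one of them) can be illustrated with a small brute-force sketch. This is a toy model under stated assumptions, not the authors' algorithm or the diploidsquid implementation: segments are given a head (`'h'`) and a tail (`'t'`), a read edge joins one end of one segment to one end of another and carries a read count, and an edge is treated as concordant when it joins the right-hand end of the earlier segment to the left-hand end of the later one. All names (`is_concordant`, `brute_force_mcap`, the example segments and weights) are hypothetical.

```python
from itertools import combinations_with_replacement, permutations, product


def is_concordant(edge, arrangement):
    """Check one read edge against one arrangement (toy concordance rule).

    edge: ((u, end_u), (v, end_v), weight) with u != v and ends in {'h', 't'}.
    arrangement: tuple of (segment, orientation) pairs, orientation '+' or '-'.
    """
    (u, end_u), (v, end_v), _weight = edge
    position = {seg: i for i, (seg, _) in enumerate(arrangement)}
    orientation = dict(arrangement)
    # Make u the segment that appears first in the arrangement.
    if position[u] > position[v]:
        (u, end_u), (v, end_v) = (v, end_v), (u, end_u)
    # A forward segment reads head -> tail, a reversed one tail -> head.
    right_end_of_u = 't' if orientation[u] == '+' else 'h'
    left_end_of_v = 'h' if orientation[v] == '+' else 't'
    return end_u == right_end_of_u and end_v == left_end_of_v


def all_arrangements(segments):
    """Enumerate every ordering and orientation of the segments."""
    for order in permutations(segments):
        for orients in product('+-', repeat=len(segments)):
            yield tuple(zip(order, orients))


def covered_weight(edges, arrangements):
    """Total read weight concordant with at least one arrangement."""
    return sum(
        edge[2]
        for edge in edges
        if any(is_concordant(edge, arr) for arr in arrangements)
    )


def brute_force_mcap(segments, edges, k):
    """Exhaustive search; only feasible for a handful of segments."""
    candidates = list(all_arrangements(segments))
    best = max(
        combinations_with_replacement(candidates, k),
        key=lambda arrs: covered_weight(edges, arrs),
    )
    return best, covered_weight(edges, best)


# Hypothetical example: the first two edges cannot both be concordant
# in a single arrangement, so a second allele is needed to cover them.
segments = ['A', 'B', 'C']
edges = [
    (('A', 't'), ('B', 'h'), 5),
    (('A', 't'), ('B', 't'), 3),
    (('B', 't'), ('C', 'h'), 2),
]

_, weight_k1 = brute_force_mcap(segments, edges, k=1)
_, weight_k2 = brute_force_mcap(segments, edges, k=2)
print(weight_k1, weight_k2)
```

In this toy instance a single arrangement covers at most 7 of the 10 read units, while two arrangements cover all 10, which is the kind of conflict structure the summary says requires more than one allele. The exhaustive search grows factorially with the number of segments, which is consistent with the summary's reliance on approximation algorithms and an integer linear program for realistic inputs.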