Detecting transcriptomic structural variants in heterogeneous contexts via the Multiple Compatible Arrangements Problem

Bibliographic Details
Main authors: Qiu, Yutong; Ma, Cong; Xie, Han; Kingsford, Carl
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7227063/
https://www.ncbi.nlm.nih.gov/pubmed/32467720
http://dx.doi.org/10.1186/s13015-020-00170-5
Description
Summary: BACKGROUND: Transcriptomic structural variants (TSVs), large-scale transcriptome sequence changes due to structural variation, are common in cancer. TSV detection from high-throughput sequencing data is a computationally challenging problem. Among all the confounding factors, sample heterogeneity, where each sample contains multiple distinct alleles, poses a critical obstacle to accurate TSV prediction. RESULTS: To improve TSV detection in heterogeneous RNA-seq samples, we introduce the Multiple Compatible Arrangements Problem (MCAP), which seeks k genome arrangements that maximize the number of reads that are concordant with at least one arrangement. This models a heterogeneous or diploid sample. We prove that MCAP is NP-complete and provide a [Formula: see text]-approximation algorithm for [Formula: see text] and a [Formula: see text]-approximation algorithm for the diploid case ([Formula: see text]), assuming an oracle for [Formula: see text]. Combining these, we obtain a [Formula: see text]-approximation algorithm for MCAP when [Formula: see text] (without an oracle). We also present an integer linear programming formulation for general k. We characterize the conflict structures in the graph that require [Formula: see text] alleles to satisfy read concordancy and show that such structures are prevalent. CONCLUSIONS: We show that the solution to MCAP accurately addresses sample heterogeneity during TSV detection. Our algorithms have improved performance on TCGA cancer samples and cancer cell line samples compared to a TSV calling tool, SQUID. The software is available at https://github.com/Kingsford-Group/diploidsquid.
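
Illustrative note: the MCAP objective described in the summary can be made concrete with a small brute-force sketch. This is not the authors' algorithm or the diploidsquid code. It assumes a simplified model in which reads are summarized as weighted edges between the head ("h") or tail ("t") ends of genome segments, and an edge counts as concordant with an arrangement when it leaves the right end of the earlier segment and enters the left end of the later one. The function names, the edge encoding, and the example data are all hypothetical, and the exhaustive search is only feasible for a handful of segments.

```python
from itertools import combinations_with_replacement, permutations, product

# Hypothetical edge encoding: (segment_u, end_of_u, segment_v, end_of_v, read_weight),
# with u != v. Ends are "h" (head) or "t" (tail) of the segment in reference orientation.
EXAMPLE_EDGES = [
    (0, "t", 1, "h", 5),  # supports the reference order: segment 0 then segment 1
    (1, "t", 0, "h", 3),  # supports segment 1 followed by segment 0
]


def is_concordant(edge, arrangement):
    """Return True if the edge is concordant with one arrangement.

    An arrangement is a tuple of (segment, orientation) pairs, orientation "+" or "-".
    Assumed rule: the edge must attach to the right end of the earlier segment
    and to the left end of the later segment.
    """
    u, u_end, v, v_end, _ = edge
    pos = {seg: (i, orient) for i, (seg, orient) in enumerate(arrangement)}
    (iu, ou), (iv, ov) = pos[u], pos[v]
    if iu > iv:  # make u the segment that appears first
        u_end, v_end, ou, ov = v_end, u_end, ov, ou
    right_end_of_first = "t" if ou == "+" else "h"
    left_end_of_second = "h" if ov == "+" else "t"
    return u_end == right_end_of_first and v_end == left_end_of_second


def all_arrangements(n):
    """Yield every ordering and orientation of n segments (n! * 2^n of them)."""
    for order in permutations(range(n)):
        for orients in product("+-", repeat=n):
            yield tuple(zip(order, orients))


def covered_weight(edges, arrangements):
    """Total weight of edges concordant with at least one of the arrangements."""
    return sum(
        e[4] for e in edges if any(is_concordant(e, a) for a in arrangements)
    )


def brute_force_mcap(n, edges, k):
    """Exhaustively pick k arrangements that maximize the covered edge weight."""
    candidates = list(all_arrangements(n))
    return max(
        combinations_with_replacement(candidates, k),
        key=lambda chosen: covered_weight(edges, chosen),
    )


if __name__ == "__main__":
    for k in (1, 2):
        best = brute_force_mcap(2, EXAMPLE_EDGES, k)
        print(k, covered_weight(EXAMPLE_EDGES, best), best)
```

In this toy instance the two edges cannot both be concordant with a single arrangement, so one arrangement covers a weight of 5 while two arrangements cover all 8. This is the kind of conflict that motivates allowing more than one arrangement.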