
A graph-based algorithm for estimating clonal haplotypes of tumor sample from sequencing data

BACKGROUND: Haplotype phasing is an important step in many bioinformatics workflows. In cancer genomics, it is suggested that reconstructing the clonal haplotypes of a tumor sample could facilitate a comprehensive understanding of its clonal architecture and further provide valuable reference in cli...


Bibliographic Details
Main authors: Wang, Yixuan, Zhang, Xuanping, Ding, Shuai, Geng, Yu, Liu, Jianye, Zhao, Zhongmeng, Zhang, Rong, Xiao, Xiao, Wang, Jiayin
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6357344/
https://www.ncbi.nlm.nih.gov/pubmed/30704456
http://dx.doi.org/10.1186/s12920-018-0457-4
author Wang, Yixuan
Zhang, Xuanping
Ding, Shuai
Geng, Yu
Liu, Jianye
Zhao, Zhongmeng
Zhang, Rong
Xiao, Xiao
Wang, Jiayin
author_sort Wang, Yixuan
collection PubMed
description BACKGROUND: Haplotype phasing is an important step in many bioinformatics workflows. In cancer genomics, reconstructing the clonal haplotypes of a tumor sample has been suggested as a way to better understand its clonal architecture and to provide a valuable reference for clinical diagnosis and treatment. However, sequencing data are an admixture of reads sampled from different clonal haplotypes, which exponentially enlarges the solution space and drives existing algorithms to unacceptable time and space complexity. In addition, the evolutionary process among clonal haplotypes further weakens those algorithms by producing indistinguishable candidate solutions. RESULTS: To improve the performance of clonal haplotype phasing, we propose MixSubHap, a graph-based computational pipeline for cancer sequencing data. To reduce computational complexity, MixSubHap adopts three bounding strategies that limit the solution space and filter out false-positive candidates. It first estimates the global clonal structure by clustering the variant allelic frequencies of sampled point mutations. This provides a prior on the number of clonal haplotypes when copy-number variations are not considered. It then uses a greedy extension algorithm to approximately find the longest linkage of the locally assembled contigs. Finally, it applies a read-depth stripping algorithm to filter out false linkages according to the posterior estimate of tumor purity and the estimated percentage of each sub-clone in the sample. A series of experiments was conducted to verify the performance of the proposed pipeline. CONCLUSIONS: The results demonstrate that MixSubHap identifies about 90% of the preset clonal haplotypes on average under different simulation configurations. In particular, MixSubHap is robust to decreasing mutation rates: in those cases the longest assembled contig can reach 10 kbp, while the accuracy of assigning a mutation to its haplotype remains above 60% on average. MixSubHap is thus a practical algorithm for reconstructing clonal haplotypes from cancer sequencing data. The source code has been uploaded and is maintained at https://github.com/YixuanWang1120/MixSubHap for academic use only.
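The first bounding step named in the abstract, clustering variant allelic frequencies (VAFs) to estimate how many clones are present, can be illustrated with a minimal sketch. This is not MixSubHap's implementation: the gap-based grouping, the function name `cluster_vafs`, the `max_gap` threshold, and the VAF values are all assumptions chosen for illustration only.

```python
from statistics import mean


def cluster_vafs(vafs, max_gap=0.05):
    """Group VAFs into clusters with a simple 1-D gap rule.

    Values are sorted, and a new cluster starts whenever two neighbouring
    values differ by more than max_gap (a hypothetical threshold, not a
    parameter taken from the paper).
    """
    if not vafs:
        return []
    ordered = sorted(vafs)
    clusters = [[ordered[0]]]
    for value in ordered[1:]:
        if value - clusters[-1][-1] > max_gap:
            clusters.append([value])      # large gap: open a new cluster
        else:
            clusters[-1].append(value)    # small gap: extend current cluster
    return clusters


# Made-up VAFs for nine point mutations (illustrative data only).
vafs = [0.48, 0.12, 0.51, 0.27, 0.10, 0.25, 0.50, 0.13, 0.29]
clusters = cluster_vafs(vafs)
centers = [mean(c) for c in clusters]

# The number of clusters serves as a rough prior on the number of clones,
# under the abstract's stated condition that copy-number variations are ignored.
print(len(clusters), [round(c, 3) for c in centers])
```

With these made-up values the mutations fall into three well-separated groups, so the sketch would suggest three clonal populations. Real data would need a statistically grounded clustering method rather than a fixed gap threshold.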
format Online
Article
Text
id pubmed-6357344
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6357344 2019-02-07 A graph-based algorithm for estimating clonal haplotypes of tumor sample from sequencing data Wang, Yixuan Zhang, Xuanping Ding, Shuai Geng, Yu Liu, Jianye Zhao, Zhongmeng Zhang, Rong Xiao, Xiao Wang, Jiayin BMC Med Genomics Research BioMed Central 2019-01-31 /pmc/articles/PMC6357344/ /pubmed/30704456 http://dx.doi.org/10.1186/s12920-018-0457-4 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A graph-based algorithm for estimating clonal haplotypes of tumor sample from sequencing data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6357344/
https://www.ncbi.nlm.nih.gov/pubmed/30704456
http://dx.doi.org/10.1186/s12920-018-0457-4