A graph-based algorithm for estimating clonal haplotypes of tumor sample from sequencing data
Main authors: | Wang, Yixuan; Zhang, Xuanping; Ding, Shuai; Geng, Yu; Liu, Jianye; Zhao, Zhongmeng; Zhang, Rong; Xiao, Xiao; Wang, Jiayin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2019 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6357344/, https://www.ncbi.nlm.nih.gov/pubmed/30704456, http://dx.doi.org/10.1186/s12920-018-0457-4 |
author | Wang, Yixuan; Zhang, Xuanping; Ding, Shuai; Geng, Yu; Liu, Jianye; Zhao, Zhongmeng; Zhang, Rong; Xiao, Xiao; Wang, Jiayin |
---|---|
collection | PubMed |
description | BACKGROUND: Haplotype phasing is an important step in many bioinformatics workflows. In cancer genomics, it has been suggested that reconstructing the clonal haplotypes of a tumor sample could facilitate a comprehensive understanding of its clonal architecture and provide a valuable reference for clinical diagnosis and treatment. However, the sequencing data are an admixture of reads sampled from different clonal haplotypes, which complicates the computational problem by exponentially enlarging the solution space and drives existing algorithms to unacceptable time and space complexity. In addition, the evolutionary process among clonal haplotypes further weakens those algorithms by introducing indistinguishable candidate solutions. RESULTS: To improve the performance of clonal haplotype phasing, we propose MixSubHap, a graph-based computational pipeline for cancer sequencing data. To reduce computational complexity, MixSubHap adopts three bounding strategies that limit the solution space and filter out false-positive candidates. First, it estimates the global clonal structure by clustering the variant allele frequencies of sampled point mutations, which provides a prior on the number of clonal haplotypes when copy-number variations are not considered. Second, it uses a greedy extension algorithm to approximately find the longest linkage of the locally assembled contigs. Finally, it applies a read-depth stripping algorithm to filter out false linkages based on the posterior estimate of tumor purity and the estimated proportion of each subclone in the sample. A series of experiments was conducted to verify the performance of the proposed pipeline. CONCLUSIONS: The results demonstrate that MixSubHap identifies about 90% of the preset clonal haplotypes on average under different simulation configurations. In particular, MixSubHap remains robust as mutation rates decrease: in these cases the longest assembled contig can reach 10 kbp, while the accuracy of assigning a mutation to its haplotype remains above 60% on average. MixSubHap can be considered a practical algorithm for reconstructing clonal haplotypes from cancer sequencing data. The source code is maintained at https://github.com/YixuanWang1120/MixSubHap for academic use only. |
id | pubmed-6357344 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
source | BMC Med Genomics (Research), BioMed Central, published 2019-01-31 |
rights | © The Author(s) 2019. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
topic | Research |
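The first two bounding strategies described in the abstract can be illustrated with a short Python sketch. This is not the MixSubHap implementation (the authors' code is in the GitHub repository listed in the record), and every function name, threshold, and data layout below is a hypothetical assumption. VAF clustering is approximated by a simple gap-based split of sorted allele frequencies. Each cluster's mean VAF is converted to a cell fraction under the assumption of heterozygous SNVs in copy-neutral diploid regions, where VAF ≈ purity × CCF / 2. The greedy extension step is modeled as repeatedly following the best-supported read link from the current contig to an unused contig. Every contig is tried as a seed, and the chain with the largest total length in bp is kept; using total length as the measure of "longest" is also an assumption.

```python
from statistics import mean


def cluster_vafs(vafs, gap=0.05):
    """Gap-based 1-D clustering (a stand-in, not MixSubHap's method):
    sort the VAFs and open a new cluster whenever the jump from the
    previous value exceeds `gap` (hypothetical threshold)."""
    if not vafs:
        return []
    ordered = sorted(vafs)
    clusters = [[ordered[0]]]
    for v in ordered[1:]:
        if v - clusters[-1][-1] > gap:
            clusters.append([v])
        else:
            clusters[-1].append(v)
    return clusters


def cell_fractions(clusters, purity):
    """Map each cluster's mean VAF to an estimated cancer cell fraction,
    assuming heterozygous SNVs in copy-neutral diploid regions
    (VAF ~= purity * CCF / 2); values are capped at 1.0."""
    return [min(1.0, 2 * mean(c) / purity) for c in clusters]


def greedy_chain(seed, links, min_support=2):
    """Extend from `seed` by always following the outgoing link with the
    most supporting reads to a contig not yet in the chain.
    `links` maps (source_contig, target_contig) -> supporting read count."""
    path, seen = [seed], {seed}
    while True:
        tail = path[-1]
        candidates = [
            (support, nxt)
            for (src, nxt), support in links.items()
            if src == tail and nxt not in seen and support >= min_support
        ]
        if not candidates:
            return path
        # Highest support wins; ties go to the lexicographically larger name.
        _, best = max(candidates)
        path.append(best)
        seen.add(best)


def longest_greedy_linkage(contig_lengths, links, min_support=2):
    """Try every contig as a seed and keep the chain with the largest
    total length; greedy, so not guaranteed to be the true optimum."""
    best_path, best_span = [], 0
    for seed in contig_lengths:
        path = greedy_chain(seed, links, min_support)
        span = sum(contig_lengths[c] for c in path)
        if span > best_span:
            best_path, best_span = path, span
    return best_path, best_span


if __name__ == "__main__":
    # Hypothetical VAFs from three mutation groups, tumor purity 0.8.
    clusters = cluster_vafs([0.40, 0.41, 0.39, 0.20, 0.21, 0.19, 0.10, 0.11])
    print(len(clusters))  # 3
    print(cell_fractions(clusters, purity=0.8))

    # Hypothetical contigs (lengths in bp) and read-supported directed links.
    lengths = {"c1": 1200, "c2": 800, "c3": 1500, "c4": 600}
    links = {("c1", "c2"): 5, ("c1", "c4"): 2, ("c2", "c3"): 4, ("c4", "c3"): 1}
    print(longest_greedy_linkage(lengths, links))  # (['c1', 'c2', 'c3'], 3500)
```

Picking the best-supported link at each step avoids enumerating every possible chain of contigs. However, it is not guaranteed to find the true longest linkage, which matches the abstract's description of this step as approximate. The read-depth stripping step is not sketched because the abstract does not give enough detail to model it faithfully.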