
SomaticCombiner: improving the performance of somatic variant calling based on evaluation tests and a consensus approach

Bibliographic Details
Main authors: Wang, Mingyi; Luo, Wen; Jones, Kristine; Bian, Xiaopeng; Williams, Russell; Higson, Herbert; Wu, Dongjing; Hicks, Belynda; Yeager, Meredith; Zhu, Bin
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7393490/
https://www.ncbi.nlm.nih.gov/pubmed/32732891
http://dx.doi.org/10.1038/s41598-020-69772-8
Description
Summary: It is challenging to identify somatic variants from high-throughput sequence reads due to tumor heterogeneity, sub-clonality, and sequencing artifacts. In this study, we evaluated the performance of eight primary somatic variant callers and multiple ensemble methods using both real and synthetic whole-genome sequencing, whole-exome sequencing, and deep targeted sequencing datasets with the NA12878 cell line. The test results showed that a simple consensus approach can significantly improve performance even with a limited number of callers and is more robust and stable than machine-learning-based ensemble approaches. To fully exploit multiple callers, we also developed a software package, SomaticCombiner, that combines multiple callers and integrates a new variant allelic frequency (VAF) adaptive majority voting approach, which can maintain sensitive detection for variants with low VAFs.
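
The idea of VAF-adaptive majority voting can be illustrated with a minimal sketch. This is not SomaticCombiner's code or its actual thresholds, which this record does not give. The function name `consensus_calls`, the `low_vaf_cutoff` value of 0.1, and the rule of lowering the required vote count by one for low-VAF variants are all assumptions made for illustration only.

```python
from collections import defaultdict


def consensus_calls(calls_by_caller, vaf_by_variant, low_vaf_cutoff=0.1):
    """Return variants that pass a VAF-adaptive majority vote.

    calls_by_caller: dict mapping caller name -> iterable of variant keys,
        e.g. (chrom, pos, ref, alt) tuples.
    vaf_by_variant: dict mapping variant key -> estimated VAF (0..1).
    low_vaf_cutoff: hypothetical cutoff below which the vote is relaxed.
    """
    n_callers = len(calls_by_caller)
    majority = n_callers // 2 + 1       # strict majority of callers
    relaxed = max(1, majority - 1)      # assumed relaxed threshold for low VAF

    # Count how many callers reported each variant (once per caller).
    votes = defaultdict(int)
    for variants in calls_by_caller.values():
        for variant in set(variants):
            votes[variant] += 1

    kept = set()
    for variant, n_votes in votes.items():
        vaf = vaf_by_variant.get(variant)
        is_low_vaf = vaf is not None and vaf < low_vaf_cutoff
        needed = relaxed if is_low_vaf else majority
        if n_votes >= needed:
            kept.add(variant)
    return kept
```

With four callers, this sketch requires three votes for an ordinary variant but only two for a variant whose VAF is below the cutoff, which is one simple way to keep low-VAF variants from being voted out.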