
Katdetectr: an R/bioconductor package utilizing unsupervised changepoint analysis for robust kataegis detection

Bibliographic Details
Main Authors: Hazelaar, Daan M; van Riet, Job; Hoogstrate, Youri; van de Werken, Harmen J G
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects: Technical Note
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10580377/
https://www.ncbi.nlm.nih.gov/pubmed/37848617
http://dx.doi.org/10.1093/gigascience/giad081
author Hazelaar, Daan M
van Riet, Job
Hoogstrate, Youri
van de Werken, Harmen J G
collection PubMed
description BACKGROUND: Kataegis refers to the occurrence of regional genomic hypermutation in cancer and is a phenomenon that has been observed in a wide range of malignancies. A kataegis locus constitutes a genomic region with a high mutation rate (i.e., a higher frequency of closely interspersed somatic variants than the overall mutational background). It has been shown that kataegis is of biological significance and possibly clinically relevant. Therefore, an accurate and robust workflow for kataegis detection is paramount. FINDINGS: Here we present Katdetectr, an open-source R/Bioconductor-based package for the robust yet flexible and fast detection of kataegis loci in genomic data. In addition, Katdetectr houses functionalities to characterize and visualize kataegis and provides results in a standardized format useful for subsequent analysis. In brief, Katdetectr imports industry-standard formats (MAF, VCF, and VRanges), determines the intermutation distance of the genomic variants, and performs unsupervised changepoint analysis utilizing the Pruned Exact Linear Time search algorithm followed by kataegis calling according to user-defined parameters. We used synthetic data and an a priori labeled pan-cancer dataset of whole-genome sequenced malignancies for the performance evaluation of Katdetectr and 5 publicly available kataegis detection packages. Our performance evaluation shows that Katdetectr is robust regarding tumor mutational burden and shows the fastest mean computation time. Additionally, Katdetectr reveals the highest accuracy (0.99, 0.99) and normalized Matthews correlation coefficient (0.98, 0.92) of all evaluated tools for both datasets. CONCLUSIONS: Katdetectr is a robust workflow for the detection, characterization, and visualization of kataegis and is available on Bioconductor: https://doi.org/doi:10.18129/B9.bioc.katdetectr.
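The workflow summarized in the description (intermutation distances, changepoint analysis with the Pruned Exact Linear Time search, then kataegis calling by user-defined parameters) can be sketched roughly as follows. This is a minimal Python illustration, not Katdetectr's R API: the function names are hypothetical, the exponential cost model and the penalty are assumptions the record does not specify, and the calling thresholds (`max_mean_distance`, `min_variants`) are illustrative values only.

```python
import math


def intermutation_distances(positions):
    """Return gaps between consecutive variant positions on one chromosome."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def segment_cost(prefix, start, end):
    """Twice the negative exponential log-likelihood of distances[start:end].

    Assumption: distances in a segment follow one exponential distribution.
    """
    n = end - start
    total = max(prefix[end] - prefix[start], 1e-9)  # guard against log(0)
    return 2.0 * n * (math.log(total / n) + 1.0)


def pelt(distances, penalty):
    """Return the optimal changepoint indices found by a PELT search."""
    n = len(distances)
    prefix = [0.0]
    for d in distances:
        prefix.append(prefix[-1] + d)
    best = [-penalty] + [0.0] * n  # best[t]: minimal penalized cost of distances[:t]
    previous = [0] * (n + 1)       # previous[t]: last changepoint before t
    candidates = [0]
    for t in range(1, n + 1):
        scored = [(best[s] + segment_cost(prefix, s, t) + penalty, s)
                  for s in candidates]
        best[t], previous[t] = min(scored)
        # Pruning step: drop candidates that can never become optimal again.
        candidates = [s for cost, s in scored if cost - penalty <= best[t]]
        candidates.append(t)
    # Walk back through the stored last changepoints.
    changepoints = []
    t = n
    while previous[t] > 0:
        t = previous[t]
        changepoints.append(t)
    return sorted(changepoints)


def call_kataegis(positions, penalty, max_mean_distance=1000, min_variants=6):
    """Label segments with enough variants and a small mean distance as loci.

    Returns (start_position, end_position, variant_count) tuples.
    """
    ordered = sorted(positions)
    distances = intermutation_distances(ordered)
    bounds = [0] + pelt(distances, penalty) + [len(distances)]
    loci = []
    for start, end in zip(bounds, bounds[1:]):
        if end == start:
            continue  # no distances, so no segment to evaluate
        n_variants = end - start + 1
        mean_distance = (ordered[end] - ordered[start]) / (end - start)
        if n_variants >= min_variants and mean_distance <= max_mean_distance:
            loci.append((ordered[start], ordered[end], n_variants))
    return loci
```

Calling `call_kataegis(positions, penalty=2 * math.log(len(positions)))` on the sorted variant positions of a single chromosome returns the candidate loci. The penalty controls how readily the search introduces a changepoint, so in practice it would need tuning against labeled data.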
format Online
Article
Text
id pubmed-10580377
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10580377 2023-10-18 Gigascience Technical Note Oxford University Press 2023-10-17 /pmc/articles/PMC10580377/ /pubmed/37848617 http://dx.doi.org/10.1093/gigascience/giad081 Text en © The Author(s) 2023. Published by Oxford University Press GigaScience. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Katdetectr: an R/bioconductor package utilizing unsupervised changepoint analysis for robust kataegis detection
topic Technical Note