DeltaMSI: artificial intelligence-based modeling of microsatellite instability scoring on next-generation sequencing data

BACKGROUND: DNA mismatch repair deficiency (dMMR) testing is crucial for detection of microsatellite unstable (MSI) tumors. MSI is detected by aberrant indel length distributions of microsatellite markers, either by visual inspection of PCR-fragment length profiles or by automated bioinformatic scor...

Bibliographic Details
Main authors: Swaerts, Koen, Dedeurwaerdere, Franceska, De Smet, Dieter, De Jaeger, Peter, Martens, Geert A.
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9976396/
https://www.ncbi.nlm.nih.gov/pubmed/36859168
http://dx.doi.org/10.1186/s12859-023-05186-3
_version_ 1784899062050324480
author Swaerts, Koen
Dedeurwaerdere, Franceska
De Smet, Dieter
De Jaeger, Peter
Martens, Geert A.
author_sort Swaerts, Koen
collection PubMed
description BACKGROUND: DNA mismatch repair deficiency (dMMR) testing is crucial for detection of microsatellite unstable (MSI) tumors. MSI is detected by aberrant indel length distributions of microsatellite markers, either by visual inspection of PCR-fragment length profiles or by automated bioinformatic scoring on next-generation sequencing (NGS) data. The former is time-consuming and low-throughput while the latter typically relies on simplified binary scoring of a single parameter of the indel distribution. The purpose of this study was to use machine learning to process the full complexity of indel distributions and integrate it into a robust script for screening of dMMR on small gene panel-based NGS data of clinical tumor samples without paired normal tissue.
METHODS: Scikit-learn was used to train 7 models on normalized read depth data of 36 microsatellite loci in a cohort of 133 MMR proficient (pMMR) and 46 dMMR tumor samples, taking loss of MLH1/MSH2/PMS2/MSH6 protein expression as reference method. After selection of the optimal model and microsatellite panel the two top-performing models per locus (logistic regression and support vector machine) were integrated into a novel script (DeltaMSI) for combined prediction of MSI status on 28 marker loci at sample level. Diagnostic performance of DeltaMSI was compared to that of mSINGS, a widely used script for MSI detection on unpaired tumor samples. The robustness of DeltaMSI was evaluated on 1072 unselected, consecutive solid tumor samples in a real-world setting sequenced using capture chemistry, and 116 solid tumor samples sequenced by amplicon chemistry. Likelihood ratios were used to select result intervals with clinical validity.
RESULTS: DeltaMSI achieved higher robustness at equal diagnostic power (AUC = 0.950; 95% CI 0.910–0.975) as compared to mSINGS (AUC = 0.876; 95% CI 0.823–0.918). Its sensitivity of 90% at 100% specificity indicated its clinical potential for high-throughput MSI screening in all tumor types.
Clinical Trial Number/IRB B1172020000040, Ethical Committee, AZ Delta General Hospital.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05186-3.
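The abstract reports a sensitivity at 100% specificity and mentions likelihood ratios for selecting result intervals. The following is a minimal Python sketch of how such figures can be computed from per-sample scores. It is not the DeltaMSI code: the function names, the score lists, and the interval bounds are all hypothetical and chosen only for illustration.

```python
from typing import List


def sensitivity_at_full_specificity(scores_pos: List[float],
                                    scores_neg: List[float]) -> float:
    """Fraction of positive (dMMR) samples scoring strictly above the
    highest negative (pMMR) score, i.e. sensitivity when the cutoff is
    placed so that no negative sample is called positive."""
    threshold = max(scores_neg)
    detected = sum(1 for s in scores_pos if s > threshold)
    return detected / len(scores_pos)


def interval_likelihood_ratio(scores_pos: List[float],
                              scores_neg: List[float],
                              low: float, high: float) -> float:
    """Likelihood ratio of a result interval [low, high):
    P(score in interval | dMMR) / P(score in interval | pMMR)."""
    frac_pos = sum(low <= s < high for s in scores_pos) / len(scores_pos)
    frac_neg = sum(low <= s < high for s in scores_neg) / len(scores_neg)
    if frac_neg == 0:
        # No negatives fall in the interval: the ratio is unbounded
        # (or undefined when no positives fall in it either).
        return float("inf") if frac_pos > 0 else float("nan")
    return frac_pos / frac_neg


# Hypothetical scores, NOT data from the study.
dmmr_scores = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.20]
pmmr_scores = [0.05, 0.10, 0.10, 0.15, 0.20, 0.25, 0.30, 0.30, 0.35, 0.40]

sens = sensitivity_at_full_specificity(dmmr_scores, pmmr_scores)
lr_grey = interval_likelihood_ratio(dmmr_scores, pmmr_scores, 0.15, 0.45)
lr_high = interval_likelihood_ratio(dmmr_scores, pmmr_scores, 0.45, 1.01)

print(f"sensitivity at 100% specificity: {sens:.2f}")
print(f"LR for grey-zone interval [0.15, 0.45): {lr_grey:.3f}")
print(f"LR for high interval [0.45, 1.01): {lr_high}")
```

In this toy setup, an interval whose likelihood ratio is close to 1 carries little diagnostic information, while intervals with very high or very low ratios are the ones that would typically be reported as positive or negative.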
format Online
Article
Text
id pubmed-9976396
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9976396 2023-03-02 DeltaMSI: artificial intelligence-based modeling of microsatellite instability scoring on next-generation sequencing data Swaerts, Koen Dedeurwaerdere, Franceska De Smet, Dieter De Jaeger, Peter Martens, Geert A. BMC Bioinformatics Research
BioMed Central 2023-03-01 /pmc/articles/PMC9976396/ /pubmed/36859168 http://dx.doi.org/10.1186/s12859-023-05186-3 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title DeltaMSI: artificial intelligence-based modeling of microsatellite instability scoring on next-generation sequencing data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9976396/
https://www.ncbi.nlm.nih.gov/pubmed/36859168
http://dx.doi.org/10.1186/s12859-023-05186-3