
Performance comparisons between clustering models for reconstructing NGS results from technical replicates

To improve the performance of individual DNA sequencing results, researchers often use replicates from the same individual and various statistical clustering models to reconstruct a high-performance callset. Here, three technical replicates of genome NA12878 were considered and five model types were...


Bibliographic Details
Main Authors: Zhai, Yue; Bardel, Claire; Vallée, Maxime; Iwaz, Jean; Roy, Pascal
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10060969/
https://www.ncbi.nlm.nih.gov/pubmed/37007945
http://dx.doi.org/10.3389/fgene.2023.1148147
author Zhai, Yue
Bardel, Claire
Vallée, Maxime
Iwaz, Jean
Roy, Pascal
collection PubMed
description To improve the performance of individual DNA sequencing results, researchers often use replicates from the same individual and various statistical clustering models to reconstruct a high-performance callset. Here, three technical replicates of genome NA12878 were considered and five model types were compared (consensus, latent class, Gaussian mixture, Kamila–adapted k-means, and random forest) on four performance indicators: sensitivity, precision, accuracy, and F1-score. Compared with using no combination model, i) the consensus model improved precision by 0.1%; ii) the latent class model improved precision by 1 percentage point (from 97% to 98%) without compromising sensitivity (98.9%); iii) the Gaussian mixture model and random forest provided callsets with higher precisions (both >99%) but lower sensitivities; iv) Kamila increased precision (>99%) and kept a high sensitivity (98.8%); it showed the best overall performance. According to the precision and F1-score indicators, the compared non-supervised clustering models that combine multiple callsets are able to improve sequencing performance vs. previously used supervised models. Among the models compared, the Gaussian mixture model and Kamila offered non-negligible precision and F1-score improvements. These models may thus be recommended for callset reconstruction (from either biological or technical replicates) for diagnostic or precision medicine purposes.
format Online
Article
Text
id pubmed-10060969
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10060969 2023-03-31 Performance comparisons between clustering models for reconstructing NGS results from technical replicates Zhai, Yue; Bardel, Claire; Vallée, Maxime; Iwaz, Jean; Roy, Pascal Front Genet Genetics Frontiers Media S.A. 2023-03-16 /pmc/articles/PMC10060969/ /pubmed/37007945 http://dx.doi.org/10.3389/fgene.2023.1148147 Text en Copyright © 2023 Zhai, Bardel, Vallée, Iwaz and Roy. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY).
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Performance comparisons between clustering models for reconstructing NGS results from technical replicates
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10060969/
https://www.ncbi.nlm.nih.gov/pubmed/37007945
http://dx.doi.org/10.3389/fgene.2023.1148147
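
The description above names a consensus model and four performance indicators (sensitivity, precision, accuracy, and F1-score) but the record does not say how they were computed. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the consensus rule is assumed to be a simple majority vote across replicates, accuracy assumes a known number of evaluated positions so that true negatives can be counted, and every name (`consensus_callset`, `performance`, `min_support`, `n_positions`) and every variant in the toy data is hypothetical.

```python
from collections import Counter


def consensus_callset(replicates, min_support=2):
    """Keep variants reported by at least `min_support` replicates.

    Assumption: a majority vote stands in for the consensus model;
    the record does not specify the rule actually used.
    """
    counts = Counter(v for calls in replicates for v in set(calls))
    return {v for v, n in counts.items() if n >= min_support}


def performance(calls, truth, n_positions):
    """Return the four indicators for a callset against a truth set.

    `n_positions` is the (assumed) number of evaluated positions,
    needed only to derive true negatives for accuracy.
    """
    tp = len(calls & truth)          # called and true
    fp = len(calls - truth)          # called but not true
    fn = len(truth - calls)          # true but missed
    tn = n_positions - tp - fp - fn  # neither called nor true
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / n_positions if n_positions else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Toy data (hypothetical): variants as (chrom, pos, ref, alt) tuples.
A = ("chr1", 100, "A", "G")
B = ("chr1", 200, "C", "T")
C = ("chr2", 300, "G", "A")
D = ("chr2", 400, "T", "C")
X = ("chr3", 500, "A", "T")  # false positive seen in two replicates
Y = ("chr3", 600, "G", "C")  # false positive seen in one replicate

truth = {A, B, C, D}
replicates = [{A, B, C, X}, {A, B, D, Y}, {A, C, D, X}]

combined = consensus_callset(replicates)
for name, calls in [("replicate 1", replicates[0]), ("consensus", combined)]:
    print(name, performance(calls, truth, n_positions=10))
```

In this toy case the vote drops the singleton false positive `Y` but keeps `X`, which two replicates share. This illustrates why a plain consensus may yield only a small precision gain compared with model-based combinations.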