Performance comparisons between clustering models for reconstructing NGS results from technical replicates
To improve the performance of individual DNA sequencing results, researchers often use replicates from the same individual and various statistical clustering models to reconstruct a high-performance callset. Here, three technical replicates of genome NA12878 were considered and five model types were...
Main authors: | Zhai, Yue; Bardel, Claire; Vallée, Maxime; Iwaz, Jean; Roy, Pascal |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2023 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10060969/ https://www.ncbi.nlm.nih.gov/pubmed/37007945 http://dx.doi.org/10.3389/fgene.2023.1148147 |
_version_ | 1785017198142554112 |
---|---|
author | Zhai, Yue Bardel, Claire Vallée, Maxime Iwaz, Jean Roy, Pascal |
collection | PubMed |
description | To improve the performance of individual DNA sequencing results, researchers often use replicates from the same individual and various statistical clustering models to reconstruct a high-performance callset. Here, three technical replicates of genome NA12878 were considered and five model types were compared (consensus, latent class, Gaussian mixture, Kamila–adapted k-means, and random forest) regarding four performance indicators: sensitivity, precision, accuracy, and F1-score. In comparison with no use of a combination model, i) the consensus model improved precision by 0.1%; ii) the latent class model brought 1% precision improvement (97%–98%) without compromising sensitivity (= 98.9%); iii) the Gaussian mixture model and random forest provided callsets with higher precisions (both >99%) but lower sensitivities; iv) Kamila increased precision (>99%) and kept a high sensitivity (98.8%); it showed the best overall performance. According to precision and F1-score indicators, the compared non-supervised clustering models that combine multiple callsets are able to improve sequencing performance vs. previously used supervised models. Among the models compared, the Gaussian mixture model and Kamila offered non-negligible precision and F1-score improvements. These models may be thus recommended for callset reconstruction (from either biological or technical replicates) for diagnostic or precision medicine purposes. |
format | Online Article Text |
id | pubmed-10060969 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-10060969 2023-03-31 Performance comparisons between clustering models for reconstructing NGS results from technical replicates Zhai, Yue Bardel, Claire Vallée, Maxime Iwaz, Jean Roy, Pascal Front Genet Genetics To improve the performance of individual DNA sequencing results, researchers often use replicates from the same individual and various statistical clustering models to reconstruct a high-performance callset. Here, three technical replicates of genome NA12878 were considered and five model types were compared (consensus, latent class, Gaussian mixture, Kamila–adapted k-means, and random forest) regarding four performance indicators: sensitivity, precision, accuracy, and F1-score. In comparison with no use of a combination model, i) the consensus model improved precision by 0.1%; ii) the latent class model brought 1% precision improvement (97%–98%) without compromising sensitivity (= 98.9%); iii) the Gaussian mixture model and random forest provided callsets with higher precisions (both >99%) but lower sensitivities; iv) Kamila increased precision (>99%) and kept a high sensitivity (98.8%); it showed the best overall performance. According to precision and F1-score indicators, the compared non-supervised clustering models that combine multiple callsets are able to improve sequencing performance vs. previously used supervised models. Among the models compared, the Gaussian mixture model and Kamila offered non-negligible precision and F1-score improvements. These models may be thus recommended for callset reconstruction (from either biological or technical replicates) for diagnostic or precision medicine purposes. Frontiers Media S.A. 2023-03-16 /pmc/articles/PMC10060969/ /pubmed/37007945 http://dx.doi.org/10.3389/fgene.2023.1148147 Text en Copyright © 2023 Zhai, Bardel, Vallée, Iwaz and Roy. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). 
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Performance comparisons between clustering models for reconstructing NGS results from technical replicates |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10060969/ https://www.ncbi.nlm.nih.gov/pubmed/37007945 http://dx.doi.org/10.3389/fgene.2023.1148147 |
work_keys_str_mv | AT zhaiyue performancecomparisonsbetweenclusteringmodelsforreconstructingngsresultsfromtechnicalreplicates AT bardelclaire performancecomparisonsbetweenclusteringmodelsforreconstructingngsresultsfromtechnicalreplicates AT valleemaxime performancecomparisonsbetweenclusteringmodelsforreconstructingngsresultsfromtechnicalreplicates AT iwazjean performancecomparisonsbetweenclusteringmodelsforreconstructingngsresultsfromtechnicalreplicates AT roypascal performancecomparisonsbetweenclusteringmodelsforreconstructingngsresultsfromtechnicalreplicates |
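
The description above names a consensus model over three technical replicates and four performance indicators (sensitivity, precision, accuracy, F1-score). The following is a minimal Python sketch of those ideas, not the authors' pipeline. It assumes that "consensus" means a simple majority vote (a variant is kept when at least two of three replicates call it), which is only one possible reading. The variant labels (`v1` … `v10`), the truth set, and the function names `consensus_callset` and `performance` are hypothetical and chosen for illustration only.

```python
from collections import Counter


def consensus_callset(replicates, min_support=2):
    """Keep variants called in at least `min_support` replicates (majority vote).

    Assumption: this is an illustrative reading of a 'consensus model',
    not the method used in the paper.
    """
    counts = Counter(v for rep in replicates for v in set(rep))
    return {v for v, n in counts.items() if n >= min_support}


def _ratio(num, den):
    # Avoid ZeroDivisionError on empty classes; report 0.0 instead.
    return num / den if den else 0.0


def performance(calls, truth, universe):
    """Sensitivity, precision, accuracy and F1 from set-based confusion counts."""
    tp = len(calls & truth)             # called and true
    fp = len(calls - truth)             # called but not true
    fn = len(truth - calls)             # true but missed
    tn = len(universe - calls - truth)  # neither called nor true
    sensitivity = _ratio(tp, tp + fn)
    precision = _ratio(tp, tp + fp)
    accuracy = _ratio(tp + tn, tp + tn + fp + fn)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Toy data (hypothetical): ten candidate sites, three replicate callsets.
universe = {f"v{i}" for i in range(1, 11)}
truth = {"v1", "v2", "v3", "v7"}
replicates = [
    {"v1", "v2", "v3", "v5"},
    {"v1", "v2", "v4"},
    {"v1", "v3", "v4", "v6"},
]

calls = consensus_callset(replicates)
metrics = performance(calls, truth, universe)
print(sorted(calls))
print(metrics)
```

In this toy case the majority vote drops the singleton calls `v5` and `v6`, which is how a consensus can raise precision, but it cannot recover `v7`, which no replicate called, so sensitivity is bounded by what the replicates contain.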