Set-theory based benchmarking of three different variant callers for targeted sequencing

Bibliographic Details
Main Authors: Molina-Mora, Jose Arturo; Solano-Vargas, Mariela
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7791862/
https://www.ncbi.nlm.nih.gov/pubmed/33413082
http://dx.doi.org/10.1186/s12859-020-03926-3
author Molina-Mora, Jose Arturo
Solano-Vargas, Mariela
collection PubMed
description BACKGROUND: Next-generation sequencing (NGS) technologies have improved the study of hereditary diseases. Since the evaluation of bioinformatics pipelines is not straightforward, NGS demands effective strategies to analyze data that are of paramount relevance for decision making in a clinical scenario. Following the benchmarking framework of the Global Alliance for Genomics and Health (GA4GH), we implemented a new, simple, and user-friendly set-theory based method to assess variant callers using a gold standard variant set and high-confidence regions. As a model, we used TruSight Cardio kit sequencing data of the reference genome NA12878. This targeted sequencing kit is used to identify variants in key genes related to Inherited Cardiac Conditions (ICCs), a group of cardiovascular diseases with high rates of morbidity and mortality. RESULTS: We implemented and compared three variant calling pipelines (Isaac, Freebayes, and VarScan). Performance metrics using our set-theory approach showed high-resolution pipelines and revealed: (1) a perfect recall of 1.000 for all three pipelines, (2) very high precision values, i.e. 0.987 for Freebayes, 0.928 for VarScan, and 1.000 for Isaac, when compared with the reference material, and (3) a ROC curve analysis with AUC > 0.94 for all cases. Moreover, significant differences were obtained between the three pipelines. In general, results indicate that the three pipelines were able to recognize the expected variants in the gold standard data set. CONCLUSIONS: Our set-theory approach to calculating metrics was able to identify the expected ICC-related variants with the three selected pipelines, but results were completely dependent on the algorithms. We emphasize the importance of assessing pipelines using gold standard materials to achieve the most reliable results for clinical application.
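The abstract does not spell out how the set-theory metrics are computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each variant can be reduced to a hashable key `(chrom, pos, ref, alt)` and that both the call set and the gold standard set have already been restricted to the high-confidence regions. The function name `benchmark`, the `Metrics` type, and the toy variants are hypothetical and chosen for illustration only.

```python
from typing import NamedTuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = tuple[str, int, str, str]


class Metrics(NamedTuple):
    tp: int
    fp: int
    fn: int
    precision: float
    recall: float


def benchmark(calls: set[Variant], truth: set[Variant]) -> Metrics:
    """Compare a caller's variant set with a gold standard set using set operations.

    Assumes both sets are already limited to the high-confidence regions.
    """
    tp = len(calls & truth)   # intersection: called and present in the gold standard
    fp = len(calls - truth)   # difference: called but absent from the gold standard
    fn = len(truth - calls)   # difference: in the gold standard but not called
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return Metrics(tp, fp, fn, precision, recall)


# Toy data, invented for illustration (not taken from the article).
toy_truth: set[Variant] = {
    ("chr1", 1000, "A", "G"),
    ("chr2", 2000, "C", "T"),
    ("chr3", 3000, "G", "A"),
}
toy_calls: set[Variant] = toy_truth | {("chr4", 4000, "T", "C")}  # one extra call

toy_metrics = benchmark(toy_calls, toy_truth)
print(toy_metrics)
```

In this toy case every gold standard variant is recovered (recall 1.0) while one extra call lowers precision to 0.75, which mirrors the pattern reported in the abstract: perfect recall with precision varying by caller.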
format Online
Article
Text
id pubmed-7791862
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7791862 2021-01-11 Set-theory based benchmarking of three different variant callers for targeted sequencing Molina-Mora, Jose Arturo Solano-Vargas, Mariela BMC Bioinformatics Methodology Article BioMed Central 2021-01-07 /pmc/articles/PMC7791862/ /pubmed/33413082 http://dx.doi.org/10.1186/s12859-020-03926-3 Text en © The Author(s) 2020 Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Set-theory based benchmarking of three different variant callers for targeted sequencing
topic Methodology Article