
Performance evaluation of pipelines for mapping, variant calling and interval padding, for the analysis of NGS germline panels


Bibliographic Details
Main Authors: Zanti, Maria; Michailidou, Kyriaki; Loizidou, Maria A.; Machattou, Christina; Pirpa, Panagiota; Christodoulou, Kyproula; Spyrou, George M.; Kyriacou, Kyriacos; Hadjisavvas, Andreas
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8080428/
https://www.ncbi.nlm.nih.gov/pubmed/33910496
http://dx.doi.org/10.1186/s12859-021-04144-1
_version_ 1783685423868936192
author Zanti, Maria
Michailidou, Kyriaki
Loizidou, Maria A.
Machattou, Christina
Pirpa, Panagiota
Christodoulou, Kyproula
Spyrou, George M.
Kyriacou, Kyriacos
Hadjisavvas, Andreas
collection PubMed
description BACKGROUND: Next-generation sequencing (NGS) represents a significant advancement in clinical genetics. However, its use creates several technical, data interpretation and management challenges. It is essential to follow a consistent data analysis pipeline to achieve the highest possible accuracy and avoid false variant calls. Herein, we aimed to compare the performance of twenty-eight combinations of NGS data analysis pipeline compartments, including short-read mapping (BWA-MEM, Bowtie2, Stampy), variant calling (GATK-HaplotypeCaller, GATK-UnifiedGenotyper, SAMtools) and interval padding (null, 50 bp, 100 bp) methods, along with a commercially available pipeline (BWA Enrichment, Illumina®). Fourteen germline DNA samples from breast cancer patients were sequenced using a targeted NGS panel approach and subjected to data analysis. RESULTS: We highlight that interval padding is required for the accurate detection of intronic variants, including spliceogenic pathogenic variants (PVs). In addition, using nearly default parameters, the BWA Enrichment algorithm failed to detect these spliceogenic PVs and a missense PV in the TP53 gene. We also recommend the BWA-MEM algorithm for sequence alignment, whereas variant calling should be performed using a combination of variant calling algorithms: GATK-HaplotypeCaller and SAMtools for the accurate detection of insertions/deletions, and GATK-UnifiedGenotyper for the efficient detection of single nucleotide variants. CONCLUSIONS: These findings have important implications for the identification of clinically actionable variants through panel testing in a clinical laboratory setting, where dedicated bioinformatics personnel might not always be available. The results also reveal the necessity of improving existing tools and/or developing new pipelines to generate more reliable and more consistent data.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04144-1.
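
The interval padding compared in the abstract (null, 50 bp, 100 bp) can be illustrated with a minimal sketch. It assumes BED-style, 0-based half-open target intervals and made-up coordinates; the function name `pad_intervals` and the `targets` list are hypothetical and are not taken from the article. The last lines enumerate the mapper, caller and padding options named in the abstract; reading the reported twenty-eight as these 27 combinations plus the commercial BWA Enrichment pipeline is an assumption.

```python
from itertools import product


def pad_intervals(intervals, padding):
    """Extend each (chrom, start, end) interval by `padding` bp on both sides,
    clamp the start at 0, and merge intervals that overlap or touch."""
    padded = sorted(
        (chrom, max(0, start - padding), end + padding)
        for chrom, start, end in intervals
    )
    merged = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical exon targets; coordinates are illustrative only.
targets = [("chr17", 1000, 1200), ("chr17", 1260, 1400), ("chr13", 500, 650)]

for padding in (0, 50, 100):
    print(padding, pad_intervals(targets, padding))

# Tool and padding options as listed in the abstract.
MAPPERS = ("BWA-MEM", "Bowtie2", "Stampy")
CALLERS = ("GATK-HaplotypeCaller", "GATK-UnifiedGenotyper", "SAMtools")
PADDINGS = (0, 50, 100)

combinations = list(product(MAPPERS, CALLERS, PADDINGS))
print(len(combinations))
```

With padding, flanking intronic bases next to each target fall inside the analysed region, which is why a variant just outside an exon boundary can only be called when the interval is extended. Merging after padding avoids processing the same bases twice when neighbouring targets grow into each other.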
format Online
Article
Text
id pubmed-8080428
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8080428 2021-04-29 Performance evaluation of pipelines for mapping, variant calling and interval padding, for the analysis of NGS germline panels BMC Bioinformatics Research BioMed Central 2021-04-28 /pmc/articles/PMC8080428/ /pubmed/33910496 http://dx.doi.org/10.1186/s12859-021-04144-1 Text en
© The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Performance evaluation of pipelines for mapping, variant calling and interval padding, for the analysis of NGS germline panels
topic Research