Optimized pipeline of MuTect and GATK tools to improve the detection of somatic single nucleotide polymorphisms in whole-exome sequencing data

BACKGROUND: Detecting somatic mutations in whole exome sequencing data of cancer samples has become a popular approach for profiling cancer development, progression and chemotherapy resistance. Several studies have proposed software packages, filters and parametrizations. However, many research grou...

Bibliographic Details
Main authors: do Valle, Ítalo Faria, Giampieri, Enrico, Simonetti, Giorgia, Padella, Antonella, Manfrini, Marco, Ferrari, Anna, Papayannidis, Cristina, Zironi, Isabella, Garonzi, Marianna, Bernardi, Simona, Delledonne, Massimo, Martinelli, Giovanni, Remondini, Daniel, Castellani, Gastone
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5123378/
https://www.ncbi.nlm.nih.gov/pubmed/28185561
http://dx.doi.org/10.1186/s12859-016-1190-7
author do Valle, Ítalo Faria
Giampieri, Enrico
Simonetti, Giorgia
Padella, Antonella
Manfrini, Marco
Ferrari, Anna
Papayannidis, Cristina
Zironi, Isabella
Garonzi, Marianna
Bernardi, Simona
Delledonne, Massimo
Martinelli, Giovanni
Remondini, Daniel
Castellani, Gastone
author_sort do Valle, Ítalo Faria
collection PubMed
description BACKGROUND: Detecting somatic mutations in whole exome sequencing data of cancer samples has become a popular approach for profiling cancer development, progression and chemotherapy resistance. Several studies have proposed software packages, filters and parametrizations. However, many research groups reported low concordance among different methods. We aimed to develop a pipeline which detects a wide range of single nucleotide mutations with high validation rates. We combined two standard tools – Genome Analysis Toolkit (GATK) and MuTect – to create the GATK-LOD(N) method. As proof of principle, we applied our pipeline to exome sequencing data of hematological (Acute Myeloid and Acute Lymphoblastic Leukemias) and solid (Gastrointestinal Stromal Tumor and Lung Adenocarcinoma) tumors. We performed experiments on simulated data to test the sensitivity and specificity of our pipeline. RESULTS: The software MuTect presented the highest validation rate (90 %) for mutation detection, but limited number of somatic mutations detected. The GATK detected a high number of mutations but with low specificity. The GATK-LOD(N) increased the performance of the GATK variant detection (from 5 of 14 to 3 of 4 confirmed variants), while preserving mutations not detected by MuTect. However, GATK-LOD(N) filtered more variants in the hematological samples than in the solid tumors. Experiments in simulated data demonstrated that GATK-LOD(N) increased both specificity and sensitivity of GATK results. CONCLUSION: We presented a pipeline that detects a wide range of somatic single nucleotide variants, with good validation rates, from exome sequencing data of cancer samples. We also showed the advantage of combining standard algorithms to create the GATK-LOD(N) method, that increased specificity and sensitivity of GATK results. This pipeline can be helpful in discovery studies aimed to profile the somatic mutational landscape of cancer genomes. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1190-7) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5123378
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5123378 2016-12-08 BMC Bioinformatics Research BioMed Central 2016-11-08 /pmc/articles/PMC5123378/ /pubmed/28185561 http://dx.doi.org/10.1186/s12859-016-1190-7 Text en © The Author(s). 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Optimized pipeline of MuTect and GATK tools to improve the detection of somatic single nucleotide polymorphisms in whole-exome sequencing data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5123378/
https://www.ncbi.nlm.nih.gov/pubmed/28185561
http://dx.doi.org/10.1186/s12859-016-1190-7