Comprehensive Analysis to Improve the Validation Rate for Single Nucleotide Variants Detected by Next-Generation Sequencing

Next-generation sequencing (NGS) has enabled the high-throughput discovery of germline and somatic mutations. However, NGS-based variant detection is still prone to errors, resulting in inaccurate variant calls. Here, we categorized the variants detected by NGS according to total read depth (TD) and...

Bibliographic Details
Main authors: Park, Mi-Hyun; Rhee, Hwanseok; Park, Jung Hoon; Woo, Hae-Mi; Choi, Byung-Ok; Kim, Bo-Young; Chung, Ki Wha; Cho, Yoo-Bok; Kim, Hyung Jin; Jung, Ji-Won; Koo, Soo Kyung
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3906084/
https://www.ncbi.nlm.nih.gov/pubmed/24489763
http://dx.doi.org/10.1371/journal.pone.0086664
author Park, Mi-Hyun
Rhee, Hwanseok
Park, Jung Hoon
Woo, Hae-Mi
Choi, Byung-Ok
Kim, Bo-Young
Chung, Ki Wha
Cho, Yoo-Bok
Kim, Hyung Jin
Jung, Ji-Won
Koo, Soo Kyung
author_sort Park, Mi-Hyun
collection PubMed
description Next-generation sequencing (NGS) has enabled the high-throughput discovery of germline and somatic mutations. However, NGS-based variant detection is still prone to errors, resulting in inaccurate variant calls. Here, we categorized the variants detected by NGS according to total read depth (TD) and SNP quality (SNPQ), and performed Sanger sequencing with 348 selected non-synonymous single nucleotide variants (SNVs) for validation. Using the SAMtools and GATK algorithms, the validation rate was positively correlated with SNPQ but showed no correlation with TD. In addition, common variants called by both programs had a higher validation rate than caller-specific variants. We further examined several parameters to improve the validation rate, and found that strand bias (SB) was a key parameter. SB in NGS data showed a strong difference between the variants passing validation and those that failed validation, showing a validation rate of more than 92% (filtering cutoff value: alternate allele forward [AF]≥20 and AF<80 in SAMtools, SB<–10 in GATK). Moreover, the validation rate increased significantly (up to 97–99%) when the variant was filtered together with the suggested values of mapping quality (MQ), SNPQ and SB. This detailed and systematic study provides comprehensive recommendations for improving validation rates, saving time and lowering cost in NGS analyses.
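The strand-bias cutoffs quoted in the description can be illustrated with a small sketch. This is not the authors' pipeline: the `VariantCall` record and its field names are hypothetical, and it assumes that AF is the percentage of alternate-allele reads on the forward strand and that the GATK value is the SB score reported per variant. The MQ and SNPQ cutoffs are not given in the abstract, so they are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariantCall:
    """Hypothetical, simplified record for one SNV call (not a VCF parser)."""
    caller: str                               # "samtools" or "gatk"
    alt_forward_pct: Optional[float] = None   # assumed: % of alternate-allele reads on the forward strand
    gatk_sb: Optional[float] = None           # assumed: GATK strand-bias (SB) score


def passes_strand_bias(call: VariantCall) -> bool:
    """Apply the strand-bias cutoffs quoted in the abstract.

    SAMtools: 20 <= AF < 80; GATK: SB < -10.
    A call with a missing value is treated as failing the filter.
    """
    if call.caller == "samtools":
        return call.alt_forward_pct is not None and 20 <= call.alt_forward_pct < 80
    if call.caller == "gatk":
        return call.gatk_sb is not None and call.gatk_sb < -10
    raise ValueError(f"unknown caller: {call.caller!r}")


def validation_rate(confirmed: int, tested: int) -> float:
    """Fraction of tested variants confirmed by Sanger sequencing."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return confirmed / tested


if __name__ == "__main__":
    # Made-up example calls, for illustration only.
    calls = [
        VariantCall("samtools", alt_forward_pct=50.0),
        VariantCall("samtools", alt_forward_pct=95.0),
        VariantCall("gatk", gatk_sb=-25.0),
        VariantCall("gatk", gatk_sb=-0.5),
    ]
    kept = [c for c in calls if passes_strand_bias(c)]
    print(f"{len(kept)} of {len(calls)} example calls pass the strand-bias filter")
```

In a real workflow the two values would be read from each caller's VCF output; the field names and value scales there depend on the tool version and should be checked before reusing these thresholds.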
format Online
Article
Text
id pubmed-3906084
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3906084 2014-01-31 PLoS One Research Article
Public Library of Science 2014-01-29 /pmc/articles/PMC3906084/ /pubmed/24489763 http://dx.doi.org/10.1371/journal.pone.0086664 Text en © 2014 Park et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Comprehensive Analysis to Improve the Validation Rate for Single Nucleotide Variants Detected by Next-Generation Sequencing
topic Research Article