
Optimized detection of insertions/deletions (INDELs) in whole-exome sequencing data

Insertion and deletion (INDEL) mutations, the most common type of structural variation, are associated with several human diseases. The detection of INDELs through next-generation sequencing (NGS) is becoming more common due to the decrease in costs, the increase in efficiency, and sensitivity improv...


Bibliographic Details
Main Authors: Kim, Bo-Young, Park, Jung Hoon, Jo, Hye-Yeong, Koo, Soo Kyung, Park, Mi-Hyun
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5549930/
https://www.ncbi.nlm.nih.gov/pubmed/28792971
http://dx.doi.org/10.1371/journal.pone.0182272
_version_ 1783256048592748544
author Kim, Bo-Young
Park, Jung Hoon
Jo, Hye-Yeong
Koo, Soo Kyung
Park, Mi-Hyun
author_facet Kim, Bo-Young
Park, Jung Hoon
Jo, Hye-Yeong
Koo, Soo Kyung
Park, Mi-Hyun
author_sort Kim, Bo-Young
collection PubMed
description Insertion and deletion (INDEL) mutations, the most common type of structural variation, are associated with several human diseases. The detection of INDELs through next-generation sequencing (NGS) is becoming more common due to the decrease in costs, the increase in efficiency, and sensitivity improvements demonstrated by the various sequencing platforms and analytical tools. However, there are still many errors associated with INDEL variant calling, and distinguishing INDELs from errors in NGS remains challenging. To evaluate INDEL calling from whole-exome sequencing (WES) data, we performed Sanger sequencing for all INDELs called by several calling algorithms. We compared the performance of four algorithms (GATK, SAMtools, Dindel, and Freebayes) for INDEL detection from the same sample. The sensitivity and positive predictive value (PPV) were 90.2% and 89.5% for GATK, 75.3% and 94.4% for SAMtools, 90.1% and 88.6% for Dindel, and 80.1% and 94.4% for Freebayes. GATK had the highest sensitivity. Furthermore, we identified INDELs with high PPV (intersection of all four algorithms: 98.7%; intersection of three algorithms: 97.6%; intersection of GATK and SAMtools: 97.6%). We present two key sources of difficulty in accurate INDEL detection: 1) the presence of repeats, and 2) heterozygous INDELs. We suggest accessible algorithms that selectively reduce error rates and thereby facilitate INDEL detection. Our study may also serve as a basis for understanding the accuracy and completeness of INDEL detection.
format Online
Article
Text
id pubmed-5549930
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5549930 2017-08-15 Optimized detection of insertions/deletions (INDELs) in whole-exome sequencing data Kim, Bo-Young Park, Jung Hoon Jo, Hye-Yeong Koo, Soo Kyung Park, Mi-Hyun PLoS One Research Article Insertion and deletion (INDEL) mutations, the most common type of structural variation, are associated with several human diseases. The detection of INDELs through next-generation sequencing (NGS) is becoming more common due to the decrease in costs, the increase in efficiency, and sensitivity improvements demonstrated by the various sequencing platforms and analytical tools. However, there are still many errors associated with INDEL variant calling, and distinguishing INDELs from errors in NGS remains challenging. To evaluate INDEL calling from whole-exome sequencing (WES) data, we performed Sanger sequencing for all INDELs called by several calling algorithms. We compared the performance of four algorithms (GATK, SAMtools, Dindel, and Freebayes) for INDEL detection from the same sample. The sensitivity and positive predictive value (PPV) were 90.2% and 89.5% for GATK, 75.3% and 94.4% for SAMtools, 90.1% and 88.6% for Dindel, and 80.1% and 94.4% for Freebayes. GATK had the highest sensitivity. Furthermore, we identified INDELs with high PPV (intersection of all four algorithms: 98.7%; intersection of three algorithms: 97.6%; intersection of GATK and SAMtools: 97.6%). We present two key sources of difficulty in accurate INDEL detection: 1) the presence of repeats, and 2) heterozygous INDELs. We suggest accessible algorithms that selectively reduce error rates and thereby facilitate INDEL detection. Our study may also serve as a basis for understanding the accuracy and completeness of INDEL detection.
Public Library of Science 2017-08-09 /pmc/articles/PMC5549930/ /pubmed/28792971 http://dx.doi.org/10.1371/journal.pone.0182272 Text en © 2017 Kim et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Kim, Bo-Young
Park, Jung Hoon
Jo, Hye-Yeong
Koo, Soo Kyung
Park, Mi-Hyun
Optimized detection of insertions/deletions (INDELs) in whole-exome sequencing data
title Optimized detection of insertions/deletions (INDELs) in whole-exome sequencing data
title_full Optimized detection of insertions/deletions (INDELs) in whole-exome sequencing data
title_fullStr Optimized detection of insertions/deletions (INDELs) in whole-exome sequencing data
title_full_unstemmed Optimized detection of insertions/deletions (INDELs) in whole-exome sequencing data
title_short Optimized detection of insertions/deletions (INDELs) in whole-exome sequencing data
title_sort optimized detection of insertions/deletions (indels) in whole-exome sequencing data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5549930/
https://www.ncbi.nlm.nih.gov/pubmed/28792971
http://dx.doi.org/10.1371/journal.pone.0182272
work_keys_str_mv AT kimboyoung optimizeddetectionofinsertionsdeletionsindelsinwholeexomesequencingdata
AT parkjunghoon optimizeddetectionofinsertionsdeletionsindelsinwholeexomesequencingdata
AT johyeyeong optimizeddetectionofinsertionsdeletionsindelsinwholeexomesequencingdata
AT koosookyung optimizeddetectionofinsertionsdeletionsindelsinwholeexomesequencingdata
AT parkmihyun optimizeddetectionofinsertionsdeletionsindelsinwholeexomesequencingdata
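
The abstract in this record reports sensitivity and positive predictive value (PPV) per caller, and a higher PPV for INDELs found by several callers at once. The sketch below illustrates how such figures are commonly computed. It is not the authors' pipeline: the (chromosome, position, ref, alt) key, the function names, and the toy call sets are all hypothetical, and the truth set stands in for Sanger-validated INDELs.

```python
from typing import Dict, Set, Tuple

# Hypothetical key for one INDEL: (chromosome, position, ref allele, alt allele).
Indel = Tuple[str, int, str, str]


def sensitivity(called: Set[Indel], truth: Set[Indel]) -> float:
    """Fraction of validated INDELs that the caller reported: TP / (TP + FN)."""
    if not truth:
        raise ValueError("truth set is empty")
    return len(called & truth) / len(truth)


def ppv(called: Set[Indel], truth: Set[Indel]) -> float:
    """Fraction of reported INDELs that were validated: TP / (TP + FP)."""
    if not called:
        raise ValueError("call set is empty")
    return len(called & truth) / len(called)


def consensus(call_sets: Dict[str, Set[Indel]]) -> Set[Indel]:
    """INDELs reported by every caller in call_sets."""
    sets = list(call_sets.values())
    return set.intersection(*sets) if sets else set()


# Toy data, invented for illustration only (not taken from the study).
truth: Set[Indel] = {
    ("chr1", 100, "A", "AT"),
    ("chr1", 250, "GTT", "G"),
    ("chr2", 75, "C", "CAG"),
    ("chr3", 910, "TA", "T"),
}
calls: Dict[str, Set[Indel]] = {
    "caller_a": {
        ("chr1", 100, "A", "AT"),
        ("chr1", 250, "GTT", "G"),
        ("chr2", 75, "C", "CAG"),
        ("chr5", 42, "G", "GA"),  # false positive
    },
    "caller_b": {
        ("chr1", 100, "A", "AT"),
        ("chr1", 250, "GTT", "G"),
        ("chr7", 13, "CT", "C"),  # false positive
    },
}

shared = consensus(calls)
for name, called in list(calls.items()) + [("intersection", shared)]:
    print(f"{name}: sensitivity={sensitivity(called, truth):.2f} "
          f"PPV={ppv(called, truth):.2f}")
```

In this toy example the intersection drops both false positives, so its PPV is higher than either caller's, while its sensitivity can be no higher than that of the least sensitive caller. That is the same trade-off the abstract describes for the multi-algorithm intersections.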