The impact of post-alignment processing procedures on whole-exome sequencing data

Bibliographic details
Main authors: Borges, Murilo Guimarães; de Moraes, Helena Tadiello; Rocha, Cristiane de Souza; Lopes-Cendes, Iscia
Format: Online, Article, Text
Language: English
Published: Sociedade Brasileira de Genética, 2020
Subjects: Genomics and Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7783507/
https://www.ncbi.nlm.nih.gov/pubmed/33306778
http://dx.doi.org/10.1590/1678-4685-GMB-2020-0047
Collection: PubMed
Abstract: The use of post-alignment procedures has been suggested to prevent the identification of false positives in massively parallel DNA sequencing data. Insertions and deletions are the variants most likely to be misinterpreted by variant calling algorithms. Using known genetic variants as references for post-processing pipelines can minimize mismatches: such references allow reads to be correctly realigned and recalibrated, resulting in more parsimonious variant calling. In this work, we investigate the impact of using different sets of common variants as references to facilitate variant calling from whole-exome sequencing data. We selected reference variants from the common insertions and deletions available in the 1K Genomes Project data and in the Latin American Database of Genetic Variation (LatinGen). We used the Genome Analysis Toolkit to perform post-processing procedures, such as local realignment and quality recalibration, followed by variant calling on whole-exome samples. For all groups, the call set contained more variants when no post-processing procedure was performed. We also found a higher concordance rate between the variants called using 1K Genomes and LatinGen as references. We therefore believe that the additional rare variants identified in the analysis without realignment or quality recalibration are likely false positives.
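The abstract's central mechanism, using known variants so that recalibration does not mistake real polymorphisms for sequencing errors, can be illustrated with a simplified sketch of base quality score recalibration. This is not the Genome Analysis Toolkit implementation or the authors' pipeline: it assumes a single covariate (the reported quality score), a hypothetical `BaseObservation` record and `empirical_qualities` function, and +1/+2 pseudocounts in place of GATK's own error model. Bases at known variant sites are skipped, so a common polymorphism is not counted against the sequencer.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class BaseObservation:
    """Hypothetical record for one aligned base."""
    chrom: str
    pos: int
    reported_q: int   # Phred quality reported by the sequencer
    mismatch: bool    # True if the base disagrees with the reference


def empirical_qualities(
    observations: Iterable[BaseObservation],
    known_sites: Set[Tuple[str, int]],
) -> Dict[int, float]:
    """Simplified recalibration: empirical Phred quality per reported-quality bin.

    Bases at known variant sites are skipped, so genuine variation is not
    counted as error. Pseudocounts (+1 mismatch, +2 observations) avoid
    log10(0) for bins without mismatches; this smoothing is an assumption
    of the sketch, not GATK's model.
    """
    counts = defaultdict(lambda: [0, 0])  # reported_q -> [mismatches, total]
    for obs in observations:
        if (obs.chrom, obs.pos) in known_sites:
            continue  # mask known variation
        bucket = counts[obs.reported_q]
        bucket[0] += obs.mismatch
        bucket[1] += 1
    return {
        q: round(-10 * math.log10((mm + 1) / (total + 2)), 1)
        for q, (mm, total) in counts.items()
    }


if __name__ == "__main__":
    # Toy data: 100 mismatching bases at one real polymorphism (chr1:500),
    # plus 1,000 bases at other positions carrying 10 sequencing errors.
    obs = [BaseObservation("chr1", 500, 30, True) for _ in range(100)]
    obs += [BaseObservation("chr1", p, 30, p % 100 == 0) for p in range(1000, 2000)]
    print(empirical_qualities(obs, known_sites={("chr1", 500)}))  # {30: 19.6}
    print(empirical_qualities(obs, known_sites=set()))            # {30: 10.0}
```

In this toy run, leaving the polymorphism out of the known sites folds 100 real differences into the error rate, which drags the empirical quality of the whole Q30 bin down from about 19.6 to about 10.0. This is the kind of distortion that a suitable set of reference variants is meant to prevent.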
Record ID: pubmed-7783507
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Genet Mol Biol (section: Genomics and Bioinformatics)
Published online: 2020-11-13
Record date: 2021-01-14
License: This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/).