
Accurate indel prediction using paired-end short reads

BACKGROUND: One of the major open challenges in next generation sequencing (NGS) is the accurate identification of structural variants such as insertions and deletions (indels). Current methods for indel calling assign scores to different types of evidence or counter-evidence for the presence of an...


Bibliographic Details
Main authors:	Grimm, Dominik; Hagmann, Jörg; Koenig, Daniel; Weigel, Detlef; Borgwardt, Karsten
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3614465/ https://www.ncbi.nlm.nih.gov/pubmed/23442375 http://dx.doi.org/10.1186/1471-2164-14-132

author	Grimm, Dominik; Hagmann, Jörg; Koenig, Daniel; Weigel, Detlef; Borgwardt, Karsten
author_sort	Grimm, Dominik
collection	PubMed
description	BACKGROUND: One of the major open challenges in next generation sequencing (NGS) is the accurate identification of structural variants such as insertions and deletions (indels). Current methods for indel calling assign scores to different types of evidence or counter-evidence for the presence of an indel, such as the number of split read alignments spanning the boundaries of a deletion candidate or reads that map within a putative deletion. Candidates with a score above a manually defined threshold are then predicted to be true indels. As a consequence, structural variants detected in this manner contain many false positives. RESULTS: Here, we present a machine learning based method which is able to discover and distinguish true from false indel candidates in order to reduce the false positive rate. Our method identifies indel candidates using a discriminative classifier based on features of split read alignment profiles and trained on true and false indel candidates that were validated by Sanger sequencing. We demonstrate the usefulness of our method with paired-end Illumina reads from 80 genomes of the first phase of the 1001 Genomes Project (http://www.1001genomes.org) in Arabidopsis thaliana. CONCLUSION: In this work we show that indel classification is a necessary step to reduce the number of false positive candidates. We demonstrate that missing classification may lead to spurious biological interpretations. The software is available at: http://agkb.is.tuebingen.mpg.de/Forschung/SV-M/.
format	Online Article Text
id	pubmed-3614465
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3614465 2013-04-05 Accurate indel prediction using paired-end short reads. Grimm, Dominik; Hagmann, Jörg; Koenig, Daniel; Weigel, Detlef; Borgwardt, Karsten. BMC Genomics. Methodology Article. BioMed Central 2013-02-27 /pmc/articles/PMC3614465/ /pubmed/23442375 http://dx.doi.org/10.1186/1471-2164-14-132 Text en Copyright © 2013 Grimm et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Accurate indel prediction using paired-end short reads
topic	Methodology Article
