
Identification of indels in next-generation sequencing data

BACKGROUND: The discovery and mapping of genomic variants is an essential step in most analysis done using sequencing reads. There are a number of mature software packages and associated pipelines that can identify single nucleotide polymorphisms (SNPs) with a high degree of concordance. However, th...

Bibliographic Details
Main Authors:	Ratan, Aakrosh; Olson, Thomas L; Loughran, Thomas P; Miller, Webb
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4339746/ https://www.ncbi.nlm.nih.gov/pubmed/25879703 http://dx.doi.org/10.1186/s12859-015-0483-6

author	Ratan, Aakrosh Olson, Thomas L Loughran, Thomas P Miller, Webb
author_facet	Ratan, Aakrosh Olson, Thomas L Loughran, Thomas P Miller, Webb
author_sort	Ratan, Aakrosh
collection	PubMed
description	BACKGROUND: The discovery and mapping of genomic variants is an essential step in most analysis done using sequencing reads. There are a number of mature software packages and associated pipelines that can identify single nucleotide polymorphisms (SNPs) with a high degree of concordance. However, the same cannot be said for tools that are used to identify the other types of variants. Indels represent the second most frequent class of variants in the human genome, after single nucleotide polymorphisms. The reliable detection of indels is still a challenging problem, especially for variants that are longer than a few bases. RESULTS: We have developed a set of algorithms and heuristics collectively called indelMINER to identify indels from whole genome resequencing datasets using paired-end reads. indelMINER uses a split-read approach to identify the precise breakpoints for indels of size less than a user specified threshold, and supplements that with a paired-end approach to identify larger variants that are frequently missed with the split-read approach. We use simulated and real datasets to show that an implementation of the algorithm performs favorably when compared to several existing tools. CONCLUSIONS: indelMINER can be used effectively to identify indels in whole-genome resequencing projects. The output is provided in the VCF format along with additional information about the variant, including information about its presence or absence in another sample. The source code and documentation for indelMINER can be freely downloaded from www.bx.psu.edu/miller_lab/indelMINER.tar.gz. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0483-6) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-4339746
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4339746 2015-02-26 Identification of indels in next-generation sequencing data Ratan, Aakrosh Olson, Thomas L Loughran, Thomas P Miller, Webb BMC Bioinformatics Research Article
BioMed Central 2015-02-13 /pmc/articles/PMC4339746/ /pubmed/25879703 http://dx.doi.org/10.1186/s12859-015-0483-6 Text en © Ratan et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Identification of indels in next-generation sequencing data
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4339746/ https://www.ncbi.nlm.nih.gov/pubmed/25879703 http://dx.doi.org/10.1186/s12859-015-0483-6
