
The distribution and mutagenesis of short coding INDELs from 1,128 whole exomes

BACKGROUND: Identifying insertion/deletion polymorphisms (INDELs) with high confidence has been intrinsically challenging in short-read sequencing data. Here we report our approach for improving INDEL calling accuracy by using a machine learning algorithm to combine call sets generated with three in...

Bibliographic Details
Main authors: Challis, Danny; Antunes, Lilian; Garrison, Erik; Banks, Eric; Evani, Uday S; Muzny, Donna; Poplin, Ryan; Gibbs, Richard A; Marth, Gabor; Yu, Fuli
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4352271/
https://www.ncbi.nlm.nih.gov/pubmed/25765891
http://dx.doi.org/10.1186/s12864-015-1333-7
_version_ 1782360434887622656
author Challis, Danny
Antunes, Lilian
Garrison, Erik
Banks, Eric
Evani, Uday S
Muzny, Donna
Poplin, Ryan
Gibbs, Richard A
Marth, Gabor
Yu, Fuli
collection PubMed
description BACKGROUND: Identifying insertion/deletion polymorphisms (INDELs) with high confidence has been intrinsically challenging in short-read sequencing data. Here we report our approach for improving INDEL calling accuracy by using a machine learning algorithm to combine call sets generated with three independent methods, and by leveraging the strengths of each individual pipeline. Utilizing this approach, we generated a consensus exome INDEL call set from a large dataset generated by the 1000 Genomes Project (1000G), maximizing both the sensitivity and the specificity of the calls. RESULTS: This consensus exome INDEL call set features 7,210 INDELs, from 1,128 individuals across 13 populations included in the 1000 Genomes Phase 1 dataset, with a false discovery rate (FDR) of about 7.0%. CONCLUSIONS: In our study we further characterize the patterns and distributions of these exonic INDELs with respect to density, allele length, and site frequency spectrum, as well as the potential mutagenic mechanisms of coding INDELs in humans. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-1333-7) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4352271
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4352271 2015-03-08 BMC Genomics Research Article BioMed Central 2015-02-28 /pmc/articles/PMC4352271/ /pubmed/25765891 http://dx.doi.org/10.1186/s12864-015-1333-7 Text en © Challis et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title The distribution and mutagenesis of short coding INDELs from 1,128 whole exomes
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4352271/
https://www.ncbi.nlm.nih.gov/pubmed/25765891
http://dx.doi.org/10.1186/s12864-015-1333-7