GINDEL: Accurate Genotype Calling of Insertions and Deletions from Low Coverage Population Sequence Reads

Bibliographic Details
Main authors: Chu, Chong; Zhang, Jin; Wu, Yufeng
Format: Online Article Text
Language: English
Published: Public Library of Science, 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4244156/
https://www.ncbi.nlm.nih.gov/pubmed/25423315
http://dx.doi.org/10.1371/journal.pone.0113324
collection PubMed
description Insertions and deletions (indels) are important types of structural variation. Obtaining accurate indel genotypes may facilitate further genetic studies. A few existing methods call indel genotypes from sequence reads; however, none of these tools can accurately genotype indels of all lengths, especially from low-coverage sequence data. In this paper, we present GINDEL, an approach for calling genotypes of both insertions and deletions from sequence reads. GINDEL uses a machine learning approach that combines multiple features extracted from next-generation sequencing data. We test our approach on both simulated and real data and compare it with existing tools, including Genome STRiP, Pindel and Clever-sv. Results show that GINDEL works well for deletions larger than 50 bp on both high- and low-coverage data. GINDEL also performs well for insertion genotyping on both simulated and real data. By comparison, Genome STRiP performs less well for shorter deletions (50–200 bp) on both simulated and real sequence data from the 1000 Genomes Project. Clever-sv performs well for intermediate deletions (200–1500 bp) but is less accurate when coverage is low. Pindel works well only for high-coverage data and does not perform well at low coverage. In summary, we show that GINDEL can not only call genotypes of insertions and deletions (both short and long) for high- and low-coverage population sequence data, but is also more accurate and efficient than other approaches. The program GINDEL can be downloaded at: http://sourceforge.net/p/gindel
id pubmed-4244156
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4244156 (2014-12-05); PLoS One, Research Article; Public Library of Science, 2014-11-25; Text, en
© 2014 Chu et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
topic Research Article