MindTheGap: integrated detection and assembly of short and long insertions

Motivation: Insertions play an important role in genome evolution. However, such variants are difficult to detect from short-read sequencing data, especially when they exceed the paired-end insert size. Many approaches have been proposed to call short insertion variants based on paired-end mapping....

Bibliographic Details
Main Authors: Rizk, Guillaume, Gouin, Anaïs, Chikhi, Rayan, Lemaitre, Claire
Format: Online Article Text
Language: English
Published: Oxford University Press 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4253827/
https://www.ncbi.nlm.nih.gov/pubmed/25123898
http://dx.doi.org/10.1093/bioinformatics/btu545
collection PubMed
description Motivation: Insertions play an important role in genome evolution. However, such variants are difficult to detect from short-read sequencing data, especially when they exceed the paired-end insert size. Many approaches have been proposed to call short insertion variants based on paired-end mapping. However, there remains a lack of practical methods to detect and assemble long variants. Results: We propose here an original method, called MindTheGap, for the integrated detection and assembly of insertion variants from re-sequencing data. Importantly, it is designed to call insertions of any size, whether they are novel or duplicated, homozygous or heterozygous in the donor genome. MindTheGap uses an efficient k-mer-based method to detect insertion sites in a reference genome, and subsequently assemble them from the donor reads. MindTheGap showed high recall and precision on simulated datasets of various genome complexities. When applied to real Caenorhabditis elegans and human NA12878 datasets, MindTheGap detected and correctly assembled insertions >1 kb, using at most 14 GB of memory. Availability and implementation: http://mindthegap.genouest.org Contact: guillaume.rizk@inria.fr or claire.lemaitre@inria.fr
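The abstract only states that insertion sites are detected with a k-mer-based method on the reference genome. As a rough illustration of that idea, and not the MindTheGap implementation, the toy sketch below assumes that a clean insertion in the donor breaks exactly the k-1 reference k-mers that span the insertion point, so a run of k-1 consecutive reference k-mers missing from the donor reads marks a candidate site. The function name `find_candidate_sites`, the example sequences, and this gap-length criterion are all assumptions made for the example. Sequencing errors, heterozygous insertions, and repeats are ignored.

```python
def find_candidate_sites(reference, reads, k):
    """Return reference positions where an insertion may have occurred.

    Hypothetical helper for illustration only: a site is reported when
    exactly k-1 consecutive reference k-mers are absent from the donor
    reads and the run is flanked by present k-mers on both sides.
    """
    # Index every k-mer seen in the donor reads.
    donor_kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            donor_kmers.add(read[i:i + k])

    sites = []
    run_start = None  # start index of the current run of absent k-mers
    for i in range(len(reference) - k + 1):
        present = reference[i:i + k] in donor_kmers
        if not present:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # The run of absent k-mers just ended at index i - 1.
            if run_start > 0 and i - run_start == k - 1:
                # The insertion lies just before reference[run_start + k - 1].
                sites.append(run_start + k - 1)
            run_start = None
    return sites


# Toy data (assumed): an 8-base insertion before reference position 10.
reference = "ACGTTGCAAGCTTAGGCATC"
donor = reference[:10] + "TTTTTTTT" + reference[10:]
# Error-free reads of length 10 covering every donor position.
reads = [donor[i:i + 10] for i in range(len(donor) - 9)]

print(find_candidate_sites(reference, reads, k=5))  # → [10]
```

The sketch covers site detection only. Assembling the inserted sequence from the donor reads, the second step named in the abstract, is not shown.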
format Online
Article
Text
id pubmed-4253827
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4253827 2014-12-04 MindTheGap: integrated detection and assembly of short and long insertions Rizk, Guillaume Gouin, Anaïs Chikhi, Rayan Lemaitre, Claire Bioinformatics Hitseq Papers Oxford University Press 2014-12-15 2014-08-14 /pmc/articles/PMC4253827/ /pubmed/25123898 http://dx.doi.org/10.1093/bioinformatics/btu545 Text en © The Author 2014. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
topic Hitseq Papers