MindTheGap: integrated detection and assembly of short and long insertions
Motivation: Insertions play an important role in genome evolution. However, such variants are difficult to detect from short-read sequencing data, especially when they exceed the paired-end insert size. Many approaches have been proposed to call short insertion variants based on paired-end mapping....
Main Authors: | Rizk, Guillaume; Gouin, Anaïs; Chikhi, Rayan; Lemaitre, Claire |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2014 |
Subjects: | Hitseq Papers |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4253827/ https://www.ncbi.nlm.nih.gov/pubmed/25123898 http://dx.doi.org/10.1093/bioinformatics/btu545 |
_version_ | 1782347294910185472 |
---|---|
author | Rizk, Guillaume; Gouin, Anaïs; Chikhi, Rayan; Lemaitre, Claire |
author_facet | Rizk, Guillaume; Gouin, Anaïs; Chikhi, Rayan; Lemaitre, Claire |
author_sort | Rizk, Guillaume |
collection | PubMed |
description | Motivation: Insertions play an important role in genome evolution. However, such variants are difficult to detect from short-read sequencing data, especially when they exceed the paired-end insert size. Many approaches have been proposed to call short insertion variants based on paired-end mapping. However, there remains a lack of practical methods to detect and assemble long variants. Results: We propose here an original method, called MindTheGap, for the integrated detection and assembly of insertion variants from re-sequencing data. Importantly, it is designed to call insertions of any size, whether they are novel or duplicated, homozygous or heterozygous in the donor genome. MindTheGap uses an efficient k-mer-based method to detect insertion sites in a reference genome, and subsequently assemble them from the donor reads. MindTheGap showed high recall and precision on simulated datasets of various genome complexities. When applied to real Caenorhabditis elegans and human NA12878 datasets, MindTheGap detected and correctly assembled insertions >1 kb, using at most 14 GB of memory. Availability and implementation: http://mindthegap.genouest.org Contact: guillaume.rizk@inria.fr or claire.lemaitre@inria.fr |
format | Online Article Text |
id | pubmed-4253827 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-42538272014-12-04 MindTheGap: integrated detection and assembly of short and long insertions Rizk, Guillaume Gouin, Anaïs Chikhi, Rayan Lemaitre, Claire Bioinformatics Hitseq Papers Motivation: Insertions play an important role in genome evolution. However, such variants are difficult to detect from short-read sequencing data, especially when they exceed the paired-end insert size. Many approaches have been proposed to call short insertion variants based on paired-end mapping. However, there remains a lack of practical methods to detect and assemble long variants. Results: We propose here an original method, called MindTheGap, for the integrated detection and assembly of insertion variants from re-sequencing data. Importantly, it is designed to call insertions of any size, whether they are novel or duplicated, homozygous or heterozygous in the donor genome. MindTheGap uses an efficient k-mer-based method to detect insertion sites in a reference genome, and subsequently assemble them from the donor reads. MindTheGap showed high recall and precision on simulated datasets of various genome complexities. When applied to real Caenorhabditis elegans and human NA12878 datasets, MindTheGap detected and correctly assembled insertions >1 kb, using at most 14 GB of memory. Availability and implementation: http://mindthegap.genouest.org Contact: guillaume.rizk@inria.fr or claire.lemaitre@inria.fr Oxford University Press 2014-12-15 2014-08-14 /pmc/articles/PMC4253827/ /pubmed/25123898 http://dx.doi.org/10.1093/bioinformatics/btu545 Text en © The Author 2014. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Hitseq Papers Rizk, Guillaume Gouin, Anaïs Chikhi, Rayan Lemaitre, Claire MindTheGap: integrated detection and assembly of short and long insertions |
title | MindTheGap: integrated detection and assembly of short and long insertions |
title_full | MindTheGap: integrated detection and assembly of short and long insertions |
title_fullStr | MindTheGap: integrated detection and assembly of short and long insertions |
title_full_unstemmed | MindTheGap: integrated detection and assembly of short and long insertions |
title_short | MindTheGap: integrated detection and assembly of short and long insertions |
title_sort | mindthegap: integrated detection and assembly of short and long insertions |
topic | Hitseq Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4253827/ https://www.ncbi.nlm.nih.gov/pubmed/25123898 http://dx.doi.org/10.1093/bioinformatics/btu545 |
work_keys_str_mv | AT rizkguillaume mindthegapintegrateddetectionandassemblyofshortandlonginsertions AT gouinanais mindthegapintegrateddetectionandassemblyofshortandlonginsertions AT chikhirayan mindthegapintegrateddetectionandassemblyofshortandlonginsertions AT lemaitreclaire mindthegapintegrateddetectionandassemblyofshortandlonginsertions |