ITD assembler: an algorithm for internal tandem duplication discovery from short-read sequencing data

BACKGROUND: Detection of tandem duplication within coding exons, referred to as internal tandem duplication (ITD), remains challenging due to inefficiencies in alignment of ITD-containing reads to the reference genome. There is a critical need to develop efficient methods to recover these important...

Bibliographic Details
Main Authors: Rustagi, Navin, Hampton, Oliver A, Li, Jie, Xi, Liu, Gibbs, Richard A., Plon, Sharon E., Kimmel, Marek, Wheeler, David A.
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4847212/
https://www.ncbi.nlm.nih.gov/pubmed/27121965
http://dx.doi.org/10.1186/s12859-016-1031-8
_version_ 1782429167650865152
author Rustagi, Navin
Hampton, Oliver A
Li, Jie
Xi, Liu
Gibbs, Richard A.
Plon, Sharon E.
Kimmel, Marek
Wheeler, David A.
collection PubMed
description BACKGROUND: Detection of tandem duplication within coding exons, referred to as internal tandem duplication (ITD), remains challenging due to inefficiencies in alignment of ITD-containing reads to the reference genome. There is a critical need to develop efficient methods to recover these important mutational events. RESULTS: In this paper we introduce ITD Assembler, a novel approach that rapidly evaluates all unmapped and partially mapped reads from whole exome NGS data using a de Bruijn graph approach to select reads that harbor cycles of appropriate length, followed by assembly using overlap-layout-consensus. We tested ITD Assembler on The Cancer Genome Atlas AML dataset as a truth set. ITD Assembler identified the highest percentage of reported FLT3-ITDs when compared to other ITD detection algorithms, and discovered additional ITDs in FLT3, KIT, CEBPA, WT1 and other genes. Evidence of polymorphic ITDs in 54 genes was also found. Novel ITDs were validated by analyzing the corresponding RNA sequencing data. CONCLUSIONS: ITD Assembler is a highly sensitive tool that can detect partial, large and complex tandem duplications. This study highlights the need to look more effectively for ITDs in other cancers and Mendelian diseases. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1031-8) contains supplementary material, which is available to authorized users.
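The read-selection idea in the abstract (keep reads whose de Bruijn graph contains a cycle of appropriate length) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the k-mer size `k`, and the `min_len`/`max_len` bounds are hypothetical choices made for illustration, and the overlap-layout-consensus assembly step is not shown. The sketch relies on one observation: walking a single read k-mer by k-mer traces a path in its de Bruijn graph, so revisiting a k-mer closes a cycle whose length is the distance between the two visits.

```python
def kmer_cycle_lengths(read, k):
    """Return the set of cycle lengths found along one read's k-mer path.

    A repeated k-mer means the path through the read's de Bruijn graph
    returns to a node it already visited; the distance between the two
    visits is the length of that cycle.
    """
    last_seen = {}   # k-mer -> most recent start position in the read
    lengths = set()
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if kmer in last_seen:
            lengths.add(i - last_seen[kmer])
        last_seen[kmer] = i
    return lengths


def harbors_cycle(read, k=10, min_len=15, max_len=300):
    """True if the read has a cycle whose length lies in [min_len, max_len].

    k, min_len and max_len are illustrative defaults (assumptions), not
    values taken from the paper.
    """
    return any(min_len <= n <= max_len for n in kmer_cycle_lengths(read, k))


if __name__ == "__main__":
    unit = "ACGTTGCATCGGATCCTA"            # 18-base segment, duplicated in tandem
    read = "GGG" + unit + unit + "TTT"
    print(sorted(kmer_cycle_lengths(read, 10)))
    print(harbors_cycle(read))
```

A homopolymer run such as `"A" * 30` also produces a cycle, but of length 1, which is why a lower bound on cycle length is useful for filtering out low-complexity reads.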
format Online
Article
Text
id pubmed-4847212
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4847212 2016-05-04 ITD assembler: an algorithm for internal tandem duplication discovery from short-read sequencing data Rustagi, Navin Hampton, Oliver A Li, Jie Xi, Liu Gibbs, Richard A. Plon, Sharon E. Kimmel, Marek Wheeler, David A. BMC Bioinformatics Methodology Article BACKGROUND: Detection of tandem duplication within coding exons, referred to as internal tandem duplication (ITD), remains challenging due to inefficiencies in alignment of ITD-containing reads to the reference genome. There is a critical need to develop efficient methods to recover these important mutational events. RESULTS: In this paper we introduce ITD Assembler, a novel approach that rapidly evaluates all unmapped and partially mapped reads from whole exome NGS data using a de Bruijn graph approach to select reads that harbor cycles of appropriate length, followed by assembly using overlap-layout-consensus. We tested ITD Assembler on The Cancer Genome Atlas AML dataset as a truth set. ITD Assembler identified the highest percentage of reported FLT3-ITDs when compared to other ITD detection algorithms, and discovered additional ITDs in FLT3, KIT, CEBPA, WT1 and other genes. Evidence of polymorphic ITDs in 54 genes was also found. Novel ITDs were validated by analyzing the corresponding RNA sequencing data. CONCLUSIONS: ITD Assembler is a highly sensitive tool that can detect partial, large and complex tandem duplications. This study highlights the need to look more effectively for ITDs in other cancers and Mendelian diseases. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1031-8) contains supplementary material, which is available to authorized users. BioMed Central 2016-04-27 /pmc/articles/PMC4847212/ /pubmed/27121965 http://dx.doi.org/10.1186/s12859-016-1031-8 Text en © Rustagi et al. 2016
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title ITD assembler: an algorithm for internal tandem duplication discovery from short-read sequencing data
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4847212/
https://www.ncbi.nlm.nih.gov/pubmed/27121965
http://dx.doi.org/10.1186/s12859-016-1031-8