DB(2): a probabilistic approach for accurate detection of tandem duplication breakpoints using paired-end reads

BACKGROUND: With the advent of paired-end high throughput sequencing, it is now possible to identify various types of structural variation on a genome-wide scale. Although many methods have been proposed for structural variation detection, most do not provide precise boundaries for identified varian...

Bibliographic Details
Main Authors: Yavaş, Gökhan; Koyutürk, Mehmet; Gould, Meetha P; McMahon, Sarah; LaFramboise, Thomas
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects: Methodology Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4234483/
https://www.ncbi.nlm.nih.gov/pubmed/24597945
http://dx.doi.org/10.1186/1471-2164-15-175
author Yavaş, Gökhan
Koyutürk, Mehmet
Gould, Meetha P
McMahon, Sarah
LaFramboise, Thomas
collection PubMed
description BACKGROUND: With the advent of paired-end high-throughput sequencing, it is now possible to identify various types of structural variation on a genome-wide scale. Although many methods have been proposed for structural variation detection, most do not provide precise boundaries for identified variants. In this paper, we propose a new method, Distribution Based detection of Duplication Boundaries (DB(2)), for accurate detection of tandem duplication breakpoints, an important class of structural variation, with high precision and recall. RESULTS: Our computational experiments on simulated data show that DB(2) outperforms state-of-the-art methods in finding breakpoints of tandem duplications, with a higher positive predictive value (precision) in calling the duplications’ presence. In particular, DB(2)’s prediction of tandem duplications is correct 99% of the time even for very noisy data, while narrowing down the space of possible breakpoints to within a margin of 15 to 20 bp on average. Most existing methods provide boundaries in ranges that extend to hundreds of bases, with lower precision values. Our method is also highly robust to varying properties of the sequencing library and to the sizes of the tandem duplications, as shown by its stable precision, recall and mean boundary mismatch performance. We demonstrate our method’s efficacy using both simulated paired-end reads and those generated from a melanoma sample and two ovarian cancer samples. Newly discovered tandem duplications are validated using PCR and Sanger sequencing. CONCLUSIONS: Our method, DB(2), uses discordantly aligned reads, taking into account the distribution of fragment length, to predict tandem duplications along with their breakpoints on a donor genome. The proposed method fine-tunes the breakpoint calls by applying a novel probabilistic framework that incorporates the empirical fragment length distribution to score each feasible breakpoint. DB(2) is implemented in the Java programming language and is freely available at http://mendel.gene.cwru.edu/laframboiselab/software.php. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-175) contains supplementary material, which is available to authorized users.
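The scoring idea summarized under CONCLUSIONS can be illustrated with a minimal Python sketch. It is not the authors' Java implementation: the function names, the coordinate convention (0-based, half-open reference intervals), the pseudocount, and the feasibility rule are all assumptions made for illustration. A read pair spanning the junction between the two copies of a tandem duplication maps with its forward read near the end of the duplicated segment and its reverse read near the start, so for a candidate segment [start, end) the fragment length implied on the donor is (end - fwd_start) + (rev_end - start). Each feasible candidate is then scored by the summed log-probability of those implied lengths under the empirical fragment length distribution.

```python
import math
from collections import Counter


def empirical_log_pmf(library_lengths, pseudocount=1e-6):
    """Build log P(length) from observed concordant fragment lengths.

    The pseudocount (an assumed value) keeps unseen lengths finite.
    """
    counts = Counter(library_lengths)
    total = len(library_lengths)

    def log_p(length):
        return math.log(counts.get(length, 0) / total + pseudocount)

    return log_p


def implied_length(pair, start, end):
    """Donor fragment length implied by candidate breakpoints [start, end).

    pair = (fwd_start, fwd_end, rev_start, rev_end) in reference coordinates.
    The fragment runs from fwd_start to the end of the first copy, then
    from the start of the second copy to rev_end.
    """
    fwd_start, _fwd_end, _rev_start, rev_end = pair
    return (end - fwd_start) + (rev_end - start)


def is_feasible(pair, start, end):
    """Assumed rule: both reads must lie inside the duplicated segment."""
    fwd_start, fwd_end, rev_start, rev_end = pair
    return start <= rev_start and rev_end <= end and start <= fwd_start and fwd_end <= end


def score_breakpoints(pairs, start, end, log_p):
    """Sum of log-probabilities of implied lengths; -inf if infeasible."""
    if not all(is_feasible(p, start, end) for p in pairs):
        return -math.inf
    return sum(log_p(implied_length(p, start, end)) for p in pairs)


# Toy data (hypothetical): a library centered on 300 bp and two
# discordant pairs consistent with a duplication of [1000, 2000).
library = [300] * 8 + [290, 310]
log_p = empirical_log_pmf(library)
pairs = [(1850, 1900, 1100, 1150), (1900, 1950, 1150, 1200)]

for start, end in [(1000, 2000), (1000, 2010), (1120, 2000)]:
    print(start, end, score_breakpoints(pairs, start, end, log_p))
```

Under these assumptions the implied length depends only on end - start, so in this sketch the length distribution constrains the size of the duplication, while the feasibility check is what bounds each breakpoint individually.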
format Online
Article
Text
id pubmed-4234483
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4234483 2014-11-19 BMC Genomics Methodology Article BioMed Central 2014-03-05 /pmc/articles/PMC4234483/ /pubmed/24597945 http://dx.doi.org/10.1186/1471-2164-15-175 Text en © Yavaş et al.; licensee BioMed Central Ltd. 2014. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
title DB(2): a probabilistic approach for accurate detection of tandem duplication breakpoints using paired-end reads
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4234483/
https://www.ncbi.nlm.nih.gov/pubmed/24597945
http://dx.doi.org/10.1186/1471-2164-15-175