
DINTD: Detection and Inference of Tandem Duplications From Short Sequencing Reads

Tandem duplication (TD) is an important type of structural variation (SV) in the human genome and has biological significance for human cancer evolution and tumor genesis. Accurate and reliable detection of TDs plays an important role in advancing early detection, diagnosis, and treatment of disease...


Bibliographic Details
Main authors: Dong, Jinxin, Qi, Minyong, Wang, Shaoqiang, Yuan, Xiguo
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7433346/
https://www.ncbi.nlm.nih.gov/pubmed/32849857
http://dx.doi.org/10.3389/fgene.2020.00924
author Dong, Jinxin
Qi, Minyong
Wang, Shaoqiang
Yuan, Xiguo
author_facet Dong, Jinxin
Qi, Minyong
Wang, Shaoqiang
Yuan, Xiguo
author_sort Dong, Jinxin
collection PubMed
description Tandem duplication (TD) is an important type of structural variation (SV) in the human genome and has biological significance for human cancer evolution and tumorigenesis. Accurate and reliable detection of TDs plays an important role in advancing early detection, diagnosis, and treatment of disease. The advent of next-generation sequencing technologies has made it possible to study TDs. However, detection is still challenging due to the uneven distribution of reads and the uncertain amplitude of TD regions. In this paper, we present a new method, DINTD (Detection and INference of Tandem Duplications), to detect and infer TDs using short sequencing reads. The method first extracts read depth and mapping quality signals, then uses the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm to find possible TD regions. A total variation penalized least squares model is fitted to the read depth and mapping quality signals to denoise them. A 2D binary search tree is used to search for neighboring points efficiently. To further identify the exact breakpoints of the TD regions, split-read signals are integrated into DINTD. Experimental results on simulated data sets showed that DINTD can outperform other methods in sensitivity, precision, F1-score, and boundary bias. DINTD is further validated on real samples, and the experimental results indicate that it is consistent with other methods. This study indicates that DINTD can be used as an effective tool for detecting TDs.
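The clustering step named in the description can be illustrated with a minimal sketch. This is not the authors' implementation: it is a generic DBSCAN over 2D points of the form (read depth, mapping quality), with neighbor lookups served by a simple 2D binary search tree. The function names, the toy bin values, and the `eps` and `min_pts` settings are all hypothetical, and the raw values are clustered unscaled only to keep the example short.

```python
from collections import deque

NOISE = -1  # label for points that belong to no cluster


def build_tree(points, idxs=None, depth=0):
    """Build a 2D binary search tree as nested tuples (index, left, right)."""
    if idxs is None:
        idxs = list(range(len(points)))
    if not idxs:
        return None
    axis = depth % 2  # alternate between the two coordinates
    idxs = sorted(idxs, key=lambda i: points[i][axis])
    mid = len(idxs) // 2
    return (idxs[mid],
            build_tree(points, idxs[:mid], depth + 1),
            build_tree(points, idxs[mid + 1:], depth + 1))


def radius_query(node, points, q, eps, depth=0, out=None):
    """Return indices of all points within Euclidean distance eps of q."""
    if out is None:
        out = []
    if node is None:
        return out
    i, left, right = node
    p = points[i]
    if (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 <= eps * eps:
        out.append(i)
    diff = q[depth % 2] - p[depth % 2]
    if diff <= eps:    # the query ball can reach the left subtree
        radius_query(left, points, q, eps, depth + 1, out)
    if diff >= -eps:   # the query ball can reach the right subtree
        radius_query(right, points, q, eps, depth + 1, out)
    return out


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    tree = build_tree(points)
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = radius_query(tree, points, points[i], eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point of this cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = radius_query(tree, points, points[j], eps)
            if len(j_neighbors) >= min_pts:  # j is a core point: expand
                queue.extend(j_neighbors)
        cluster += 1
    return labels


# Hypothetical per-bin signals: (read depth, mapping quality).
bins = [(30, 60), (31, 60), (29, 59), (30, 58), (32, 60),   # background-like
        (58, 57), (60, 58), (61, 56), (59, 57),             # elevated depth
        (45, 20)]                                           # isolated bin
labels = dbscan(bins, eps=3.0, min_pts=3)
print(labels)
```

On this toy input the background-like bins and the elevated-depth bins fall into two separate clusters and the isolated bin is labeled as noise. How such labels are turned into TD calls, and how the signals are scaled and denoised beforehand, is described in the paper itself and is not reproduced here.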
format Online
Article
Text
id pubmed-7433346
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7433346 2020-08-25 DINTD: Detection and Inference of Tandem Duplications From Short Sequencing Reads Dong, Jinxin Qi, Minyong Wang, Shaoqiang Yuan, Xiguo Front Genet Genetics Frontiers Media S.A. 2020-08-11 /pmc/articles/PMC7433346/ /pubmed/32849857 http://dx.doi.org/10.3389/fgene.2020.00924 Text en Copyright © 2020 Dong, Qi, Wang and Yuan.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Genetics
Dong, Jinxin
Qi, Minyong
Wang, Shaoqiang
Yuan, Xiguo
DINTD: Detection and Inference of Tandem Duplications From Short Sequencing Reads
title DINTD: Detection and Inference of Tandem Duplications From Short Sequencing Reads
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7433346/
https://www.ncbi.nlm.nih.gov/pubmed/32849857
http://dx.doi.org/10.3389/fgene.2020.00924