Detecting genomic deletions from high-throughput sequence data with unsupervised learning
BACKGROUND: Structural variation (SV), which ranges from 50 bp to ~3 Mb in size, is an important type of genetic variation. Deletion is a type of SV in which part of a chromosome or a sequence of DNA is lost during DNA replication. Three types of signals, including discordant...
Main authors: | Li, Xin; Wu, Yufeng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2023 |
Subjects: | Methodology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9881243/ https://www.ncbi.nlm.nih.gov/pubmed/36707775 http://dx.doi.org/10.1186/s12859-023-05139-w |
Field | Value |
---|---|
author | Li, Xin; Wu, Yufeng |
collection | PubMed |
description | BACKGROUND: Structural variation (SV), which ranges from 50 bp to ~3 Mb in size, is an important type of genetic variation. Deletion is a type of SV in which part of a chromosome or a sequence of DNA is lost during DNA replication. Three types of signals, namely discordant read-pairs, read depth and split reads, are commonly used to detect SVs from high-throughput sequence data. Many tools have been developed that detect SVs using one or more of these signals. RESULTS: In this paper, we develop a new method called EigenDel for detecting germline submicroscopic genomic deletions. EigenDel first uses discordant read-pairs and clipped reads to obtain initial deletion candidates, and then clusters similar candidates using unsupervised learning methods. EigenDel then applies a carefully designed procedure to call true deletions from each cluster. We conduct various experiments to evaluate the performance of EigenDel on low-coverage sequence data. CONCLUSIONS: Our results show that EigenDel outperforms other major methods in balancing accuracy and sensitivity and in reducing bias. EigenDel can be downloaded from https://github.com/lxwgcool/EigenDel. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05139-w. |
format | Online Article Text |
id | pubmed-9881243 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9881243 2023-01-28; Detecting genomic deletions from high-throughput sequence data with unsupervised learning; Li, Xin; Wu, Yufeng; BMC Bioinformatics; Methodology; BioMed Central 2023-01-27; /pmc/articles/PMC9881243/; /pubmed/36707775; http://dx.doi.org/10.1186/s12859-023-05139-w; Text; en; © The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Detecting genomic deletions from high-throughput sequence data with unsupervised learning |
topic | Methodology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9881243/ https://www.ncbi.nlm.nih.gov/pubmed/36707775 http://dx.doi.org/10.1186/s12859-023-05139-w |
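The description field in this record outlines EigenDel's pipeline: collect deletion candidates from discordant read-pairs (and clipped reads), group similar candidates, then call deletions per group. The Python sketch below illustrates only the general discordant-pair idea under stated assumptions; it is not EigenDel's implementation (see https://github.com/lxwgcool/EigenDel for that). Assumptions: the library's insert-size mean and standard deviation are known (hypothetical values 350 and 50), a pair is flagged as discordant when its insert size exceeds mean + 3 * sd, a simple greedy breakpoint-distance chaining stands in for EigenDel's unsupervised clustering (which this record does not specify), and clipped reads and read depth are ignored. All names (`ReadPair`, `discordant_candidates`, `cluster_candidates`, `call_deletions`) and the toy data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical minimal record for one properly oriented read pair."""
    chrom: str
    left_end: int      # reference position where the left (forward) mate ends
    right_start: int   # reference position where the right (reverse) mate starts
    insert_size: int   # observed fragment length


def discordant_candidates(pairs, mean_insert, sd_insert, k=3.0):
    """Return (chrom, start, end) deletion candidates from pairs whose insert
    size exceeds mean + k * sd (an assumed threshold, not EigenDel's rule)."""
    cutoff = mean_insert + k * sd_insert
    return [(p.chrom, p.left_end, p.right_start)
            for p in pairs if p.insert_size > cutoff]


def cluster_candidates(candidates, max_dist=100):
    """Greedily chain candidates (sorted by chromosome, then start) whose start
    and end breakpoints both lie within max_dist of the previous member.
    This is a simple stand-in, not EigenDel's unsupervised clustering."""
    clusters = []
    for cand in sorted(candidates):
        if clusters:
            chrom, start, end = clusters[-1][-1]
            if (cand[0] == chrom and abs(cand[1] - start) <= max_dist
                    and abs(cand[2] - end) <= max_dist):
                clusters[-1].append(cand)
                continue
        clusters.append([cand])
    return clusters


def call_deletions(clusters, min_support=2):
    """Keep clusters with at least min_support pairs; report the (upper) median
    start and end breakpoints plus the number of supporting pairs."""
    calls = []
    for members in clusters:
        if len(members) < min_support:
            continue
        mid = len(members) // 2
        starts = sorted(m[1] for m in members)
        ends = sorted(m[2] for m in members)
        calls.append((members[0][0], starts[mid], ends[mid], len(members)))
    return calls


# Toy input (invented for illustration): one concordant pair, three pairs
# supporting a ~2 kb deletion on chr1, and one unsupported pair on chr2.
example_pairs = [
    ReadPair("chr1", 1000, 1050, 350),
    ReadPair("chr1", 5000, 7050, 2350),
    ReadPair("chr1", 5020, 7030, 2310),
    ReadPair("chr1", 5010, 7060, 2350),
    ReadPair("chr2", 900, 2000, 1400),
]

candidates = discordant_candidates(example_pairs, mean_insert=350, sd_insert=50)
calls = call_deletions(cluster_candidates(candidates))
print(calls)  # [('chr1', 5010, 7050, 3)]
```

Requiring a minimum number of supporting pairs per cluster is one common way to suppress isolated false signals such as the single chr2 pair; a real caller would also weigh split or clipped reads and read depth, as the description notes.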