
LSnet: detecting and genotyping deletions using deep learning network

The role and biological impact of structural variation (SV) are increasingly evident. Deletions account for 40% of SVs and are an important type of SV. Detecting and genotyping deletions is therefore of great significance. At present, highly accurate long reads can be obtained as HiFi reads. And, thro...


Bibliographic Details
Main Authors: Luo, Junwei; Gao, Runtian; Chang, Wenjing; Wang, Junfeng
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10301831/
https://www.ncbi.nlm.nih.gov/pubmed/37388936
http://dx.doi.org/10.3389/fgene.2023.1189775
author Luo, Junwei
Gao, Runtian
Chang, Wenjing
Wang, Junfeng
collection PubMed
description The role and biological impact of structural variation (SV) are increasingly evident. Deletions account for 40% of SVs and are an important type of SV, so detecting and genotyping them is of great significance. Highly accurate long reads can now be obtained directly as HiFi reads, or by combining error-prone long reads with highly accurate short reads. These accurate long reads are helpful for detecting and genotyping SVs. However, because of the complexity of the genome and of alignment information, detecting and genotyping SVs remains a challenging task. Here, we propose LSnet, an approach for detecting and genotyping deletions with a deep learning network. Deep learning can learn complex features from labeled datasets, which makes it well suited to SV detection. First, LSnet divides the reference genome into continuous sub-regions. Based on the alignment between the sequencing data (either HiFi reads or a combination of error-prone long reads and short reads) and the reference genome, LSnet extracts nine features for each sub-region; these features are treated as deletion signals. Second, LSnet uses a convolutional neural network and an attention mechanism to learn the critical features in every sub-region. Next, exploiting the relationship among continuous sub-regions, LSnet uses a gated recurrent unit (GRU) network to extract further deletion signatures. Finally, a heuristic algorithm determines the location and length of each deletion. Experimental results show that LSnet outperforms other methods in terms of F1 score. The source code is available from GitHub at https://github.com/eioyuou/LSnet.
format Online
Article
Text
id pubmed-10301831
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10301831 2023-06-29 LSnet: detecting and genotyping deletions using deep learning network Luo, Junwei Gao, Runtian Chang, Wenjing Wang, Junfeng Front Genet Genetics Frontiers Media S.A. 
2023-06-14 /pmc/articles/PMC10301831/ /pubmed/37388936 http://dx.doi.org/10.3389/fgene.2023.1189775 Text en Copyright © 2023 Luo, Gao, Chang and Wang. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title LSnet: detecting and genotyping deletions using deep learning network
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10301831/
https://www.ncbi.nlm.nih.gov/pubmed/37388936
http://dx.doi.org/10.3389/fgene.2023.1189775
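
The description above says that the reference genome is first divided into continuous sub-regions, that per-region signals are derived from read alignments, and that results are compared by F1 score. The following is a minimal Python sketch of those three ideas only. It is not taken from the LSnet repository: the function names, the window size used in the example, and the single "deleted bases" signal are all assumptions made for illustration. The record does not name the nine features LSnet extracts, and the neural network stages are not shown.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in reference coordinates


def split_into_windows(contig_length: int, window_size: int) -> List[Interval]:
    """Cut [0, contig_length) into contiguous, non-overlapping sub-regions.

    window_size is a hypothetical parameter; the record does not state
    the sub-region length that LSnet uses.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return [
        (start, min(start + window_size, contig_length))
        for start in range(0, contig_length, window_size)
    ]


def deleted_bases_per_window(
    windows: List[Interval], alignment_gaps: List[Interval]
) -> List[int]:
    """For each window, sum the bases covered by deletion gaps in alignments.

    This is one illustrative signal, assumed for the example; it is not
    claimed to be one of the nine features described in the abstract.
    """
    counts = []
    for w_start, w_end in windows:
        total = 0
        for g_start, g_end in alignment_gaps:
            overlap = min(w_end, g_end) - max(w_start, g_start)
            if overlap > 0:
                total += overlap
        counts.append(total)
    return counts


def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * true_pos + false_pos + false_neg
    return 0.0 if denominator == 0 else 2 * true_pos / denominator


if __name__ == "__main__":
    windows = split_into_windows(contig_length=1000, window_size=400)
    gaps = [(350, 450), (900, 950)]  # made-up gap intervals for the demo
    print(windows)
    print(deleted_bases_per_window(windows, gaps))
    print(f1_score(true_pos=8, false_pos=2, false_neg=2))
```

In a real pipeline the gap intervals would come from parsed alignment records rather than a hand-written list, and each sub-region would carry a vector of several signals rather than a single count.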