LSnet: detecting and genotyping deletions using deep learning network
Main authors: | Luo, Junwei; Gao, Runtian; Chang, Wenjing; Wang, Junfeng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2023 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10301831/ https://www.ncbi.nlm.nih.gov/pubmed/37388936 http://dx.doi.org/10.3389/fgene.2023.1189775 |
Field | Value |
---|---|
author | Luo, Junwei; Gao, Runtian; Chang, Wenjing; Wang, Junfeng |
collection | PubMed |
description | The role and biological impact of structural variation (SV) are increasingly evident. Deletions account for 40% of SVs and are an important SV type, so detecting and genotyping them is of great significance. Highly accurate long reads can now be obtained as HiFi reads, and accurate long reads can also be produced by combining error-prone long reads with highly accurate short reads. Such accurate long reads are helpful for detecting and genotyping SVs. However, because of the complexity of the genome and of alignment information, detecting and genotyping SVs remains a challenging task. Here, we propose LSnet, an approach for detecting and genotyping deletions with a deep learning network. Because deep learning can learn complex features from labeled datasets, it is well suited to SV detection. First, LSnet divides the reference genome into contiguous sub-regions. Based on the alignment between the sequencing data (either a combination of error-prone long reads and short reads, or HiFi reads) and the reference genome, LSnet extracts nine features for each sub-region, which serve as deletion signals. Second, LSnet uses a convolutional neural network and an attention mechanism to learn the critical features within each sub-region. Next, to capture the relationships among contiguous sub-regions, LSnet uses a gated recurrent unit (GRU) network to extract further deletion signatures. Finally, a heuristic algorithm determines the location and length of each deletion. Experimental results show that LSnet outperforms other methods in terms of F1 score. The source code is available from GitHub at https://github.com/eioyuou/LSnet. |
format | Online Article Text |
id | pubmed-10301831 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-10301831 2023-06-29; Front Genet; Genetics; Frontiers Media S.A.; 2023-06-14; /pmc/articles/PMC10301831/ /pubmed/37388936 http://dx.doi.org/10.3389/fgene.2023.1189775; Text; en; Copyright © 2023 Luo, Gao, Chang and Wang. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | LSnet: detecting and genotyping deletions using deep learning network |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10301831/ https://www.ncbi.nlm.nih.gov/pubmed/37388936 http://dx.doi.org/10.3389/fgene.2023.1189775 |
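The description outlines LSnet's pipeline only at a high level. What follows is a minimal, hypothetical Python sketch of its two non-learned steps. First, it splits a reference interval into contiguous fixed-size sub-regions and computes one illustrative deletion signal per sub-region from read alignments. Second, it merges consecutive positive sub-regions into deletion calls. The sketch assumes each alignment is a (0-based reference start, CIGAR string) pair. The function names, the bin size, and the count threshold (a stand-in for the CNN/attention/GRU prediction) are all illustrative assumptions. None of them are LSnet's nine features or its actual heuristic.

```python
import re
from typing import List, Tuple

# Split a CIGAR string such as "50M100D50M" into (length, operation) pairs.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")


def deletion_signal_per_bin(
    reads: List[Tuple[int, str]], region_start: int, region_end: int, bin_size: int
) -> List[int]:
    """Hypothetical feature: count deleted reference bases (CIGAR 'D') per sub-region.

    reads: (0-based alignment start on the reference, CIGAR string) pairs.
    """
    n_bins = (region_end - region_start + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for ref_pos, cigar in reads:
        for length_str, op in CIGAR_RE.findall(cigar):
            length = int(length_str)
            if op == "D":
                # Attribute each deleted base inside the region to its sub-region.
                lo = max(ref_pos, region_start)
                hi = min(ref_pos + length, region_end)
                for pos in range(lo, hi):
                    counts[(pos - region_start) // bin_size] += 1
            if op in REF_CONSUMING:
                ref_pos += length
    return counts


def merge_flagged_bins(
    flags: List[bool], region_start: int, bin_size: int
) -> List[Tuple[int, int]]:
    """Merge runs of consecutive flagged sub-regions into (start, length) calls."""
    calls = []
    run_start = None
    for i, flagged in enumerate(flags + [False]):  # sentinel closes a trailing run
        if flagged and run_start is None:
            run_start = i
        elif not flagged and run_start is not None:
            calls.append((region_start + run_start * bin_size, (i - run_start) * bin_size))
            run_start = None
    return calls


if __name__ == "__main__":
    reads = [(100, "50M100D50M"), (110, "40M100D60M"), (0, "300M")]
    signal = deletion_signal_per_bin(reads, 0, 400, 50)
    # Threshold of 50 deleted bases per sub-region stands in for a model prediction.
    print(merge_flagged_bins([s >= 50 for s in signal], 0, 50))  # [(150, 100)]
```

Resolution here is limited to the sub-region size. A learned classifier and a finer-grained heuristic, as the description indicates, would be needed to recover exact breakpoints and to genotype the call.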