cnnLSV: detecting structural variants by encoding long-read alignment information and convolutional neural network

BACKGROUND: Genomic structural variant detection is a significant and challenging issue in genome analysis. The existing long-read based structural variant detection methods still have space for improvement in detecting multi-type structural variants. RESULTS: In this paper, we propose a method call...

Bibliographic Details
Main Authors: Ma, Huidong; Zhong, Cheng; Chen, Danyang; He, Haofa; Yang, Feng
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10045035/
https://www.ncbi.nlm.nih.gov/pubmed/36977976
http://dx.doi.org/10.1186/s12859-023-05243-x
_version_ 1784913495946428416
author Ma, Huidong
Zhong, Cheng
Chen, Danyang
He, Haofa
Yang, Feng
author_facet Ma, Huidong
Zhong, Cheng
Chen, Danyang
He, Haofa
Yang, Feng
author_sort Ma, Huidong
collection PubMed
description BACKGROUND: Genomic structural variant detection is a significant and challenging problem in genome analysis. Existing long-read-based structural variant detection methods still have room for improvement in detecting multiple types of structural variants. RESULTS: In this paper, we propose a method called cnnLSV that obtains higher-quality detection results by eliminating false positives from the detection results merged from the callsets of existing methods. We design an encoding strategy for four types of structural variants that represents the long-read alignment information around structural variants as images, input the images into a convolutional neural network to train a filter model, and load the trained model to remove false positives and improve detection performance. We also eliminate mislabeled training samples in the model training phase by using principal component analysis and the unsupervised clustering algorithm k-means. Experimental results on both simulated and real datasets show that the proposed method outperforms existing methods overall in detecting insertions, deletions, inversions, and duplications. The cnnLSV program is available at https://github.com/mhuidong/cnnLSV. CONCLUSIONS: The proposed cnnLSV detects structural variants by using long-read alignment information and a convolutional neural network to achieve overall higher performance, and effectively eliminates incorrectly labeled samples by using principal component analysis and k-means in the model training stage. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05243-x.
format Online
Article
Text
id pubmed-10045035
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-100450352023-03-29 cnnLSV: detecting structural variants by encoding long-read alignment information and convolutional neural network Ma, Huidong Zhong, Cheng Chen, Danyang He, Haofa Yang, Feng BMC Bioinformatics Research BACKGROUND: Genomic structural variant detection is a significant and challenging issue in genome analysis. The existing long-read based structural variant detection methods still have space for improvement in detecting multi-type structural variants. RESULTS: In this paper, we propose a method called cnnLSV to obtain detection results with higher quality by eliminating false positives in the detection results merged from the callsets of existing methods. We design an encoding strategy for four types of structural variants to represent long-read alignment information around structural variants into images, input the images into a constructed convolutional neural network to train a filter model, and load the trained model to remove the false positives to improve the detection performance. We also eliminate mislabeled training samples in the training model phase by using principal component analysis algorithm and unsupervised clustering algorithm k-means. Experimental results on both simulated and real datasets show that our proposed method outperforms existing methods overall in detecting insertions, deletions, inversions, and duplications. The program of cnnLSV is available at https://github.com/mhuidong/cnnLSV. CONCLUSIONS: The proposed cnnLSV can detect structural variants by using long-read alignment information and convolutional neural network to achieve overall higher performance, and effectively eliminate incorrectly labeled samples by using the principal component analysis and k-means algorithms in training model stage. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05243-x. 
BioMed Central 2023-03-28 /pmc/articles/PMC10045035/ /pubmed/36977976 http://dx.doi.org/10.1186/s12859-023-05243-x Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/ (https://creativecommons.org/publicdomain/zero/1.0/) ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Ma, Huidong
Zhong, Cheng
Chen, Danyang
He, Haofa
Yang, Feng
cnnLSV: detecting structural variants by encoding long-read alignment information and convolutional neural network
title cnnLSV: detecting structural variants by encoding long-read alignment information and convolutional neural network
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10045035/
https://www.ncbi.nlm.nih.gov/pubmed/36977976
http://dx.doi.org/10.1186/s12859-023-05243-x
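
Illustrative sketch (not part of the record): the description says that mislabeled training samples are eliminated with principal component analysis and k-means, but this record does not say which features are used, how many components are kept, or how a sample is judged mislabeled. The Python sketch below is therefore only one plausible reading, under stated assumptions: samples are projected onto the first principal component (computed here by power iteration rather than a full PCA), split into two clusters by k-means, and any sample whose label disagrees with the majority label of its cluster is dropped. All function names are hypothetical and are not taken from the cnnLSV code.

```python
def first_principal_component(rows, iters=100):
    """Return (mean, unit vector) for the leading principal component.

    Uses power iteration on X^T X of the centered data (assumption: this
    stands in for a full PCA; only one component is kept).
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    v = [1.0] * d
    for _ in range(iters):
        proj = [sum(c[j] * v[j] for j in range(d)) for c in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break
        v = [x / norm for x in w]
    return mean, v


def project(rows, mean, v):
    """Project each row onto direction v after subtracting the mean."""
    return [sum((r[j] - mean[j]) * v[j] for j in range(len(v))) for r in rows]


def kmeans_1d(values, iters=50):
    """Two-cluster k-means on scalars; returns a 0/1 assignment per value."""
    centers = [min(values), max(values)]
    assign = [0] * len(values)
    for _ in range(iters):
        assign = [0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
                  for x in values]
        for k in (0, 1):
            members = [x for x, a in zip(values, assign) if a == k]
            if members:
                centers[k] = sum(members) / len(members)
    return assign


def keep_consistent_samples(features, labels):
    """Return indices of samples whose 0/1 label matches their cluster majority.

    Assumption: disagreement with the cluster majority marks a sample as
    mislabeled. The record does not state the rule the authors use.
    """
    mean, v = first_principal_component(features)
    assign = kmeans_1d(project(features, mean, v))
    keep = []
    for k in (0, 1):
        idx = [i for i, a in enumerate(assign) if a == k]
        if not idx:
            continue
        majority = 1 if 2 * sum(labels[i] for i in idx) >= len(idx) else 0
        keep.extend(i for i in idx if labels[i] == majority)
    return sorted(keep)


if __name__ == "__main__":
    # Toy data: two well-separated groups; sample 2 carries the wrong label.
    feats = [[0, 0], [0.5, 0.2], [0.1, 0.4], [0.3, 0.1],
             [10, 10], [10.2, 9.8], [9.9, 10.1], [10.1, 10.3]]
    labs = [0, 0, 1, 0, 1, 1, 1, 1]
    print(keep_consistent_samples(feats, labs))
```

In practice the features would be derived from the encoded alignment images and the labels from the true/false status of each candidate call; both choices here are placeholders for illustration.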