CNV-P: a machine-learning framework for predicting high confident copy number variations

BACKGROUND: Copy-number variants (CNVs) are recognized as one of the major causes of genetic disorders. Reliable detection of CNVs from genome sequencing data is in strong demand for disease research. However, current software for detecting CNVs has high false-positive rates, which needs...

Bibliographic Details
Main Authors: Wang, Taifu, Sun, Jinghua, Zhang, Xiuqing, Wang, Wen-Jing, Zhou, Qing
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2021
Subjects: Bioinformatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8645205/
https://www.ncbi.nlm.nih.gov/pubmed/34917425
http://dx.doi.org/10.7717/peerj.12564
author Wang, Taifu
Sun, Jinghua
Zhang, Xiuqing
Wang, Wen-Jing
Zhou, Qing
collection PubMed
description BACKGROUND: Copy-number variants (CNVs) are recognized as one of the major causes of genetic disorders. Reliable detection of CNVs from genome sequencing data is in strong demand for disease research. However, current CNV detection software has high false-positive rates and needs further improvement. METHODS: Here, we propose CNV-P, a novel post-processing approach for CNV prediction: a machine-learning framework that can efficiently remove false-positive fragments from the results of CNV detection tools. A series of CNV signals around the putative CNV fragments, such as read depth (RD), split reads (SR) and read pairs (RP), were defined as features to train a classifier. RESULTS: Predictions on several real biological datasets showed that our models could accurately classify CNVs at over 90% precision and 85% recall, which greatly improves on the performance of state-of-the-art algorithms. Furthermore, our results indicate that CNV-P is robust to different CNV sizes and sequencing platforms. CONCLUSIONS: Our framework for classifying high-confidence CNVs could improve both basic research and clinical diagnosis of genetic diseases.
format Online
Article
Text
id pubmed-8645205
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-8645205 2021-12-15 PeerJ Bioinformatics PeerJ Inc. 2021-12-02 /pmc/articles/PMC8645205/ /pubmed/34917425 http://dx.doi.org/10.7717/peerj.12564 Text en © 2021 Wang et al. https://creativecommons.org/licenses/by-nc/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by-nc/4.0/), which permits using, remixing, and building upon the work non-commercially, as long as it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title CNV-P: a machine-learning framework for predicting high confident copy number variations
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8645205/
https://www.ncbi.nlm.nih.gov/pubmed/34917425
http://dx.doi.org/10.7717/peerj.12564
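
The description field above outlines a post-processing step: putative CNV calls are described by read depth (RD), split read (SR) and read pair (RP) signals, a classifier decides which calls to keep, and the result is judged by precision and recall. The following is a minimal sketch of that workflow, not the CNV-P implementation. The record does not say which classifier is used, so a hand-picked threshold rule stands in for a trained model; the feature names, the thresholds and the example calls are all hypothetical and made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class CnvCall:
    """A putative CNV call with illustrative features (names are hypothetical)."""
    chrom: str
    start: int
    end: int
    rd_ratio: float    # read depth inside the call divided by flanking read depth
    split_reads: int   # split reads supporting the breakpoints
    read_pairs: int    # discordant read pairs supporting the call
    is_true: bool      # ground-truth label, available only for benchmark data


def toy_classifier(call: CnvCall) -> bool:
    """Stand-in for a trained model: an arbitrary rule, NOT the CNV-P classifier."""
    depth_shift = abs(call.rd_ratio - 1.0) >= 0.3
    breakpoint_support = (call.split_reads + call.read_pairs) >= 3
    return depth_shift and breakpoint_support


def filter_calls(calls: List[CnvCall],
                 predict: Callable[[CnvCall], bool]) -> List[CnvCall]:
    """Keep only the calls that the classifier predicts to be real."""
    return [c for c in calls if predict(c)]


def precision_recall(all_calls: List[CnvCall],
                     kept: List[CnvCall]) -> Tuple[float, float]:
    """Precision and recall of the kept set.

    Recall is measured against the true calls present in the caller's output,
    not against every true CNV in the genome.
    """
    true_total = sum(c.is_true for c in all_calls)
    true_kept = sum(c.is_true for c in kept)
    precision = true_kept / len(kept) if kept else 0.0
    recall = true_kept / true_total if true_total else 0.0
    return precision, recall


# Made-up benchmark calls, for illustration only.
calls = [
    CnvCall("chr1", 1000, 5000, 0.50, 4, 6, True),
    CnvCall("chr1", 9000, 12000, 1.05, 0, 1, False),
    CnvCall("chr2", 200, 2600, 1.50, 2, 3, True),
    CnvCall("chr2", 7000, 7800, 0.60, 1, 0, True),
    CnvCall("chr3", 100, 900, 1.40, 3, 2, False),
]

kept = filter_calls(calls, toy_classifier)
precision, recall = precision_recall(calls, kept)
print(f"kept {len(kept)} of {len(calls)} calls, "
      f"precision={precision:.2f}, recall={recall:.2f}")
```

In practice the rule in `toy_classifier` would be replaced by a model trained on labelled calls, and the labels would come from a validated truth set rather than from hand-written flags.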