CNV-P: a machine-learning framework for predicting high confident copy number variations
BACKGROUND: Copy-number variants (CNVs) have been recognized as one of the major causes of genetic disorders. Reliable detection of CNVs from genome sequencing data has been a strong demand for disease research. However, current software for detecting CNVs has high false-positive rates, which needs...
Main Authors: | Wang, Taifu; Sun, Jinghua; Zhang, Xiuqing; Wang, Wen-Jing; Zhou, Qing |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2021 |
Subjects: | Bioinformatics |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8645205/ https://www.ncbi.nlm.nih.gov/pubmed/34917425 http://dx.doi.org/10.7717/peerj.12564 |
_version_ | 1784610259581534208 |
---|---|
author | Wang, Taifu Sun, Jinghua Zhang, Xiuqing Wang, Wen-Jing Zhou, Qing |
author_facet | Wang, Taifu Sun, Jinghua Zhang, Xiuqing Wang, Wen-Jing Zhou, Qing |
author_sort | Wang, Taifu |
collection | PubMed |
description | BACKGROUND: Copy-number variants (CNVs) have been recognized as one of the major causes of genetic disorders. Reliable detection of CNVs from genome sequencing data has been a strong demand for disease research. However, current software for detecting CNVs has high false-positive rates, which needs further improvement. METHODS: Here, we proposed a novel and post-processing approach for CNVs prediction (CNV-P), a machine-learning framework that could efficiently remove false-positive fragments from results of CNVs detecting tools. A series of CNVs signals such as read depth (RD), split reads (SR) and read pair (RP) around the putative CNV fragments were defined as features to train a classifier. RESULTS: The prediction results on several real biological datasets showed that our models could accurately classify the CNVs at over 90% precision rate and 85% recall rate, which greatly improves the performance of state-of-the-art algorithms. Furthermore, our results indicate that CNV-P is robust to different sizes of CNVs and the platforms of sequencing. CONCLUSIONS: Our framework for classifying high-confident CNVs could improve both basic research and clinical diagnosis of genetic diseases. |
format | Online Article Text |
id | pubmed-8645205 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-86452052021-12-15 CNV-P: a machine-learning framework for predicting high confident copy number variations Wang, Taifu Sun, Jinghua Zhang, Xiuqing Wang, Wen-Jing Zhou, Qing PeerJ Bioinformatics BACKGROUND: Copy-number variants (CNVs) have been recognized as one of the major causes of genetic disorders. Reliable detection of CNVs from genome sequencing data has been a strong demand for disease research. However, current software for detecting CNVs has high false-positive rates, which needs further improvement. METHODS: Here, we proposed a novel and post-processing approach for CNVs prediction (CNV-P), a machine-learning framework that could efficiently remove false-positive fragments from results of CNVs detecting tools. A series of CNVs signals such as read depth (RD), split reads (SR) and read pair (RP) around the putative CNV fragments were defined as features to train a classifier. RESULTS: The prediction results on several real biological datasets showed that our models could accurately classify the CNVs at over 90% precision rate and 85% recall rate, which greatly improves the performance of state-of-the-art algorithms. Furthermore, our results indicate that CNV-P is robust to different sizes of CNVs and the platforms of sequencing. CONCLUSIONS: Our framework for classifying high-confident CNVs could improve both basic research and clinical diagnosis of genetic diseases. PeerJ Inc. 2021-12-02 /pmc/articles/PMC8645205/ /pubmed/34917425 http://dx.doi.org/10.7717/peerj.12564 Text en © 2021 Wang et al. https://creativecommons.org/licenses/by-nc/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by-nc/4.0/), which permits using, remixing, and building upon the work non-commercially, as long as it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
spellingShingle | Bioinformatics Wang, Taifu Sun, Jinghua Zhang, Xiuqing Wang, Wen-Jing Zhou, Qing CNV-P: a machine-learning framework for predicting high confident copy number variations |
title | CNV-P: a machine-learning framework for predicting high confident copy number variations |
title_full | CNV-P: a machine-learning framework for predicting high confident copy number variations |
title_fullStr | CNV-P: a machine-learning framework for predicting high confident copy number variations |
title_full_unstemmed | CNV-P: a machine-learning framework for predicting high confident copy number variations |
title_short | CNV-P: a machine-learning framework for predicting high confident copy number variations |
title_sort | cnv-p: a machine-learning framework for predicting high confident copy number variations |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8645205/ https://www.ncbi.nlm.nih.gov/pubmed/34917425 http://dx.doi.org/10.7717/peerj.12564 |
work_keys_str_mv | AT wangtaifu cnvpamachinelearningframeworkforpredictinghighconfidentcopynumbervariations AT sunjinghua cnvpamachinelearningframeworkforpredictinghighconfidentcopynumbervariations AT zhangxiuqing cnvpamachinelearningframeworkforpredictinghighconfidentcopynumbervariations AT wangwenjing cnvpamachinelearningframeworkforpredictinghighconfidentcopynumbervariations AT zhouqing cnvpamachinelearningframeworkforpredictinghighconfidentcopynumbervariations |