PEcnv: accurate and efficient detection of copy number variations of various lengths
Copy number variation (CNV) is a class of key biomarkers in many complex traits and diseases. Detecting CNV from sequencing data is a substantial bioinformatics problem and a standard requirement in clinical practice. Although many proposed CNV detection approaches exist, the core statistical model...
Main authors: | Wang, Xuwen; Xu, Ying; Liu, Ruoyu; Lai, Xin; Liu, Yuqian; Wang, Shenjie; Zhang, Xuanping; Wang, Jiayin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2022 |
Subjects: | Problem Solving Protocol |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9487654/ https://www.ncbi.nlm.nih.gov/pubmed/36056740 http://dx.doi.org/10.1093/bib/bbac375 |
_version_ | 1784792498744328192 |
---|---|
author | Wang, Xuwen Xu, Ying Liu, Ruoyu Lai, Xin Liu, Yuqian Wang, Shenjie Zhang, Xuanping Wang, Jiayin |
author_sort | Wang, Xuwen |
collection | PubMed |
description | Copy number variation (CNV) is a class of key biomarkers in many complex traits and diseases. Detecting CNVs from sequencing data is a substantial bioinformatics problem and a standard requirement in clinical practice. Although many CNV detection approaches have been proposed, the core statistical model at their foundation is weakened by two critical computational issues: (i) identifying the optimal sliding-window setting and (ii) correcting for bias and noise. We designed a statistical process model to overcome these limitations by calculating regional read depths via an exponentially weighted moving average strategy. One-run detection of CNVs of various lengths is then achieved by a dynamic sliding window whose size is self-adapted according to the weighted averages. We also designed a novel bias/noise reduction model, accompanied by the moving average, which can handle complicated patterns and extend training data. This model, called PEcnv, accurately detects CNVs ranging from kb-scale to chromosome-arm level. Model performance was validated with simulated samples and real samples. Comparative analysis showed that PEcnv outperforms current popular approaches. Notably, PEcnv provided considerable advantages in detecting small CNVs (1 kb–1 Mb) in panel sequencing data. Thus, PEcnv fills the gap left by existing methods that focus on large CNVs. PEcnv may have broad applications in clinical testing, where panel sequencing is the dominant strategy. Availability and implementation: Source code is freely available at https://github.com/Sherwin-xjtu/PEcnv |
format | Online Article Text |
id | pubmed-9487654 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9487654 2022-09-21 Brief Bioinform Problem Solving Protocol Oxford University Press 2022-09-02 /pmc/articles/PMC9487654/ /pubmed/36056740 http://dx.doi.org/10.1093/bib/bbac375 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | PEcnv: accurate and efficient detection of copy number variations of various lengths |
topic | Problem Solving Protocol |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9487654/ https://www.ncbi.nlm.nih.gov/pubmed/36056740 http://dx.doi.org/10.1093/bib/bbac375 |
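
The description in this record says that PEcnv calculates regional read depths via an exponentially weighted moving average (EWMA). As a rough illustration of that general technique only, and not of PEcnv's actual code, the sketch below smooths a list of per-bin read depths with the textbook recurrence s_t = alpha * x_t + (1 - alpha) * s_(t-1). The function name `ewma`, the smoothing weight `alpha` and the sample depths are all assumptions made for this example; the record does not state the parameters PEcnv uses, nor how its dynamic window is derived from the averages.

```python
from typing import List, Sequence


def ewma(depths: Sequence[float], alpha: float = 0.3) -> List[float]:
    """Exponentially weighted moving average of per-bin read depths.

    alpha is a hypothetical smoothing weight in (0, 1]; it is not taken
    from PEcnv. Larger values follow the raw depths more closely.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[float] = []
    for depth in depths:
        if not smoothed:
            # Seed the average with the first observation.
            smoothed.append(float(depth))
        else:
            # Blend the new depth with the previous smoothed value.
            smoothed.append(alpha * depth + (1.0 - alpha) * smoothed[-1])
    return smoothed


# Made-up depths: a flat baseline followed by a drop, as a deletion might produce.
example_depths = [100, 100, 100, 50, 50, 50]
smoothed_depths = ewma(example_depths, alpha=0.5)
print(smoothed_depths)
```

With `alpha=0.5` the smoothed series approaches the new depth level gradually instead of jumping to it, which is the property that makes an EWMA useful for damping bin-to-bin noise before looking for sustained shifts.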