
PEcnv: accurate and efficient detection of copy number variations of various lengths

Copy number variation (CNV) is a class of key biomarkers in many complex traits and diseases. Detecting CNV from sequencing data is a substantial bioinformatics problem and a standard requirement in clinical practice. Although many proposed CNV detection approaches exist, the core statistical model...


Bibliographic Details
Main Authors: Wang, Xuwen, Xu, Ying, Liu, Ruoyu, Lai, Xin, Liu, Yuqian, Wang, Shenjie, Zhang, Xuanping, Wang, Jiayin
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9487654/
https://www.ncbi.nlm.nih.gov/pubmed/36056740
http://dx.doi.org/10.1093/bib/bbac375
author Wang, Xuwen
Xu, Ying
Liu, Ruoyu
Lai, Xin
Liu, Yuqian
Wang, Shenjie
Zhang, Xuanping
Wang, Jiayin
collection PubMed
description Copy number variation (CNV) is a class of key biomarkers in many complex traits and diseases. Detecting CNV from sequencing data is a substantial bioinformatics problem and a standard requirement in clinical practice. Although many proposed CNV detection approaches exist, the core statistical model at their foundation is weakened by two critical computational issues: (i) identifying the optimal setting on the sliding window and (ii) correcting for bias and noise. We designed a statistical process model to overcome these limitations by calculating regional read depths via an exponentially weighted moving average strategy. A one-run detection of CNVs of various lengths is then achieved by a dynamic sliding window, whose size is self-adapted according to the weighted averages. We also designed a novel bias/noise reduction model, accompanied by the moving average, which can handle complicated patterns and extend training data. This model, called PEcnv, accurately detects CNVs ranging from kb-scale to chromosome-arm level. The model performance was validated with simulation samples and real samples. Comparative analysis showed that PEcnv outperforms current popular approaches. Notably, PEcnv provided considerable advantages in detecting small CNVs (1 kb–1 Mb) in panel sequencing data. Thus, PEcnv fills the gap left by existing methods focusing on large CNVs. PEcnv may have broad applications in clinical testing where panel sequencing is the dominant strategy. Availability and implementation: Source code is freely available at https://github.com/Sherwin-xjtu/PEcnv
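The exponentially weighted moving average named in the abstract can be illustrated with a minimal sketch. This is a generic textbook EWMA applied to per-bin read depths, not PEcnv's implementation: the function names `ewma` and `flag_shifted_bins`, the smoothing factor `alpha`, the `baseline` depth, and the `tolerance` threshold are all assumptions made for illustration, and the record does not state how PEcnv chooses its weights or window sizes.

```python
from typing import List, Sequence


def ewma(depths: Sequence[float], alpha: float = 0.3) -> List[float]:
    """Exponentially weighted moving average of per-bin read depths.

    s[0] = x[0];  s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]
    (alpha is a hypothetical smoothing factor, not a PEcnv parameter.)
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[float] = []
    for x in depths:
        if not smoothed:
            # Seed the average with the first observation.
            smoothed.append(float(x))
        else:
            smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


def flag_shifted_bins(
    depths: Sequence[float],
    baseline: float,
    alpha: float = 0.3,
    tolerance: float = 0.25,
) -> List[int]:
    """Indices of bins whose smoothed depth differs from the baseline
    by more than `tolerance` (as a fraction of the baseline).

    A simple stand-in for a change signal; the threshold rule is assumed.
    """
    smoothed = ewma(depths, alpha)
    return [
        i for i, s in enumerate(smoothed)
        if abs(s - baseline) > tolerance * baseline
    ]


if __name__ == "__main__":
    # Toy depths: a run of elevated coverage in the middle.
    toy = [100, 102, 98, 101, 150, 152, 149, 151, 100, 99]
    print(ewma(toy))
    print(flag_shifted_bins(toy, baseline=100))
```

A larger `alpha` reacts faster to a depth shift but smooths less noise, which is the trade-off that a fixed sliding window also faces. For the actual model, see the source code linked in the record.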
format Online
Article
Text
id pubmed-9487654
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9487654 2022-09-21 Brief Bioinform Problem Solving Protocol Oxford University Press 2022-09-02 /pmc/articles/PMC9487654/ /pubmed/36056740 http://dx.doi.org/10.1093/bib/bbac375 Text en © The Author(s) 2022. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title PEcnv: accurate and efficient detection of copy number variations of various lengths
topic Problem Solving Protocol