CNV-PCC: An efficient method for detecting copy number variations from next-generation sequencing data

Copy number variations (CNVs) significantly influence the diversity of the human genome and the occurrence of many complex diseases. The next-generation sequencing (NGS) technology provides rich data for detecting CNVs, and the read depth (RD)-based approach is widely used. However, low CN (copy num...

Bibliographic Details
Main Authors: Zhang, Tong; Dong, Jinxin; Jiang, Hua; Zhao, Zuyao; Zhou, Mengjiao; Yuan, Tianting
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Bioengineering and Biotechnology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9751350/
https://www.ncbi.nlm.nih.gov/pubmed/36532569
http://dx.doi.org/10.3389/fbioe.2022.1000638
author Zhang, Tong
Dong, Jinxin
Jiang, Hua
Zhao, Zuyao
Zhou, Mengjiao
Yuan, Tianting
collection PubMed
description Copy number variations (CNVs) significantly influence the diversity of the human genome and the occurrence of many complex diseases. Next-generation sequencing (NGS) technology provides rich data for detecting CNVs, and the read depth (RD)-based approach is widely used. However, low CN (copy number of 3–4) duplication events are challenging to identify with existing methods, especially when the CNVs are small. In addition, the RD-based approach can only obtain rough breakpoints. We propose a new method, CNV-PCC (detection of CNVs based on Principal Component Classifier), to identify CNVs in whole genome sequencing data. CNV-PCC first uses the split read signal to search for potential breakpoints. A two-stage segmentation strategy is then implemented to improve the identification of low CN duplications and small CNVs. Next, an outlier score is calculated for each segment by PCC (Principal Component Classifier). Finally, the OTSU algorithm calculates the threshold that determines the CNV regions. Analysis of simulated data indicates that CNV-PCC outperforms the other methods in sensitivity and F1-score and improves breakpoint accuracy. Furthermore, CNV-PCC shows high consistency with other methods on real sequencing samples. This study demonstrates that CNV-PCC is an effective method for detecting CNVs, even for low CN duplications and small CNVs.
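The final step of the abstract, choosing a cut on per-segment outlier scores with the OTSU algorithm, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `otsu_threshold`, the bin count, and the example scores are all assumptions made for illustration, and the scores are invented values rather than data from the paper.

```python
def otsu_threshold(scores, bins=64):
    """Return the cut that maximizes between-class variance of 1-D scores.

    Hypothetical helper for illustration; `bins` is an assumed parameter.
    """
    lo, hi = min(scores), max(scores)
    if lo == hi:
        return lo  # all scores identical, nothing to separate

    # Build a fixed-width histogram over [lo, hi].
    width = (hi - lo) / bins
    hist = [0] * bins
    for s in scores:
        idx = min(int((s - lo) / width), bins - 1)
        hist[idx] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]

    total = len(scores)
    total_sum = sum(c * h for c, h in zip(centers, hist))

    best_var, best_cut = -1.0, lo
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        # Class 0 = bins 0..i, class 1 = bins i+1..end.
        w0 += hist[i]
        sum0 += centers[i] * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0
        mu1 = (total_sum - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # unnormalized between-class variance
        if between > best_var:
            best_var = between
            best_cut = lo + (i + 1) * width  # right edge of bin i
    return best_cut


# Invented outlier scores: six "normal" segments and three candidate CNV segments.
example_scores = [0.10, 0.12, 0.09, 0.11, 0.10, 0.13, 0.90, 0.95, 0.88]
cut = otsu_threshold(example_scores)
candidate_segments = [i for i, s in enumerate(example_scores) if s > cut]
```

Segments whose score exceeds the returned cut would be reported as candidate CNV regions. When many cuts separate the two groups equally well, this sketch keeps the lowest one, which is one of several reasonable tie-breaking choices.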
format Online
Article
Text
id pubmed-9751350
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9751350 2022-12-16 CNV-PCC: An efficient method for detecting copy number variations from next-generation sequencing data Zhang, Tong Dong, Jinxin Jiang, Hua Zhao, Zuyao Zhou, Mengjiao Yuan, Tianting Front Bioeng Biotechnol Bioengineering and Biotechnology Copy number variations (CNVs) significantly influence the diversity of the human genome and the occurrence of many complex diseases. Next-generation sequencing (NGS) technology provides rich data for detecting CNVs, and the read depth (RD)-based approach is widely used. However, low CN (copy number of 3–4) duplication events are challenging to identify with existing methods, especially when the CNVs are small. In addition, the RD-based approach can only obtain rough breakpoints. We propose a new method, CNV-PCC (detection of CNVs based on Principal Component Classifier), to identify CNVs in whole genome sequencing data. CNV-PCC first uses the split read signal to search for potential breakpoints. A two-stage segmentation strategy is then implemented to improve the identification of low CN duplications and small CNVs. Next, an outlier score is calculated for each segment by PCC (Principal Component Classifier). Finally, the OTSU algorithm calculates the threshold that determines the CNV regions. Analysis of simulated data indicates that CNV-PCC outperforms the other methods in sensitivity and F1-score and improves breakpoint accuracy. Furthermore, CNV-PCC shows high consistency with other methods on real sequencing samples. This study demonstrates that CNV-PCC is an effective method for detecting CNVs, even for low CN duplications and small CNVs. Frontiers Media S.A. 2022-12-01 /pmc/articles/PMC9751350/ /pubmed/36532569 http://dx.doi.org/10.3389/fbioe.2022.1000638 Text en Copyright © 2022 Zhang, Dong, Jiang, Zhao, Zhou and Yuan.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title CNV-PCC: An efficient method for detecting copy number variations from next-generation sequencing data
topic Bioengineering and Biotechnology