
A Density Peak-Based Method to Detect Copy Number Variations From Next-Generation Sequencing Data

Copy number variation (CNV) is a common type of structural variation in the human genome and has biological significance for complex human diseases. Detecting CNVs is an important step in their systematic analysis in medical research on complex diseases. The recent development of next-generation...


Bibliographic Details
Main Authors: Xie, Kun; Tian, Ye; Yuan, Xiguo
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7838601/
https://www.ncbi.nlm.nih.gov/pubmed/33519925
http://dx.doi.org/10.3389/fgene.2020.632311
author Xie, Kun
Tian, Ye
Yuan, Xiguo
collection PubMed
description Copy number variation (CNV) is a common type of structural variation in the human genome and has biological significance for complex human diseases. Detecting CNVs is an important step in their systematic analysis in medical research on complex diseases. The recent development of next-generation sequencing (NGS) platforms provides unprecedented opportunities to detect CNVs at base-level resolution. However, owing to the intrinsic characteristics of NGS data, accurate detection of CNVs remains a challenging task. In this article, we propose a new density peak-based method, called dpCNV, for detecting CNVs from NGS data. The dpCNV algorithm is based on the density peak clustering algorithm. It extracts two features, local density and minimum distance, from the sequencing read depth (RD) profile and generates a two-dimensional dataset. From this dataset, a two-dimensional null distribution is constructed to test the significance of each genome bin, and the significant bins are declared as CNVs. We test the performance of dpCNV on a number of simulated datasets and compare it with several existing methods. The experimental results demonstrate that the proposed method outperforms the others in terms of sensitivity and F1-score. We further apply it to a set of real sequencing samples, and the results demonstrate the validity of dpCNV. We therefore expect that dpCNV can be used as a supplement to existing methods and may become a routine tool in the field of genome mutation analysis.
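The two features named in the description can be illustrated with a minimal sketch of generic density peak clustering applied to a one-dimensional read depth profile. This is not the dpCNV implementation: the function name `density_peak_features`, the `cutoff` parameter, and the use of the absolute read depth difference as the distance between bins are all assumptions made for illustration, and the null-distribution significance test is not shown because the record does not describe it.

```python
import numpy as np


def density_peak_features(rd, cutoff):
    """Return (rho, delta) for each bin of a 1-D read depth profile.

    rho   : local density, the number of other bins whose read depth
            lies within `cutoff` of this bin (assumed definition).
    delta : minimum distance to any bin with a higher local density;
            for bins with the highest density, the maximum distance.
    """
    rd = np.asarray(rd, dtype=float)
    # Pairwise absolute differences in read depth between bins
    # (assumed distance measure, chosen for illustration).
    dist = np.abs(rd[:, None] - rd[None, :])
    # Count neighbours within the cutoff, excluding the bin itself.
    rho = (dist < cutoff).sum(axis=1) - 1
    delta = np.empty(len(rd))
    for i in range(len(rd)):
        denser = rho > rho[i]
        if denser.any():
            delta[i] = dist[i, denser].min()
        else:
            # No denser bin exists: use the largest distance instead.
            delta[i] = dist[i].max()
    return rho, delta


# Toy profile: bin 4 has a much higher read depth than its neighbours.
rho, delta = density_peak_features([10, 10, 11, 10, 30, 10, 9, 10], cutoff=2)
print(rho)
print(delta)
```

In this toy profile, the bin with read depth 30 has no neighbours within the cutoff and lies far from every denser bin, which is the low-density, large-distance pattern an outlier-style test would look for.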
format Online
Article
Text
id pubmed-7838601
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-78386012021-01-28 A Density Peak-Based Method to Detect Copy Number Variations From Next-Generation Sequencing Data Xie, Kun Tian, Ye Yuan, Xiguo Front Genet Genetics Copy number variation (CNV) is a common type of structural variations in human genome and confers biological meanings to human complex diseases. Detection of CNVs is an important step for a systematic analysis of CNVs in medical research of complex diseases. The recent development of next-generation sequencing (NGS) platforms provides unprecedented opportunities for the detection of CNVs at a base-level resolution. However, due to the intrinsic characteristics behind NGS data, accurate detection of CNVs is still a challenging task. In this article, we propose a new density peak-based method, called dpCNV, for the detection of CNVs from NGS data. The algorithm of dpCNV is designed based on density peak clustering algorithm. It extracts two features, i.e., local density and minimum distance, from sequencing read depth (RD) profile and generates a two-dimensional data. Based on the generated data, a two-dimensional null distribution is constructed to test the significance of each genome bin and then the significant genome bins are declared as CNVs. We test the performance of the dpCNV method on a number of simulated datasets and make comparison with several existing methods. The experimental results demonstrate that our proposed method outperforms others in terms of sensitivity and F1-score. We further apply it to a set of real sequencing samples and the results demonstrate the validity of dpCNV. Therefore, we expect that dpCNV can be used as a supplementary to existing methods and may become a routine tool in the field of genome mutation analysis. Frontiers Media S.A. 2021-01-13 /pmc/articles/PMC7838601/ /pubmed/33519925 http://dx.doi.org/10.3389/fgene.2020.632311 Text en Copyright © 2021 Xie, Tian and Yuan. 
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title A Density Peak-Based Method to Detect Copy Number Variations From Next-Generation Sequencing Data
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7838601/
https://www.ncbi.nlm.nih.gov/pubmed/33519925
http://dx.doi.org/10.3389/fgene.2020.632311