
dbCNV: deleteriousness-based model to predict pathogenicity of copy number variations

BACKGROUND: Copy number variation (CNV) is a type of structural variation: a gain or loss event that abnormally changes copy number. Methods that predict the pathogenicity of CNVs are needed to clarify the relationship between these variants and clinical phenotypes. ClassifyCNV, X-CNV, Str...


Bibliographic Details
Main authors: Lv, Kangqi, Chen, Dayang, Xiong, Dan, Tang, Huamei, Ou, Tong, Kan, Lijuan, Zhang, Xiuming
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10029177/
https://www.ncbi.nlm.nih.gov/pubmed/36941551
http://dx.doi.org/10.1186/s12864-023-09225-4
collection PubMed
description BACKGROUND: Copy number variation (CNV) is a type of structural variation: a gain or loss event that abnormally changes copy number. Methods that predict the pathogenicity of CNVs are needed to clarify the relationship between these variants and clinical phenotypes. Tools such as ClassifyCNV, X-CNV and StrVCTVRE have been trained to predict the pathogenicity of CNVs, but few reported studies are based on the deleterious significance of features. RESULTS: From the single nucleotide polymorphism (SNP), gene and region dimensions, we collected 79 informative features that quantitatively describe the characteristics of a CNV, such as CNV length, the number of protein genes and the number of three prime untranslated regions. Then, according to their deleterious significance, we formulated quantification methods for the features, which fall into two categories: variable-type features, summarized by maximum, minimum and mean; and attribute-type features, measured by numerical sum. We used the Gradient Boosted Trees (GBT) algorithm to construct dbCNV, which predicts pathogenicity for both five-tier and binary classification of CNVs. We demonstrated that the distribution of most feature values was consistent with their deleterious significance. The five-tier classification model reached accuracies of 0.85 and 0.79 for loss and gain CNVs, respectively, indicating high discriminative power in five-tier pathogenicity prediction. The binary model achieved area under the curve (AUC) values of 0.96 and 0.81 in the validation set for gain and loss CNVs, respectively. CONCLUSION: The performance of dbCNV suggests that a functional deleteriousness-based model of CNVs is a promising approach to support classification prediction and to further understand pathogenic mechanisms. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-023-09225-4.
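The two quantification categories named in the abstract (variable-type features summarized by maximum, minimum and mean; attribute-type features measured by sum) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the feature names (`snp_score`, `protein_genes`, `three_prime_utr`), the example values, the zero default for empty inputs and the `end - start` length convention are all assumptions made for illustration.

```python
from statistics import mean


def summarize_variable(scores):
    """Variable-type feature: summarize per-element scores by max, min and mean.

    Returning zeros for an empty list is an assumption, not taken from the paper.
    """
    if not scores:
        return {"max": 0.0, "min": 0.0, "mean": 0.0}
    return {"max": max(scores), "min": min(scores), "mean": mean(scores)}


def summarize_attribute(counts):
    """Attribute-type feature: measured by numerical sum."""
    return sum(counts)


def cnv_features(cnv):
    """Build a flat feature dictionary for one CNV (hypothetical input layout)."""
    # Length as end - start assumes half-open coordinates (an assumption).
    features = {"length": cnv["end"] - cnv["start"]}
    for name, scores in cnv["variable"].items():
        for stat, value in summarize_variable(scores).items():
            features[f"{name}_{stat}"] = value
    for name, counts in cnv["attribute"].items():
        features[f"{name}_sum"] = summarize_attribute(counts)
    return features


# Illustrative values only; they do not come from the dbCNV data.
example = {
    "start": 1000,
    "end": 6000,
    "variable": {"snp_score": [0.5, 1.5, 4.0]},
    "attribute": {"protein_genes": [1, 1], "three_prime_utr": [1, 0, 2]},
}
features = cnv_features(example)
```

A feature dictionary of this shape could then be passed to a gradient boosted trees classifier; which library and hyperparameters the authors used is not stated in this record.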
id pubmed-10029177
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-10029177 2023-03-22
BMC Genomics
Research
BioMed Central 2023-03-20
/pmc/articles/PMC10029177/ /pubmed/36941551 http://dx.doi.org/10.1186/s12864-023-09225-4
Text en
© The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
topic Research