
PeakCNV: A multi-feature ranking algorithm-based tool for genome-wide copy number variation-association study

Copy Number Variation (CNV) refers to a type of structural genomic alteration in which a segment of a chromosome is duplicated or deleted. To date, many CNVs have been identified as causative genetic elements for several diseases and phenotypes. However, performing a CNV-based genome-wide association...


Bibliographic Details
Main authors: Labani, Mahdieh; Afrasiabi, Ali; Beheshti, Amin; Lovell, Nigel H.; Alinejad-Rokny, Hamid
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9478359/
https://www.ncbi.nlm.nih.gov/pubmed/36147666
http://dx.doi.org/10.1016/j.csbj.2022.09.001
_version_ 1784790553016139776
author Labani, Mahdieh
Afrasiabi, Ali
Beheshti, Amin
Lovell, Nigel H.
Alinejad-Rokny, Hamid
collection PubMed
description Copy Number Variation (CNV) refers to a type of structural genomic alteration in which a segment of a chromosome is duplicated or deleted. To date, many CNVs have been identified as causative genetic elements for several diseases and phenotypes. However, performing a CNV-based genome-wide association study is challenging because the length and occurrence of CNVs are inconsistent across the individuals under investigation. One of the most efficient strategies to address this issue is building CNV regions (CNVRs), genomic regions in which CNVs overlap. However, this approach is susceptible to a high false-positive rate because confounding CNVRs overlap and co-occur with true-positive CNVRs. Here, we develop PeakCNV, which differentiates false-positive CNVRs from true positives by calculating a new metric, the independence ranking score (IR-score), via a feature ranking approach. We compared the performance of PeakCNV with other existing tools in two case studies: one using CNV genotype data for individuals with prostate cancer (194 cases and 2,392 healthy individuals) and a second for individuals with neurodevelopmental disorders (19,642 cases and 6,451 healthy individuals). Our benchmarking analyses on the prostate cancer cohort indicated that PeakCNV identifies fewer risk candidate CNVRs, with shorter lengths, than other tools. Importantly, these CNVRs cover a greater proportion of cases relative to healthy individuals than those identified by other tools. The accuracy of PeakCNV in identifying relevant candidate CNVRs was reproducible in the case study on neurodevelopmental disorders.
Using data from the FANTOM5 expression atlas and the Clinical Genomic Database, we show that the candidate CNVRs identified by PeakCNV for neurodevelopmental disorders overlap with a greater number of genes with brain-enriched expression, and a greater number of genes associated with neurological conditions, than candidate CNVRs identified by other tools. Taken together, PeakCNV outperformed existing CNV association study tools by identifying more biologically meaningful CNVRs relevant to the phenotype of interest. PeakCNV is publicly available for the analysis of CNV-associated diseases and is accessible from https://rdrr.io/github/mahdieh1/PeakCNV.
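The idea of building CNVRs from overlapping CNVs, and of comparing how many cases versus healthy individuals a region covers, can be illustrated with a minimal Python sketch. This is not PeakCNV's implementation and does not compute the IR-score; the `CNV` record, the function names `build_cnvrs` and `carrier_fractions`, and the toy coordinates are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNV:
    sample: str    # individual identifier (hypothetical)
    chrom: str
    start: int     # inclusive start coordinate
    end: int       # inclusive end coordinate
    is_case: bool  # True for a case, False for a healthy individual


def build_cnvrs(cnvs):
    """Merge overlapping CNVs on the same chromosome into regions (CNVRs)."""
    regions = []
    for cnv in sorted(cnvs, key=lambda c: (c.chrom, c.start, c.end)):
        last = regions[-1] if regions else None
        if last and last["chrom"] == cnv.chrom and cnv.start <= last["end"]:
            # Overlaps the current region: extend it and record the member.
            last["end"] = max(last["end"], cnv.end)
            last["members"].append(cnv)
        else:
            # No overlap: start a new region.
            regions.append({"chrom": cnv.chrom, "start": cnv.start,
                            "end": cnv.end, "members": [cnv]})
    return regions


def carrier_fractions(region, n_cases, n_controls):
    """Fraction of cases and of healthy individuals with a CNV in the region."""
    cases = {c.sample for c in region["members"] if c.is_case}
    controls = {c.sample for c in region["members"] if not c.is_case}
    return len(cases) / n_cases, len(controls) / n_controls


# Toy data, invented for illustration only (3 cases, 1 healthy individual).
toy = [
    CNV("s1", "chr1", 100, 200, True),
    CNV("s2", "chr1", 150, 300, True),
    CNV("s3", "chr1", 500, 600, False),
    CNV("s4", "chr2", 100, 200, True),
]
regions = build_cnvrs(toy)
for region in regions:
    print(region["chrom"], region["start"], region["end"],
          carrier_fractions(region, n_cases=3, n_controls=1))
```

A region that covers many cases and few healthy individuals is a candidate for association; according to the abstract, PeakCNV goes further and ranks such regions so that confounding CNVRs that merely co-occur with true positives can be separated out.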
format Online
Article
Text
id pubmed-9478359
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-9478359 2022-09-21 PeakCNV: A multi-feature ranking algorithm-based tool for genome-wide copy number variation-association study. Labani, Mahdieh; Afrasiabi, Ali; Beheshti, Amin; Lovell, Nigel H.; Alinejad-Rokny, Hamid. Comput Struct Biotechnol J. Method Article. Research Network of Computational and Structural Biotechnology 2022-09-07 /pmc/articles/PMC9478359/ /pubmed/36147666 http://dx.doi.org/10.1016/j.csbj.2022.09.001 Text en Crown Copyright © 2022 Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology. https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title PeakCNV: A multi-feature ranking algorithm-based tool for genome-wide copy number variation-association study
topic Method Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9478359/
https://www.ncbi.nlm.nih.gov/pubmed/36147666
http://dx.doi.org/10.1016/j.csbj.2022.09.001