Performance of case-control rare copy number variation annotation in classification of autism

BACKGROUND: A substantial proportion of Autism Spectrum Disorder (ASD) risk resides in de novo germline and rare inherited genetic variation. In particular, rare copy number variation (CNV) contributes to ASD risk in up to 10% of ASD subjects. Despite the striking degree of genetic heterogeneity, ca...

Bibliographic Details
Main authors: Engchuan, Worrawat, Dhindsa, Kiret, Lionel, Anath C, Scherer, Stephen W, Chan, Jonathan H, Merico, Daniele
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4315323/
https://www.ncbi.nlm.nih.gov/pubmed/25783485
http://dx.doi.org/10.1186/1755-8794-8-S1-S7
_version_ 1782355458668888064
author Engchuan, Worrawat
Dhindsa, Kiret
Lionel, Anath C
Scherer, Stephen W
Chan, Jonathan H
Merico, Daniele
author_facet Engchuan, Worrawat
Dhindsa, Kiret
Lionel, Anath C
Scherer, Stephen W
Chan, Jonathan H
Merico, Daniele
author_sort Engchuan, Worrawat
collection PubMed
description BACKGROUND: A substantial proportion of Autism Spectrum Disorder (ASD) risk resides in de novo germline and rare inherited genetic variation. In particular, rare copy number variation (CNV) contributes to ASD risk in up to 10% of ASD subjects. Despite the striking degree of genetic heterogeneity, case-control studies have detected specific burden of rare disruptive CNV for neuronal and neurodevelopmental pathways. Here, we used machine learning methods to classify ASD subjects and controls, based on rare CNV data and comprehensive gene annotations. We investigated performance of different methods and estimated the percentage of ASD subjects that could be reliably classified based on presumed etiologic CNV they carry. RESULTS: We analyzed 1,892 Caucasian ASD subjects and 2,342 matched controls. Rare CNVs (frequency 1% or less) were detected using Illumina 1M and 1M-Duo BeadChips. Conditional Inference Forest (CF) typically performed as well as or better than other classification methods. We found a maximum AUC (area under the ROC curve) of 0.533 when considering all ASD subjects with rare genic CNVs, corresponding to 7.9% correctly classified ASD subjects and less than 3% incorrectly classified controls; performance was significantly higher when considering only subjects harboring de novo or pathogenic CNVs. We also found rare losses to be more predictive than gains and that curated neurally-relevant annotations (brain expression, synaptic components and neurodevelopmental phenotypes) outperform Gene Ontology and pathway-based annotations. CONCLUSIONS: CF is an optimal classification approach for case-control rare CNV data and it can be used to prioritize subjects with variants potentially contributing to ASD risk not yet recognized. The neurally-relevant annotations used in this study could be successfully applied to rare CNV case-control data-sets for other neuropsychiatric disorders.
format Online
Article
Text
id pubmed-4315323
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4315323 2015-02-12 Performance of case-control rare copy number variation annotation in classification of autism Engchuan, Worrawat Dhindsa, Kiret Lionel, Anath C Scherer, Stephen W Chan, Jonathan H Merico, Daniele BMC Med Genomics Research BACKGROUND: A substantial proportion of Autism Spectrum Disorder (ASD) risk resides in de novo germline and rare inherited genetic variation. In particular, rare copy number variation (CNV) contributes to ASD risk in up to 10% of ASD subjects. Despite the striking degree of genetic heterogeneity, case-control studies have detected specific burden of rare disruptive CNV for neuronal and neurodevelopmental pathways. Here, we used machine learning methods to classify ASD subjects and controls, based on rare CNV data and comprehensive gene annotations. We investigated performance of different methods and estimated the percentage of ASD subjects that could be reliably classified based on presumed etiologic CNV they carry. RESULTS: We analyzed 1,892 Caucasian ASD subjects and 2,342 matched controls. Rare CNVs (frequency 1% or less) were detected using Illumina 1M and 1M-Duo BeadChips. Conditional Inference Forest (CF) typically performed as well as or better than other classification methods. We found a maximum AUC (area under the ROC curve) of 0.533 when considering all ASD subjects with rare genic CNVs, corresponding to 7.9% correctly classified ASD subjects and less than 3% incorrectly classified controls; performance was significantly higher when considering only subjects harboring de novo or pathogenic CNVs. We also found rare losses to be more predictive than gains and that curated neurally-relevant annotations (brain expression, synaptic components and neurodevelopmental phenotypes) outperform Gene Ontology and pathway-based annotations. 
CONCLUSIONS: CF is an optimal classification approach for case-control rare CNV data and it can be used to prioritize subjects with variants potentially contributing to ASD risk not yet recognized. The neurally-relevant annotations used in this study could be successfully applied to rare CNV case-control data-sets for other neuropsychiatric disorders. BioMed Central 2015-01-15 /pmc/articles/PMC4315323/ /pubmed/25783485 http://dx.doi.org/10.1186/1755-8794-8-S1-S7 Text en Copyright © 2015 Engchuan et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Engchuan, Worrawat
Dhindsa, Kiret
Lionel, Anath C
Scherer, Stephen W
Chan, Jonathan H
Merico, Daniele
Performance of case-control rare copy number variation annotation in classification of autism
title Performance of case-control rare copy number variation annotation in classification of autism
title_full Performance of case-control rare copy number variation annotation in classification of autism
title_fullStr Performance of case-control rare copy number variation annotation in classification of autism
title_full_unstemmed Performance of case-control rare copy number variation annotation in classification of autism
title_short Performance of case-control rare copy number variation annotation in classification of autism
title_sort performance of case-control rare copy number variation annotation in classification of autism
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4315323/
https://www.ncbi.nlm.nih.gov/pubmed/25783485
http://dx.doi.org/10.1186/1755-8794-8-S1-S7
work_keys_str_mv AT engchuanworrawat performanceofcasecontrolrarecopynumbervariationannotationinclassificationofautism
AT dhindsakiret performanceofcasecontrolrarecopynumbervariationannotationinclassificationofautism
AT lionelanathc performanceofcasecontrolrarecopynumbervariationannotationinclassificationofautism
AT schererstephenw performanceofcasecontrolrarecopynumbervariationannotationinclassificationofautism
AT chanjonathanh performanceofcasecontrolrarecopynumbervariationannotationinclassificationofautism
AT mericodaniele performanceofcasecontrolrarecopynumbervariationannotationinclassificationofautism
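
Note on the AUC reported in the description: an AUC close to 0.5 can coexist with a small subset of subjects being classified confidently, because AUC averages over all case-control pairs. The sketch below is a toy illustration only, not the authors' method or data. The function name `auc_from_scores` and the made-up cohort (8 of 100 cases and 2 of 100 controls given a high score, everyone else a score of zero) are assumptions introduced here for illustration.

```python
def auc_from_scores(case_scores, control_scores):
    """AUC as the probability that a randomly chosen case scores higher
    than a randomly chosen control, counting ties as one half."""
    wins = 0
    ties = 0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1
            elif c == k:
                ties += 1
    return (wins + 0.5 * ties) / (len(case_scores) * len(control_scores))


# Hypothetical toy cohort (NOT data from the study): most subjects carry no
# informative variant and get score 0; a few get score 1.
cases = [1.0] * 8 + [0.0] * 92
controls = [1.0] * 2 + [0.0] * 98

toy_auc = auc_from_scores(cases, controls)
print(round(toy_auc, 3))
```

In this toy cohort the result is 0.53, which for a two-valued score equals (1 + TPR - FPR) / 2 with TPR = 0.08 and FPR = 0.02. The large number of tied, uninformative subjects pulls the AUC toward 0.5 even though the high-scoring subset is enriched for cases.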