Outgroup Machine Learning Approach Identifies Single Nucleotide Variants in Noncoding DNA Associated with Autism Spectrum Disorder
Autism spectrum disorder (ASD) is a heritable neurodevelopmental disorder affecting 1 in 59 children. While noncoding genetic variation has been shown to play a major role in many complex disorders, the contribution of these regions to ASD susceptibility remains unclear. Genetic analyses of ASD typi...
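Main authors: | Varma, Maya; Paskov, Kelley Marie; Jung, Jae-Yoon; Chrisman, Brianna Sierra; Stockham, Nate Tyler; Washington, Peter Yigitcan; Wall, Dennis Paul |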
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6417813/ https://www.ncbi.nlm.nih.gov/pubmed/30864328 |
_version_ | 1783403627009802240 |
---|---|
author | Varma, Maya Paskov, Kelley Marie Jung, Jae-Yoon Chrisman, Brianna Sierra Stockham, Nate Tyler Washington, Peter Yigitcan Wall, Dennis Paul |
author_facet | Varma, Maya Paskov, Kelley Marie Jung, Jae-Yoon Chrisman, Brianna Sierra Stockham, Nate Tyler Washington, Peter Yigitcan Wall, Dennis Paul |
author_sort | Varma, Maya |
collection | PubMed |
description | Autism spectrum disorder (ASD) is a heritable neurodevelopmental disorder affecting 1 in 59 children. While noncoding genetic variation has been shown to play a major role in many complex disorders, the contribution of these regions to ASD susceptibility remains unclear. Genetic analyses of ASD typically use unaffected family members as controls; however, we hypothesize that this method does not effectively elevate variant signal in the noncoding region due to family members having subclinical phenotypes arising from common genetic mechanisms. In this study, we use a separate, unrelated outgroup of individuals with progressive supranuclear palsy (PSP), a neurodegenerative condition with no known etiological overlap with ASD, as a control population. We use whole genome sequencing data from a large cohort of 2182 children with ASD and 379 controls with PSP, sequenced at the same facility with the same machines and variant calling pipeline, in order to investigate the role of noncoding variation in the ASD phenotype. We analyze seven major types of noncoding variants: microRNAs, human accelerated regions, hypersensitive sites, transcription factor binding sites, DNA repeat sequences, simple repeat sequences, and CpG islands. After identifying and removing batch effects between the two groups, we trained an ℓ₁-regularized logistic regression classifier to predict ASD status from each set of variants. The classifier trained on simple repeat sequences performed well on a held-out test set (AUC-ROC = 0.960); this classifier was also able to differentiate ASD cases from controls when applied to a completely independent dataset (AUC-ROC = 0.960). This suggests that variation in simple repeat regions is predictive of the ASD phenotype and may contribute to ASD risk. Our results show the importance of the noncoding region and the utility of independent control groups in effectively linking genetic variation to disease phenotype for complex disorders. |
format | Online Article Text |
id | pubmed-6417813 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
record_format | MEDLINE/PubMed |
spelling | pubmed-64178132019-03-14 Outgroup Machine Learning Approach Identifies Single Nucleotide Variants in Noncoding DNA Associated with Autism Spectrum Disorder Varma, Maya Paskov, Kelley Marie Jung, Jae-Yoon Chrisman, Brianna Sierra Stockham, Nate Tyler Washington, Peter Yigitcan Wall, Dennis Paul Pac Symp Biocomput Article Autism spectrum disorder (ASD) is a heritable neurodevelopmental disorder affecting 1 in 59 children. While noncoding genetic variation has been shown to play a major role in many complex disorders, the contribution of these regions to ASD susceptibility remains unclear. Genetic analyses of ASD typically use unaffected family members as controls; however, we hypothesize that this method does not effectively elevate variant signal in the noncoding region due to family members having subclinical phenotypes arising from common genetic mechanisms. In this study, we use a separate, unrelated outgroup of individuals with progressive supranuclear palsy (PSP), a neurodegenerative condition with no known etiological overlap with ASD, as a control population. We use whole genome sequencing data from a large cohort of 2182 children with ASD and 379 controls with PSP, sequenced at the same facility with the same machines and variant calling pipeline, in order to investigate the role of noncoding variation in the ASD phenotype. We analyze seven major types of noncoding variants: microRNAs, human accelerated regions, hypersensitive sites, transcription factor binding sites, DNA repeat sequences, simple repeat sequences, and CpG islands. After identifying and removing batch effects between the two groups, we trained an ℓ₁-regularized logistic regression classifier to predict ASD status from each set of variants. The classifier trained on simple repeat sequences performed well on a held-out test set (AUC-ROC = 0.960); this classifier was also able to differentiate ASD cases from controls when applied to a completely independent dataset (AUC-ROC = 0.960). This suggests that variation in simple repeat regions is predictive of the ASD phenotype and may contribute to ASD risk. Our results show the importance of the noncoding region and the utility of independent control groups in effectively linking genetic variation to disease phenotype for complex disorders. 2019 /pmc/articles/PMC6417813/ /pubmed/30864328 Text en http://creativecommons.org/licenses/by-nc/4.0/ Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution Non-Commercial (CC BY-NC) 4.0 License |
spellingShingle | Article Varma, Maya Paskov, Kelley Marie Jung, Jae-Yoon Chrisman, Brianna Sierra Stockham, Nate Tyler Washington, Peter Yigitcan Wall, Dennis Paul Outgroup Machine Learning Approach Identifies Single Nucleotide Variants in Noncoding DNA Associated with Autism Spectrum Disorder |
title | Outgroup Machine Learning Approach Identifies Single Nucleotide Variants in Noncoding DNA Associated with Autism Spectrum Disorder |
title_full | Outgroup Machine Learning Approach Identifies Single Nucleotide Variants in Noncoding DNA Associated with Autism Spectrum Disorder |
title_fullStr | Outgroup Machine Learning Approach Identifies Single Nucleotide Variants in Noncoding DNA Associated with Autism Spectrum Disorder |
title_full_unstemmed | Outgroup Machine Learning Approach Identifies Single Nucleotide Variants in Noncoding DNA Associated with Autism Spectrum Disorder |
title_short | Outgroup Machine Learning Approach Identifies Single Nucleotide Variants in Noncoding DNA Associated with Autism Spectrum Disorder |
title_sort | outgroup machine learning approach identifies single nucleotide variants in noncoding dna associated with autism spectrum disorder |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6417813/ https://www.ncbi.nlm.nih.gov/pubmed/30864328 |
work_keys_str_mv | AT varmamaya outgroupmachinelearningapproachidentifiessinglenucleotidevariantsinnoncodingdnaassociatedwithautismspectrumdisorder AT paskovkelleymarie outgroupmachinelearningapproachidentifiessinglenucleotidevariantsinnoncodingdnaassociatedwithautismspectrumdisorder AT jungjaeyoon outgroupmachinelearningapproachidentifiessinglenucleotidevariantsinnoncodingdnaassociatedwithautismspectrumdisorder AT chrismanbriannasierra outgroupmachinelearningapproachidentifiessinglenucleotidevariantsinnoncodingdnaassociatedwithautismspectrumdisorder AT stockhamnatetyler outgroupmachinelearningapproachidentifiessinglenucleotidevariantsinnoncodingdnaassociatedwithautismspectrumdisorder AT washingtonpeteryigitcan outgroupmachinelearningapproachidentifiessinglenucleotidevariantsinnoncodingdnaassociatedwithautismspectrumdisorder AT walldennispaul outgroupmachinelearningapproachidentifiessinglenucleotidevariantsinnoncodingdnaassociatedwithautismspectrumdisorder |
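
The classification step named in the abstract (an ℓ₁-regularized logistic regression scored by AUC-ROC on a held-out set) can be illustrated with a small, self-contained sketch. This is not the authors' pipeline: the data are synthetic 0/1 features generated here, the solver (proximal gradient descent with soft-thresholding) is one common choice that the record does not specify, and every function name and parameter value (`fit_l1_logistic`, `make_toy_data`, `lam`, `step`, `iters`) is hypothetical.

```python
import math
import random


def soft_threshold(value, threshold):
    """Proximal operator of the l1 penalty: shrink value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logistic(X, y, lam=0.05, step=0.5, iters=200):
    """Proximal gradient descent for l1-penalized logistic regression."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b  # the intercept is not penalized
        # Gradient step on the loss, then shrinkage from the l1 penalty.
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def auc_roc(scores, labels):
    """AUC as the chance a random positive outscores a random negative."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def make_toy_data(n, d, seed):
    """Synthetic 0/1 features; only the first two columns carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = [float(rng.random() < 0.3) for _ in range(d)]
        p = 0.8 if label else 0.2
        row[0] = float(rng.random() < p)
        row[1] = float(rng.random() < p)
        X.append(row)
        y.append(label)
    return X, y


# Hold out the last 100 samples as a test set (toy data, assumed sizes).
X, y = make_toy_data(n=400, d=20, seed=0)
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

w, b = fit_l1_logistic(X_train, y_train)
test_scores = [sum(wj * xj for wj, xj in zip(w, xi)) + b for xi in X_test]
test_auc = auc_roc(test_scores, y_test)
n_nonzero = sum(1 for wj in w if wj != 0.0)
print(f"held-out AUC-ROC: {test_auc:.3f}, nonzero weights: {n_nonzero}/{len(w)}")
```

The ℓ₁ penalty tends to drive the weights of uninformative features to exactly zero, which is why such a model is often used when there are many candidate variants and few are expected to matter. The values printed here describe only the toy data and have no bearing on the AUC-ROC figures reported in the abstract.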