Prediction and prioritization of autism-associated long non-coding RNAs using gene expression and sequence features

BACKGROUND: Autism spectrum disorders (ASD) refer to a range of neurodevelopmental conditions, which are genetically complex and heterogeneous with most of the genetic risk factors also found in the unaffected general population. Although all the currently known ASD risk genes code for proteins, lon...

Bibliographic Details
Main Authors:	Wang, Jun, Wang, Liangjiang
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2020
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7648398/ https://www.ncbi.nlm.nih.gov/pubmed/33160303 http://dx.doi.org/10.1186/s12859-020-03843-5

_version_	1783607102285021184
author	Wang, Jun Wang, Liangjiang
author_sort	Wang, Jun
collection	PubMed
description	BACKGROUND: Autism spectrum disorders (ASD) refer to a range of neurodevelopmental conditions, which are genetically complex and heterogeneous with most of the genetic risk factors also found in the unaffected general population. Although all the currently known ASD risk genes code for proteins, long non-coding RNAs (lncRNAs) as essential regulators of gene expression have been implicated in ASD. Some lncRNAs show altered expression levels in autistic brains, but their roles in ASD pathogenesis are still unclear. RESULTS: In this study, we have developed a new machine learning approach to predict candidate lncRNAs associated with ASD. Particularly, the knowledge learnt from protein-coding ASD risk genes was transferred to the prediction and prioritization of ASD-associated lncRNAs. Both developmental brain gene expression data and transcript sequence were found to contain relevant information for ASD risk gene prediction. During the pre-training phase of model construction, an autoencoder network was implemented for a representation learning of the gene expression data, and a random-forest-based feature selection was applied to the transcript-sequence-derived k-mers. Our models, including logistic regression, support vector machine and random forest, showed robust performance based on tenfold cross-validations as well as candidate prioritization with hypothetical loci. We then utilized the models to predict and prioritize a list of candidate lncRNAs, including some reported to be cis-regulators of known ASD risk genes, for further investigation. CONCLUSIONS: Our results suggest that ASD risk genes can be accurately predicted using developmental brain gene expression data and transcript sequence features, and the models may provide useful information for functional characterization of the candidate lncRNAs associated with ASD.
format	Online Article Text
id	pubmed-7648398
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-7648398 2020-11-09 Prediction and prioritization of autism-associated long non-coding RNAs using gene expression and sequence features Wang, Jun Wang, Liangjiang BMC Bioinformatics Methodology Article BioMed Central 2020-11-07 /pmc/articles/PMC7648398/ /pubmed/33160303 http://dx.doi.org/10.1186/s12859-020-03843-5 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle	Methodology Article Wang, Jun Wang, Liangjiang Prediction and prioritization of autism-associated long non-coding RNAs using gene expression and sequence features
title	Prediction and prioritization of autism-associated long non-coding RNAs using gene expression and sequence features
title_sort	prediction and prioritization of autism-associated long non-coding rnas using gene expression and sequence features
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7648398/ https://www.ncbi.nlm.nih.gov/pubmed/33160303 http://dx.doi.org/10.1186/s12859-020-03843-5
work_keys_str_mv	AT wangjun predictionandprioritizationofautismassociatedlongnoncodingrnasusinggeneexpressionandsequencefeatures AT wangliangjiang predictionandprioritizationofautismassociatedlongnoncodingrnasusinggeneexpressionandsequencefeatures
