A semi-supervised deep learning approach for predicting the functional effects of genomic non-coding variations

BACKGROUND: Understanding the functional effects of non-coding variants is important as they are often associated with gene-expression alteration and disease development. Over the past few years, many computational tools have been developed to predict their functional impact. However, the intrinsic...

Bibliographic Details
Main authors: Jia, Hao; Park, Sung-Joon; Nakai, Kenta
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8171027/
https://www.ncbi.nlm.nih.gov/pubmed/34078253
http://dx.doi.org/10.1186/s12859-021-03999-8
author Jia, Hao
Park, Sung-Joon
Nakai, Kenta
collection PubMed
description BACKGROUND: Understanding the functional effects of non-coding variants is important as they are often associated with gene-expression alteration and disease development. Over the past few years, many computational tools have been developed to predict their functional impact. However, the intrinsic difficulty in dealing with the scarcity of data leads to the necessity to further improve the algorithms. In this work, we propose a novel method, employing a semi-supervised deep-learning model with pseudo labels, which takes advantage of learning from both experimentally annotated and unannotated data. RESULTS: We prepared known functional non-coding variants with histone marks, DNA accessibility, and sequence context in GM12878, HepG2, and K562 cell lines. Applying our method to the dataset demonstrated its outstanding performance, compared with that of existing tools. Our results also indicated that the semi-supervised model with pseudo labels achieves higher predictive performance than the supervised model without pseudo labels. Interestingly, a model trained with the data in a certain cell line is unlikely to succeed in other cell lines, which implies the cell-type-specific nature of the non-coding variants. Remarkably, we found that DNA accessibility significantly contributes to the functional consequence of variants, which suggests the importance of open chromatin conformation prior to establishing the interaction of non-coding variants with gene regulation. CONCLUSIONS: The semi-supervised deep learning model coupled with pseudo labeling has advantages in studying with limited datasets, which is not unusual in biology. Our study provides an effective approach in finding non-coding mutations potentially associated with various biological phenomena, including human diseases. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-03999-8.
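The pseudo-labeling idea summarized in the description can be sketched as follows. This is a minimal illustration and not the authors' implementation: it substitutes a plain logistic regression for the deep model, uses synthetic two-dimensional points in place of histone-mark, DNA-accessibility, and sequence features, and the function names, the 0.9 confidence threshold, and the three retraining rounds are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(model, x):
    """Probability that feature vector x belongs to class 1."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X, y, epochs=200, lr=0.1):
    """Fit logistic regression by full-batch gradient descent; returns (w, b)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba((w, b), xi) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def pseudo_label_fit(X_lab, y_lab, X_unlab, rounds=3, threshold=0.9):
    """Self-training loop (hypothetical helper, not from the paper).

    1. Train on the labeled data only.
    2. Give confident unlabeled points the predicted class as a pseudo label.
    3. Retrain on labeled plus pseudo-labeled data; repeat.
    """
    model = train_logreg(X_lab, y_lab)
    for _ in range(rounds):
        X_aug, y_aug = list(X_lab), list(y_lab)
        for x in X_unlab:
            p = predict_proba(model, x)
            if p >= threshold or p <= 1.0 - threshold:
                X_aug.append(x)
                y_aug.append(1 if p >= 0.5 else 0)
        model = train_logreg(X_aug, y_aug)
    return model


def make_points(rng, n, label):
    """Synthetic stand-in data: one Gaussian cluster per class."""
    center = 2.0 if label == 1 else -2.0
    return [[rng.gauss(center, 1.0), rng.gauss(center, 1.0)] for _ in range(n)]


rng = random.Random(0)
X_lab = make_points(rng, 5, 0) + make_points(rng, 5, 1)        # scarce labels
y_lab = [0] * 5 + [1] * 5
X_unlab = make_points(rng, 100, 0) + make_points(rng, 100, 1)  # plentiful unlabeled data
X_test = make_points(rng, 50, 0) + make_points(rng, 50, 1)
y_test = [0] * 50 + [1] * 50

model = pseudo_label_fit(X_lab, y_lab, X_unlab)
correct = sum(
    (predict_proba(model, x) >= 0.5) == bool(y) for x, y in zip(X_test, y_test)
)
print(f"test accuracy: {correct / len(y_test):.2f}")
```

Only confident predictions are turned into pseudo labels here so that early mistakes are less likely to be reinforced; other schemes, such as weighting the unlabeled loss term, are equally plausible readings of "pseudo labels".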
format Online
Article
Text
id pubmed-8171027
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8171027 2021-06-03 A semi-supervised deep learning approach for predicting the functional effects of genomic non-coding variations Jia, Hao Park, Sung-Joon Nakai, Kenta BMC Bioinformatics Research BioMed Central 2021-06-02 /pmc/articles/PMC8171027/ /pubmed/34078253 http://dx.doi.org/10.1186/s12859-021-03999-8 Text en © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title A semi-supervised deep learning approach for predicting the functional effects of genomic non-coding variations
topic Research