A semi-supervised deep learning approach for predicting the functional effects of genomic non-coding variations
BACKGROUND: Understanding the functional effects of non-coding variants is important as they are often associated with gene-expression alteration and disease development. Over the past few years, many computational tools have been developed to predict their functional impact. However, the intrinsic...
Main Authors: | Jia, Hao; Park, Sung-Joon; Nakai, Kenta |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8171027/ https://www.ncbi.nlm.nih.gov/pubmed/34078253 http://dx.doi.org/10.1186/s12859-021-03999-8 |
_version_ | 1783702352925032448 |
---|---|
author | Jia, Hao Park, Sung-Joon Nakai, Kenta |
author_facet | Jia, Hao Park, Sung-Joon Nakai, Kenta |
author_sort | Jia, Hao |
collection | PubMed |
description | BACKGROUND: Understanding the functional effects of non-coding variants is important as they are often associated with gene-expression alteration and disease development. Over the past few years, many computational tools have been developed to predict their functional impact. However, the intrinsic difficulty in dealing with the scarcity of data leads to the necessity to further improve the algorithms. In this work, we propose a novel method, employing a semi-supervised deep-learning model with pseudo labels, which takes advantage of learning from both experimentally annotated and unannotated data. RESULTS: We prepared known functional non-coding variants with histone marks, DNA accessibility, and sequence context in GM12878, HepG2, and K562 cell lines. Applying our method to the dataset demonstrated its outstanding performance, compared with that of existing tools. Our results also indicated that the semi-supervised model with pseudo labels achieves higher predictive performance than the supervised model without pseudo labels. Interestingly, a model trained with the data in a certain cell line is unlikely to succeed in other cell lines, which implies the cell-type-specific nature of the non-coding variants. Remarkably, we found that DNA accessibility significantly contributes to the functional consequence of variants, which suggests the importance of open chromatin conformation prior to establishing the interaction of non-coding variants with gene regulation. CONCLUSIONS: The semi-supervised deep learning model coupled with pseudo labeling has advantages in studying with limited datasets, which is not unusual in biology. Our study provides an effective approach in finding non-coding mutations potentially associated with various biological phenomena, including human diseases. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-03999-8. |
format | Online Article Text |
id | pubmed-8171027 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8171027 2021-06-03 A semi-supervised deep learning approach for predicting the functional effects of genomic non-coding variations Jia, Hao Park, Sung-Joon Nakai, Kenta BMC Bioinformatics Research BACKGROUND: Understanding the functional effects of non-coding variants is important as they are often associated with gene-expression alteration and disease development. Over the past few years, many computational tools have been developed to predict their functional impact. However, the intrinsic difficulty in dealing with the scarcity of data leads to the necessity to further improve the algorithms. In this work, we propose a novel method, employing a semi-supervised deep-learning model with pseudo labels, which takes advantage of learning from both experimentally annotated and unannotated data. RESULTS: We prepared known functional non-coding variants with histone marks, DNA accessibility, and sequence context in GM12878, HepG2, and K562 cell lines. Applying our method to the dataset demonstrated its outstanding performance, compared with that of existing tools. Our results also indicated that the semi-supervised model with pseudo labels achieves higher predictive performance than the supervised model without pseudo labels. Interestingly, a model trained with the data in a certain cell line is unlikely to succeed in other cell lines, which implies the cell-type-specific nature of the non-coding variants. Remarkably, we found that DNA accessibility significantly contributes to the functional consequence of variants, which suggests the importance of open chromatin conformation prior to establishing the interaction of non-coding variants with gene regulation. CONCLUSIONS: The semi-supervised deep learning model coupled with pseudo labeling has advantages in studying with limited datasets, which is not unusual in biology. Our study provides an effective approach in finding non-coding mutations potentially associated with various biological phenomena, including human diseases. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-03999-8. BioMed Central 2021-06-02 /pmc/articles/PMC8171027/ /pubmed/34078253 http://dx.doi.org/10.1186/s12859-021-03999-8 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Jia, Hao Park, Sung-Joon Nakai, Kenta A semi-supervised deep learning approach for predicting the functional effects of genomic non-coding variations |
title | A semi-supervised deep learning approach for predicting the functional effects of genomic non-coding variations |
title_full | A semi-supervised deep learning approach for predicting the functional effects of genomic non-coding variations |
title_fullStr | A semi-supervised deep learning approach for predicting the functional effects of genomic non-coding variations |
title_full_unstemmed | A semi-supervised deep learning approach for predicting the functional effects of genomic non-coding variations |
title_short | A semi-supervised deep learning approach for predicting the functional effects of genomic non-coding variations |
title_sort | semi-supervised deep learning approach for predicting the functional effects of genomic non-coding variations |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8171027/ https://www.ncbi.nlm.nih.gov/pubmed/34078253 http://dx.doi.org/10.1186/s12859-021-03999-8 |
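
The abstract in this record describes a semi-supervised model trained with pseudo labels, learning from both annotated and unannotated data. The general idea of pseudo labeling can be sketched as follows. This is only an illustration under stated assumptions, not the authors' method: the article uses a deep-learning model, whereas this sketch swaps in a simple nearest-centroid classifier so that it runs with the Python standard library alone. All names (`centroids`, `predict_proba`, `fit_with_pseudo_labels`), the confidence threshold of 0.8, and the toy data are hypothetical.

```python
import math


def centroids(points, labels):
    """Return the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_proba(cents, x):
    """Softmax over negative Euclidean distances to the class centroids."""
    scores = {y: -math.dist(c, x) for y, c in cents.items()}
    top = max(scores.values())
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


def fit_with_pseudo_labels(labeled_x, labeled_y, unlabeled_x,
                           threshold=0.8, rounds=5):
    """Self-training loop: confident predictions become pseudo labels."""
    train_x, train_y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    cents = centroids(train_x, train_y)
    for _ in range(rounds):
        remaining, added = [], 0
        for x in pool:
            proba = predict_proba(cents, x)
            label, confidence = max(proba.items(), key=lambda kv: kv[1])
            if confidence >= threshold:
                # Treat the confident prediction as if it were a true label.
                train_x.append(x)
                train_y.append(label)
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0:
            break  # nothing new to learn from
        cents = centroids(train_x, train_y)  # refit on labeled + pseudo-labeled
    return cents, len(train_x) - len(labeled_x)


if __name__ == "__main__":
    # Toy data (hypothetical): two labeled points and three unlabeled ones.
    model, n_pseudo = fit_with_pseudo_labels(
        [[0.0, 0.0], [4.0, 4.0]], [0, 1],
        [[0.5, 0.2], [3.8, 4.1], [2.0, 2.0]],
    )
    print("pseudo-labeled points:", n_pseudo)
    print("class centroids:", model)
```

The threshold controls the trade-off in this sketch: a higher value admits fewer but more reliable pseudo labels, while a lower value uses more of the unannotated data at the risk of reinforcing early mistakes.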