A Contrastive Learning Pre-Training Method for Motif Occupancy Identification
Motif occupancy identification is a binary classification task that predicts whether DNA motif instances are bound by transcription factors, for which several sequence-based methods have been proposed. However, trained directly end to end, these methods lack biological interpretability in their sequence representations. In this work, we propose a contrastive learning method to pre-train an interpretable and robust DNA encoding for motif occupancy identification. We construct two alternative models to pre-train the DNA sequence encoder: a self-supervised model and a supervised model. We augment the original sequences for contrastive learning using the edit operations defined by edit distance. In particular, we propose a sequence similarity criterion based on the Needleman–Wunsch algorithm to discriminate positive and negative sample pairs in self-supervised learning. Finally, a DNN classifier is fine-tuned along with the pre-trained encoder to predict motif occupancy. Both proposed contrastive learning models outperform the baseline end-to-end CNN model and the SimCLR method, reaching AUCs of 0.811 and 0.823, respectively. Compared with the baseline method, our models are more robust on small samples, and the self-supervised model proves practicable for transfer learning.
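The abstract names two concrete mechanisms: augmenting sequences with the edit operations of edit distance, and labeling sample pairs with a Needleman–Wunsch similarity criterion. The paper itself publishes no code, so the Python sketch below is purely illustrative: the function names, the number of edits, the match/mismatch/gap weights, and the 0.8 threshold are all assumptions, not the authors' settings.

```python
import random

BASES = "ACGT"

def augment(seq, n_edits=3, rng=random):
    """Create a contrastive view of a DNA sequence by applying random
    edit-distance operations: substitution, insertion, deletion."""
    s = list(seq)
    for _ in range(n_edits):
        op = rng.choice(("sub", "ins", "del"))
        i = rng.randrange(len(s))
        if op == "sub":
            s[i] = rng.choice(BASES)
        elif op == "ins":
            s.insert(i, rng.choice(BASES))
        elif len(s) > 1:  # deletion; keep the sequence non-empty
            del s[i]
    return "".join(s)

def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score, computed by dynamic
    programming with a rolling row to save memory."""
    n, m = len(a), len(b)
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [i * gap] + [0] * m
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[m]

def is_positive_pair(a, b, threshold=0.8):
    """Hypothetical similarity criterion: normalize the alignment score
    by a crude upper bound (all aligned bases matching) and threshold it."""
    best = min(len(a), len(b))
    return nw_score(a, b) / best >= threshold
```

In this reading, a pair of augmented views counts as a positive only while its normalized alignment score stays above the threshold, which is one plausible way to keep heavily edited views from being treated as positives; the paper's exact criterion may differ.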
| Main Authors: | Lin, Ken; Quan, Xiongwen; Yin, Wenya; Zhang, Han |
|---|---|
| Format: | Online Article Text |
| Language: | English |
| Published: | MDPI, 2022 |
| Subjects: | Article |
| Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9103107/ https://www.ncbi.nlm.nih.gov/pubmed/35563090 http://dx.doi.org/10.3390/ijms23094699 |
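The abstract also states that both pre-training models are benchmarked against SimCLR and that a DNN classifier is then fine-tuned jointly with the pre-trained encoder. For orientation before the raw catalog record below, here is a generic SimCLR-style NT-Xent objective in Python (PyTorch); it sketches the loss family of the comparison baseline, not the authors' published training code, and the temperature value is an arbitrary assumption.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style contrastive loss over two batches of encoded views.
    z1[i] and z2[i] are embeddings of two augmented views of sequence i."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, d), unit norm
    sim = z @ z.t() / temperature                        # cosine similarities
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))           # drop self-similarity
    # For row i in [0, N) the positive is i + N, and vice versa.
    targets = torch.cat([torch.arange(n, 2 * n),
                         torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```

After pre-training an encoder with such an objective (or with the paper's supervised variant), the encoder would be kept and a classification head fine-tuned on the binding labels, matching the fine-tuning step the abstract describes.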
_version_ | 1784707483649966080 |
---|---|
author | Lin, Ken; Quan, Xiongwen; Yin, Wenya; Zhang, Han
author_facet | Lin, Ken; Quan, Xiongwen; Yin, Wenya; Zhang, Han
author_sort | Lin, Ken |
collection | PubMed |
description | Motif occupancy identification is a binary classification task that predicts whether DNA motif instances are bound by transcription factors, for which several sequence-based methods have been proposed. However, trained directly end to end, these methods lack biological interpretability in their sequence representations. In this work, we propose a contrastive learning method to pre-train an interpretable and robust DNA encoding for motif occupancy identification. We construct two alternative models to pre-train the DNA sequence encoder: a self-supervised model and a supervised model. We augment the original sequences for contrastive learning using the edit operations defined by edit distance. In particular, we propose a sequence similarity criterion based on the Needleman–Wunsch algorithm to discriminate positive and negative sample pairs in self-supervised learning. Finally, a DNN classifier is fine-tuned along with the pre-trained encoder to predict motif occupancy. Both proposed contrastive learning models outperform the baseline end-to-end CNN model and the SimCLR method, reaching AUCs of 0.811 and 0.823, respectively. Compared with the baseline method, our models are more robust on small samples, and the self-supervised model proves practicable for transfer learning.
format | Online Article Text |
id | pubmed-9103107 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9103107 2022-05-14 A Contrastive Learning Pre-Training Method for Motif Occupancy Identification Lin, Ken; Quan, Xiongwen; Yin, Wenya; Zhang, Han Int J Mol Sci Article Motif occupancy identification is a binary classification task that predicts whether DNA motif instances are bound by transcription factors, for which several sequence-based methods have been proposed. However, trained directly end to end, these methods lack biological interpretability in their sequence representations. In this work, we propose a contrastive learning method to pre-train an interpretable and robust DNA encoding for motif occupancy identification. We construct two alternative models to pre-train the DNA sequence encoder: a self-supervised model and a supervised model. We augment the original sequences for contrastive learning using the edit operations defined by edit distance. In particular, we propose a sequence similarity criterion based on the Needleman–Wunsch algorithm to discriminate positive and negative sample pairs in self-supervised learning. Finally, a DNN classifier is fine-tuned along with the pre-trained encoder to predict motif occupancy. Both proposed contrastive learning models outperform the baseline end-to-end CNN model and the SimCLR method, reaching AUCs of 0.811 and 0.823, respectively. Compared with the baseline method, our models are more robust on small samples, and the self-supervised model proves practicable for transfer learning. MDPI 2022-04-24 /pmc/articles/PMC9103107/ /pubmed/35563090 http://dx.doi.org/10.3390/ijms23094699 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle | Article Lin, Ken; Quan, Xiongwen; Yin, Wenya; Zhang, Han A Contrastive Learning Pre-Training Method for Motif Occupancy Identification
title | A Contrastive Learning Pre-Training Method for Motif Occupancy Identification |
title_full | A Contrastive Learning Pre-Training Method for Motif Occupancy Identification |
title_fullStr | A Contrastive Learning Pre-Training Method for Motif Occupancy Identification |
title_full_unstemmed | A Contrastive Learning Pre-Training Method for Motif Occupancy Identification |
title_short | A Contrastive Learning Pre-Training Method for Motif Occupancy Identification |
title_sort | contrastive learning pre-training method for motif occupancy identification |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9103107/ https://www.ncbi.nlm.nih.gov/pubmed/35563090 http://dx.doi.org/10.3390/ijms23094699 |
work_keys_str_mv | AT linken acontrastivelearningpretrainingmethodformotifoccupancyidentification AT quanxiongwen acontrastivelearningpretrainingmethodformotifoccupancyidentification AT yinwenya acontrastivelearningpretrainingmethodformotifoccupancyidentification AT zhanghan acontrastivelearningpretrainingmethodformotifoccupancyidentification AT linken contrastivelearningpretrainingmethodformotifoccupancyidentification AT quanxiongwen contrastivelearningpretrainingmethodformotifoccupancyidentification AT yinwenya contrastivelearningpretrainingmethodformotifoccupancyidentification AT zhanghan contrastivelearningpretrainingmethodformotifoccupancyidentification |