
A Contrastive Learning Pre-Training Method for Motif Occupancy Identification


Bibliographic Details
Main Authors: Lin, Ken, Quan, Xiongwen, Yin, Wenya, Zhang, Han
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9103107/
https://www.ncbi.nlm.nih.gov/pubmed/35563090
http://dx.doi.org/10.3390/ijms23094699
author Lin, Ken
Quan, Xiongwen
Yin, Wenya
Zhang, Han
author_facet Lin, Ken
Quan, Xiongwen
Yin, Wenya
Zhang, Han
author_sort Lin, Ken
collection PubMed
description Motif occupancy identification is a binary classification task that predicts whether DNA motif instances are bound by transcription factors, and several sequence-based methods have been proposed for it. However, because they are trained directly end to end, these methods lack biological interpretability in their sequence representations. In this work, we propose a contrastive learning method to pre-train interpretable and robust DNA encodings for motif occupancy identification. We construct two alternative models to pre-train the DNA sequence encoder: a self-supervised model and a supervised model. We augment the original sequences for contrastive learning with the edit operations defined in edit distance. Specifically, we propose a sequence similarity criterion based on the Needleman–Wunsch algorithm to discriminate positive and negative sample pairs in self-supervised learning. Finally, a DNN classifier is fine-tuned along with the pre-trained encoder to predict motif occupancy. Both proposed contrastive learning models outperform the baseline end-to-end CNN model and the SimCLR method, reaching AUCs of 0.811 and 0.823, respectively. Compared with the baseline method, our models are more robust on small samples. In particular, the self-supervised model proves practicable for transfer learning. (See the illustrative sketch following this record.)
format Online
Article
Text
id pubmed-9103107
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9103107 2022-05-14 A Contrastive Learning Pre-Training Method for Motif Occupancy Identification Lin, Ken Quan, Xiongwen Yin, Wenya Zhang, Han Int J Mol Sci Article Motif occupancy identification is a binary classification task that predicts whether DNA motif instances are bound by transcription factors, and several sequence-based methods have been proposed for it. However, because they are trained directly end to end, these methods lack biological interpretability in their sequence representations. In this work, we propose a contrastive learning method to pre-train interpretable and robust DNA encodings for motif occupancy identification. We construct two alternative models to pre-train the DNA sequence encoder: a self-supervised model and a supervised model. We augment the original sequences for contrastive learning with the edit operations defined in edit distance. Specifically, we propose a sequence similarity criterion based on the Needleman–Wunsch algorithm to discriminate positive and negative sample pairs in self-supervised learning. Finally, a DNN classifier is fine-tuned along with the pre-trained encoder to predict motif occupancy. Both proposed contrastive learning models outperform the baseline end-to-end CNN model and the SimCLR method, reaching AUCs of 0.811 and 0.823, respectively. Compared with the baseline method, our models are more robust on small samples. In particular, the self-supervised model proves practicable for transfer learning. MDPI 2022-04-24 /pmc/articles/PMC9103107/ /pubmed/35563090 http://dx.doi.org/10.3390/ijms23094699 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Lin, Ken
Quan, Xiongwen
Yin, Wenya
Zhang, Han
A Contrastive Learning Pre-Training Method for Motif Occupancy Identification
title A Contrastive Learning Pre-Training Method for Motif Occupancy Identification
title_full A Contrastive Learning Pre-Training Method for Motif Occupancy Identification
title_fullStr A Contrastive Learning Pre-Training Method for Motif Occupancy Identification
title_full_unstemmed A Contrastive Learning Pre-Training Method for Motif Occupancy Identification
title_short A Contrastive Learning Pre-Training Method for Motif Occupancy Identification
title_sort contrastive learning pre-training method for motif occupancy identification
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9103107/
https://www.ncbi.nlm.nih.gov/pubmed/35563090
http://dx.doi.org/10.3390/ijms23094699
work_keys_str_mv AT linken acontrastivelearningpretrainingmethodformotifoccupancyidentification
AT quanxiongwen acontrastivelearningpretrainingmethodformotifoccupancyidentification
AT yinwenya acontrastivelearningpretrainingmethodformotifoccupancyidentification
AT zhanghan acontrastivelearningpretrainingmethodformotifoccupancyidentification
AT linken contrastivelearningpretrainingmethodformotifoccupancyidentification
AT quanxiongwen contrastivelearningpretrainingmethodformotifoccupancyidentification
AT yinwenya contrastivelearningpretrainingmethodformotifoccupancyidentification
AT zhanghan contrastivelearningpretrainingmethodformotifoccupancyidentification
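
The description field above names two concrete ingredients of the method: augmenting DNA sequences with the edit operations that define edit distance, and a Needleman–Wunsch similarity criterion for separating positive from negative sample pairs during self-supervised pre-training. The following is a minimal Python sketch of both ideas, not the authors' implementation; the scoring scheme (match 1, mismatch -1, gap -1), the number of edits per augmentation, and the 0.5 normalized-score threshold are illustrative assumptions rather than values taken from the paper.

import random

BASES = "ACGT"

def augment(seq, n_edits=2):
    """Apply n_edits random edit operations (substitution, insertion,
    deletion), the operations that define edit distance, to produce an
    augmented view of a DNA sequence."""
    s = list(seq)
    for _ in range(n_edits):
        op = random.choice(("sub", "ins", "del"))
        i = random.randrange(len(s))
        if op == "sub":
            s[i] = random.choice(BASES)
        elif op == "ins":
            s.insert(i, random.choice(BASES))
        elif len(s) > 1:  # deletion; always keep at least one base
            del s[i]
    return "".join(s)

def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score via the standard dynamic-programming
    recurrence of the Needleman-Wunsch algorithm."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[n][m]

def is_positive_pair(a, b, threshold=0.5):
    """Hypothetical criterion: call a pair positive when its
    length-normalized alignment score clears the threshold."""
    return needleman_wunsch(a, b) / max(len(a), len(b)) >= threshold

if __name__ == "__main__":
    anchor = "ACGTACGTGGTACA"
    view = augment(anchor)                  # lightly edited copy: likely positive
    other = augment("TTTTGGGGCCCCAAAA", 5)  # unrelated sequence: likely negative
    print(view, is_positive_pair(anchor, view))
    print(other, is_positive_pair(anchor, other))

In the paper's setting, pairs labeled this way would feed a contrastive loss over the encoder's representations before the DNN classifier is fine-tuned; that training loop is beyond the scope of this sketch.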