A self-supervised deep learning method for data-efficient training in genomics

Bibliographic Details
Main authors:	Gündüz, Hüseyin Anil, Binder, Martin, To, Xiao-Yin, Mreches, René, Bischl, Bernd, McHardy, Alice C., Münch, Philipp C., Rezaei, Mina
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2023
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10495322/ https://www.ncbi.nlm.nih.gov/pubmed/37696966 http://dx.doi.org/10.1038/s42003-023-05310-2

Description
Summary:	Deep learning in bioinformatics is often limited to problems where extensive amounts of labeled data are available for supervised classification. By exploiting unlabeled data, self-supervised learning techniques can improve the performance of machine learning models in the presence of limited labeled data. Although many self-supervised learning methods have been suggested before, they have failed to exploit the unique characteristics of genomic data. Therefore, we introduce Self-GenomeNet, a self-supervised learning technique that is custom-tailored for genomic data. Self-GenomeNet leverages reverse-complement sequences and effectively learns short- and long-term dependencies by predicting targets of different lengths. Self-GenomeNet performs better than other self-supervised methods in data-scarce genomic tasks and outperforms standard supervised training with ~10 times fewer labeled training data. Furthermore, the learned representations generalize well to new datasets and tasks. These findings suggest that Self-GenomeNet is well suited for large-scale, unlabeled genomic datasets and could substantially improve the performance of genomic models.
