Unsupervised domain adaptation methods for cross-species transfer of regulatory code signals

Bibliographic Details
Main authors:	Latyshev, Pavel; Pavlov, Fedor; Herbert, Alan; Poptsova, Maria
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A., 2023
Subjects:	Big Data
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10101332/ https://www.ncbi.nlm.nih.gov/pubmed/37063486 http://dx.doi.org/10.3389/fdata.2023.1140663

author	Latyshev, Pavel; Pavlov, Fedor; Herbert, Alan; Poptsova, Maria
collection	PubMed
description	Thanks to advances in next-generation sequencing (NGS) technologies, whole-genome maps of various functional genomic elements have been generated for a dozen species; however, experiments are still expensive and are not available for many species of interest. Deep learning methods have become the state of the art for analyzing the available data, but the focus is often limited to the species studied. Here we take advantage of progress in transfer learning, specifically in unsupervised domain adaptation (UDA), and test nine UDA methods for predicting regulatory code signals in the genomes of other species. We tested each deep learning implementation by training the model on experimental data from one species and then refining it using the genome sequence of the target species for which we wanted to make predictions. Among the nine domain adaptation architectures tested, the non-adversarial methods Minimum Class Confusion (MCC) and Deep Adaptation Network (DAN) significantly outperformed the others, and the Conditional Domain Adversarial Network (CDAN) ranked third. We provide an empirical assessment of each approach using real-world data. The approaches were tested on ChIP-seq data for transcription factor binding sites and histone marks in the human and mouse genomes, but the methodology is generalizable to any cross-species transfer of interest. We tested the efficiency of each method using two species, human and mouse, for which experimental data were available for both. The results allow us to assess how well each implementation would work for species with only limited experimental data and will inform the design of future experiments in these understudied organisms. Overall, our results demonstrate the validity of UDA methods for generating missing experimental data for histone marks and transcription factor binding sites in various genomes and highlight how robust the various approaches are to data that are incomplete, noisy, and susceptible to analytic bias.
format	Online Article Text
id	pubmed-10101332
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
journal	Front Big Data
published	2023-03-30
rights	Copyright © 2023 Latyshev, Pavlov, Herbert and Poptsova. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Unsupervised domain adaptation methods for cross-species transfer of regulatory code signals
topic	Big Data
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10101332/ https://www.ncbi.nlm.nih.gov/pubmed/37063486 http://dx.doi.org/10.3389/fdata.2023.1140663
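
The description reports that the non-adversarial Deep Adaptation Network (DAN) was among the two best methods. DAN aligns source- and target-domain feature distributions by adding a multi-kernel maximum mean discrepancy (MK-MMD) penalty, computed on the network's features, to the supervised loss on the labeled source species. The sketch below is a minimal pure-Python illustration of that penalty under stated assumptions, not the authors' implementation: the Gaussian kernels are equally weighted with fixed, hypothetical bandwidths (the original DAN learns the kernel weights), the trade-off weight `lam` is hypothetical, and the 2-D "features" are toy vectors standing in for layer activations on source-species and target-species sequences.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def multi_kernel(a: Vector, b: Vector, bandwidths: Sequence[float]) -> float:
    """Equally weighted Gaussian kernels (assumption: real DAN learns these weights)."""
    d = _sq_dist(a, b)
    return sum(math.exp(-d / (2.0 * s * s)) for s in bandwidths) / len(bandwidths)


def _mean_kernel(xs: List[Vector], ys: List[Vector], bandwidths: Sequence[float]) -> float:
    """Average kernel value over all (x, y) pairs."""
    total = sum(multi_kernel(x, y, bandwidths) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mk_mmd2(source: List[Vector], target: List[Vector],
            bandwidths: Sequence[float] = (0.5, 1.0, 2.0, 4.0)) -> float:
    """Biased estimate of the squared MK-MMD between a source and a target batch."""
    return (_mean_kernel(source, source, bandwidths)
            + _mean_kernel(target, target, bandwidths)
            - 2.0 * _mean_kernel(source, target, bandwidths))


def dan_objective(source_cls_loss: float,
                  layer_features: List[Tuple[List[Vector], List[Vector]]],
                  lam: float = 1.0) -> float:
    """Source classification loss plus lam times the MK-MMD of each adapted layer.

    `lam` is a hypothetical trade-off weight, not a value from the study.
    """
    return source_cls_loss + lam * sum(mk_mmd2(s, t) for s, t in layer_features)


# Toy features: one source batch and two target batches shifted by different amounts.
src = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]
near = [[0.2, 0.2], [0.3, 0.2], [0.2, 0.3]]
far = [[3.0, 3.0], [3.1, 3.0], [3.0, 3.1]]
print(mk_mmd2(src, near) < mk_mmd2(src, far))  # True: a larger shift gives a larger penalty
```

Because the penalty needs only features, not labels, the target-species batch can come from unlabeled genome sequence, which matches the refinement step described in the record.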
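
Minimum Class Confusion (MCC), the other top-ranked method in the description, works only on the model's predictions for unlabeled target data: it penalizes how much probability mass a batch spreads across different classes at once. The sketch below follows the original MCC formulation as a minimal pure-Python illustration (temperature-scaled softmax, entropy-based reweighting of examples, a row-normalized class-correlation matrix, and the mean off-diagonal sum); it is not the authors' code. The temperature of 2.5 and the two-class toy batches (for example, bound vs. unbound) are illustrative assumptions, not settings reported for this study.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float) -> List[float]:
    """Temperature-scaled softmax, max-subtracted for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def mcc_loss(logits: List[List[float]], temperature: float = 2.5) -> float:
    """MCC loss for one batch of unlabeled target logits (temperature is an assumption)."""
    batch, n_cls = len(logits), len(logits[0])
    probs = [softmax(row, temperature) for row in logits]

    # Uncertainty reweighting: low-entropy (confident) examples get larger weights,
    # rescaled so that the weights sum to the batch size.
    raw = [1.0 + math.exp(-entropy(p)) for p in probs]
    total_raw = sum(raw)
    weights = [batch * w / total_raw for w in raw]

    # Weighted class-correlation matrix: C[j][k] = sum_i w_i * p_ij * p_ik.
    corr = [[sum(weights[i] * probs[i][j] * probs[i][k] for i in range(batch))
             for k in range(n_cls)] for j in range(n_cls)]

    # Category normalization: divide each row by its sum.
    normalized = [[c / sum(row) for c in row] for row in corr]

    # Between-class confusion: off-diagonal mass, averaged over classes.
    off_diag = sum(normalized[j][k] for j in range(n_cls) for k in range(n_cls) if j != k)
    return off_diag / n_cls


confident = [[10.0, -10.0], [-10.0, 10.0]]
uncertain = [[0.0, 0.0], [0.0, 0.0]]
print(mcc_loss(uncertain))  # 0.5
print(mcc_loss(confident) < mcc_loss(uncertain))  # True
```

In training, this term would be added to the supervised loss on the source species, so the model is pushed toward confident, well-separated predictions on the target genome without needing any target labels.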

Similar items