
Unsupervised domain adaptation methods for cross-species transfer of regulatory code signals

Due to advances in NGS technologies, whole-genome maps of various functional genomic elements have been generated for a dozen species; however, experiments are still expensive and are not available for many species of interest. Deep learning methods became the state-of-the-art computational methods to a...


Bibliographic Details
Main authors: Latyshev, Pavel; Pavlov, Fedor; Herbert, Alan; Poptsova, Maria
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10101332/
https://www.ncbi.nlm.nih.gov/pubmed/37063486
http://dx.doi.org/10.3389/fdata.2023.1140663
_version_ 1785025488680386560
author Latyshev, Pavel
Pavlov, Fedor
Herbert, Alan
Poptsova, Maria
author_facet Latyshev, Pavel
Pavlov, Fedor
Herbert, Alan
Poptsova, Maria
author_sort Latyshev, Pavel
collection PubMed
description Due to advances in NGS technologies, whole-genome maps of various functional genomic elements have been generated for a dozen species; however, experiments are still expensive and are not available for many species of interest. Deep learning methods have become the state-of-the-art computational methods for analyzing the available data, but the focus is often only on the species studied. Here we take advantage of progress in transfer learning in the area of Unsupervised Domain Adaptation (UDA) and test nine UDA methods for predicting regulatory code signals in the genomes of other species. We tested each deep learning implementation by training the model on experimental data from one species and then refining the model using the genome sequence of the target species for which we wanted to make predictions. Among the nine tested domain adaptation architectures, the non-adversarial methods Minimum Class Confusion (MCC) and Deep Adaptation Network (DAN) significantly outperformed the others. Conditional Domain Adversarial Network (CDAN) was the third-best architecture. Here we provide an empirical assessment of each approach using real-world data. The approaches were tested on ChIP-seq data for transcription factor binding sites and histone marks in the human and mouse genomes, but the procedure is generalizable to any cross-species transfer of interest. We tested the efficiency of each method using two species for which experimental data were available in both. The results allow us to assess how well each implementation will work for species for which only limited experimental data are available and will inform the design of future experiments in these understudied organisms. Overall, our results proved the validity of UDA methods for generating missing experimental data for histone marks and transcription factor binding sites in various genomes and highlight how robust the various approaches are to data that are incomplete, noisy and susceptible to analytic bias.
format Online
Article
Text
id pubmed-10101332
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10101332 2023-04-14 Unsupervised domain adaptation methods for cross-species transfer of regulatory code signals Latyshev, Pavel Pavlov, Fedor Herbert, Alan Poptsova, Maria Front Big Data Big Data Frontiers Media S.A. 2023-03-30 /pmc/articles/PMC10101332/ /pubmed/37063486 http://dx.doi.org/10.3389/fdata.2023.1140663 Text en Copyright © 2023 Latyshev, Pavlov, Herbert and Poptsova. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Big Data
Latyshev, Pavel
Pavlov, Fedor
Herbert, Alan
Poptsova, Maria
Unsupervised domain adaptation methods for cross-species transfer of regulatory code signals
title Unsupervised domain adaptation methods for cross-species transfer of regulatory code signals
title_full Unsupervised domain adaptation methods for cross-species transfer of regulatory code signals
title_fullStr Unsupervised domain adaptation methods for cross-species transfer of regulatory code signals
title_full_unstemmed Unsupervised domain adaptation methods for cross-species transfer of regulatory code signals
title_short Unsupervised domain adaptation methods for cross-species transfer of regulatory code signals
title_sort unsupervised domain adaptation methods for cross-species transfer of regulatory code signals
topic Big Data
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10101332/
https://www.ncbi.nlm.nih.gov/pubmed/37063486
http://dx.doi.org/10.3389/fdata.2023.1140663
work_keys_str_mv AT latyshevpavel unsuperviseddomainadaptationmethodsforcrossspeciestransferofregulatorycodesignals
AT pavlovfedor unsuperviseddomainadaptationmethodsforcrossspeciestransferofregulatorycodesignals
AT herbertalan unsuperviseddomainadaptationmethodsforcrossspeciestransferofregulatorycodesignals
AT poptsovamaria unsuperviseddomainadaptationmethodsforcrossspeciestransferofregulatorycodesignals
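
The description field in this record says that each model is trained on labeled experimental data from one species and then refined using only the genome sequence of the target species, and that non-adversarial methods such as Deep Adaptation Network (DAN) performed best. The record gives no implementation details, so the following is only a minimal sketch of the general idea behind discrepancy-based adaptation, assuming a Gaussian-kernel maximum mean discrepancy (MMD) between source and target features as the alignment term. The function names, bandwidths, `trade_off` weight, and toy feature vectors are all hypothetical and are not taken from the article.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth):
    """RBF kernel between two feature vectors given as lists of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidths):
    """Average kernel value over all pairs, mixing several bandwidths."""
    total = 0.0
    for x in xs:
        for y in ys:
            total += sum(gaussian_kernel(x, y, b) for b in bandwidths) / len(bandwidths)
    return total / (len(xs) * len(ys))


def mmd_squared(source_feats, target_feats, bandwidths=(0.5, 1.0, 2.0)):
    """Biased estimate of the squared MMD between two feature samples."""
    return (mean_kernel(source_feats, source_feats, bandwidths)
            + mean_kernel(target_feats, target_feats, bandwidths)
            - 2.0 * mean_kernel(source_feats, target_feats, bandwidths))


def uda_objective(source_task_loss, source_feats, target_feats, trade_off=1.0):
    """Supervised loss on the labeled source species plus an alignment
    penalty computed from unlabeled target-species features (assumed form)."""
    return source_task_loss + trade_off * mmd_squared(source_feats, target_feats)


if __name__ == "__main__":
    random.seed(0)
    # Toy stand-ins for network features of source and target sequences.
    source = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(20)]
    target = [[random.gauss(1.0, 1.0) for _ in range(4)] for _ in range(20)]
    print("MMD^2, shifted target:", mmd_squared(source, target))
    print("MMD^2, identical sets:", mmd_squared(source, source))
```

In this sketch no target labels enter the objective: the target species contributes only through the feature-distribution term, which mirrors the unsupervised setting described in the abstract. A real model would compute the features with a neural network and minimize the objective by gradient descent.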