
DeepSATA: A Deep Learning-Based Sequence Analyzer Incorporating the Transcription Factor Binding Affinity to Dissect the Effects of Non-Coding Genetic Variants

Utilizing large-scale epigenomics data, deep learning tools can predict the regulatory activity of genomic sequences, annotate non-coding genetic variants, and uncover mechanisms behind complex traits. However, these tools primarily rely on human or mouse data for training, limiting their performanc...


Bibliographic Details
Main Authors: Ma, Wenlong; Fu, Yang; Bao, Yongzhou; Wang, Zhen; Lei, Bowen; Zheng, Weigang; Wang, Chao; Liu, Yuwen
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10418434/
https://www.ncbi.nlm.nih.gov/pubmed/37569400
http://dx.doi.org/10.3390/ijms241512023
_version_ 1785088263508197376
author Ma, Wenlong
Fu, Yang
Bao, Yongzhou
Wang, Zhen
Lei, Bowen
Zheng, Weigang
Wang, Chao
Liu, Yuwen
author_facet Ma, Wenlong
Fu, Yang
Bao, Yongzhou
Wang, Zhen
Lei, Bowen
Zheng, Weigang
Wang, Chao
Liu, Yuwen
author_sort Ma, Wenlong
collection PubMed
description Utilizing large-scale epigenomics data, deep learning tools can predict the regulatory activity of genomic sequences, annotate non-coding genetic variants, and uncover mechanisms behind complex traits. However, these tools primarily rely on human or mouse data for training, limiting their performance when applied to other species. Furthermore, the limited exploration of many species, particularly in the case of livestock, has led to a scarcity of comprehensive and high-quality epigenetic data, posing challenges in developing reliable deep learning models for decoding their non-coding genomes. The cross-species prediction of the regulatory genome can be achieved by leveraging publicly available data from extensively studied organisms and making use of the conserved DNA binding preferences of transcription factors within the same tissue. In this study, we introduced DeepSATA, a novel deep learning-based sequence analyzer that incorporates the transcription factor binding affinity for the cross-species prediction of chromatin accessibility. By applying DeepSATA to analyze the genomes of pigs, chickens, cattle, humans, and mice, we demonstrated its ability to improve the prediction accuracy of chromatin accessibility and achieve reliable cross-species predictions in animals. Additionally, we showcased its effectiveness in analyzing pig genetic variants associated with economic traits and in increasing the accuracy of genomic predictions. Overall, our study presents a valuable tool to explore the epigenomic landscape of various species and pinpoint regulatory deoxyribonucleic acid (DNA) variants associated with complex traits.
format Online
Article
Text
id pubmed-10418434
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-104184342023-08-12 DeepSATA: A Deep Learning-Based Sequence Analyzer Incorporating the Transcription Factor Binding Affinity to Dissect the Effects of Non-Coding Genetic Variants Ma, Wenlong Fu, Yang Bao, Yongzhou Wang, Zhen Lei, Bowen Zheng, Weigang Wang, Chao Liu, Yuwen Int J Mol Sci Article Utilizing large-scale epigenomics data, deep learning tools can predict the regulatory activity of genomic sequences, annotate non-coding genetic variants, and uncover mechanisms behind complex traits. However, these tools primarily rely on human or mouse data for training, limiting their performance when applied to other species. Furthermore, the limited exploration of many species, particularly in the case of livestock, has led to a scarcity of comprehensive and high-quality epigenetic data, posing challenges in developing reliable deep learning models for decoding their non-coding genomes. The cross-species prediction of the regulatory genome can be achieved by leveraging publicly available data from extensively studied organisms and making use of the conserved DNA binding preferences of transcription factors within the same tissue. In this study, we introduced DeepSATA, a novel deep learning-based sequence analyzer that incorporates the transcription factor binding affinity for the cross-species prediction of chromatin accessibility. By applying DeepSATA to analyze the genomes of pigs, chickens, cattle, humans, and mice, we demonstrated its ability to improve the prediction accuracy of chromatin accessibility and achieve reliable cross-species predictions in animals. Additionally, we showcased its effectiveness in analyzing pig genetic variants associated with economic traits and in increasing the accuracy of genomic predictions. Overall, our study presents a valuable tool to explore the epigenomic landscape of various species and pinpoint regulatory deoxyribonucleic acid (DNA) variants associated with complex traits. 
MDPI 2023-07-27 /pmc/articles/PMC10418434/ /pubmed/37569400 http://dx.doi.org/10.3390/ijms241512023 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Ma, Wenlong
Fu, Yang
Bao, Yongzhou
Wang, Zhen
Lei, Bowen
Zheng, Weigang
Wang, Chao
Liu, Yuwen
DeepSATA: A Deep Learning-Based Sequence Analyzer Incorporating the Transcription Factor Binding Affinity to Dissect the Effects of Non-Coding Genetic Variants
title DeepSATA: A Deep Learning-Based Sequence Analyzer Incorporating the Transcription Factor Binding Affinity to Dissect the Effects of Non-Coding Genetic Variants
title_full DeepSATA: A Deep Learning-Based Sequence Analyzer Incorporating the Transcription Factor Binding Affinity to Dissect the Effects of Non-Coding Genetic Variants
title_fullStr DeepSATA: A Deep Learning-Based Sequence Analyzer Incorporating the Transcription Factor Binding Affinity to Dissect the Effects of Non-Coding Genetic Variants
title_full_unstemmed DeepSATA: A Deep Learning-Based Sequence Analyzer Incorporating the Transcription Factor Binding Affinity to Dissect the Effects of Non-Coding Genetic Variants
title_short DeepSATA: A Deep Learning-Based Sequence Analyzer Incorporating the Transcription Factor Binding Affinity to Dissect the Effects of Non-Coding Genetic Variants
title_sort deepsata: a deep learning-based sequence analyzer incorporating the transcription factor binding affinity to dissect the effects of non-coding genetic variants
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10418434/
https://www.ncbi.nlm.nih.gov/pubmed/37569400
http://dx.doi.org/10.3390/ijms241512023