
TIVAN-indel: a computational framework for annotating and predicting non-coding regulatory small insertions and deletions

MOTIVATION: Small insertions and deletions (sindels) in the human genome have important implications for human disease. One important mechanism by which non-coding sindels (nc-sindels) affect human diseases and phenotypes is the regulation of gene expression. Nevertheless, current sequencin...


Bibliographic Details
Main authors: Agarwal, Aman; Zhao, Fengdi; Jiang, Yuchao; Chen, Li
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9900211/
https://www.ncbi.nlm.nih.gov/pubmed/36707993
http://dx.doi.org/10.1093/bioinformatics/btad060
author Agarwal, Aman
Zhao, Fengdi
Jiang, Yuchao
Chen, Li
collection PubMed
description MOTIVATION: Small insertions and deletions (sindels) in the human genome have important implications for human disease. One important mechanism by which non-coding sindels (nc-sindels) affect human diseases and phenotypes is the regulation of gene expression. Nevertheless, current sequencing experiments may lack the statistical power and resolution to pinpoint functional sindels because of their lower minor allele frequency or small effect size. As an alternative strategy, a supervised machine learning method can identify otherwise masked functional sindels by predicting their regulatory potential directly. However, computational methods for annotating and predicting regulatory sindels, especially in non-coding regions, are underdeveloped. RESULTS: By leveraging labeled nc-sindels identified by cis-expression quantitative trait loci analyses across 44 tissues in Genotype-Tissue Expression (GTEx), together with a compilation of generic functional annotations and large-scale epigenomic profiles, we develop TIssue-specific Variant Annotation for Non-coding indel (TIVAN-indel), a supervised computational framework for predicting non-coding regulatory sindels. We demonstrate that TIVAN-indel achieves the best prediction performance in both within-tissue and cross-tissue prediction. As an independent evaluation, we train TIVAN-indel on the ‘Whole Blood’ tissue in GTEx and test the model using 15 immune cell types from an independent study, the Database of Immune Cell Expression. Lastly, we perform an enrichment analysis for both true and predicted sindels in key regulatory regions such as chromatin interactions, open chromatin regions and histone modification sites, and find biologically meaningful enrichment patterns. AVAILABILITY AND IMPLEMENTATION: https://github.com/lichen-lab/TIVAN-indel SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-9900211
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9900211 2023-02-07 Bioinformatics Original Paper Oxford University Press 2023-01-27 /pmc/articles/PMC9900211/ /pubmed/36707993 http://dx.doi.org/10.1093/bioinformatics/btad060 Text en © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title TIVAN-indel: a computational framework for annotating and predicting non-coding regulatory small insertions and deletions
topic Original Paper