CSN: unsupervised approach for inferring biological networks based on the genome alone

BACKGROUND: Most organisms cannot be cultivated, as they live in unique ecological conditions that cannot be mimicked in the lab. Understanding the functionality of those organisms’ genes and their interactions by performing large-scale measurements of transcription levels, protein-protein interacti...

Bibliographic Details
Main authors: Galili, Maya, Tuller, Tamir
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7227238/
https://www.ncbi.nlm.nih.gov/pubmed/32414319
http://dx.doi.org/10.1186/s12859-020-3479-9
_version_ 1783534462167941120
author Galili, Maya
Tuller, Tamir
author_facet Galili, Maya
Tuller, Tamir
author_sort Galili, Maya
collection PubMed
description BACKGROUND: Most organisms cannot be cultivated, as they live in unique ecological conditions that cannot be mimicked in the lab. Understanding the functionality of those organisms’ genes and their interactions by performing large-scale measurements of transcription levels, protein-protein interactions or metabolism, is extremely difficult and, in some cases, impossible. Thus, efficient algorithms for deciphering genome functionality based only on the genomic sequences with no other experimental measurements are needed. RESULTS: In this study, we describe a novel algorithm that infers gene networks that we name Common Substring Network (CSN). The algorithm enables inferring novel regulatory relations among genes based only on the genomic sequence of a given organism and partial homolog/ortholog-based functional annotation. It can specifically infer the functional annotation of genes with unknown homology. This approach is based on the assumption that related genes, not necessarily homologs, tend to share sub-sequences, which may be related to common regulatory mechanisms, similar functionality of encoded proteins, common evolutionary history, and more. We demonstrate that CSNs, which are based on S. cerevisiae and E. coli genomes, have properties similar to ‘traditional’ biological networks inferred from experiments. Highly expressed genes tend to have higher degree nodes in the CSN, genes with similar protein functionality tend to be closer, and the CSN graph exhibits a power-law degree distribution. Also, we show how the CSN can be used for predicting gene interactions and functions. CONCLUSIONS: The reported results suggest that ‘silent’ code inside the transcript can help to predict central features of biological networks and gene function. This approach can help researchers to understand the genome of novel microorganisms, analyze metagenomic data, and can help to decipher new gene functions. 
AVAILABILITY: Our MATLAB implementation of CSN is available at https://www.cs.tau.ac.il/~tamirtul/CSN-Autogen
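The abstract's core assumption, that related genes tend to share sub-sequences, can be illustrated with a small Python sketch. This is not the authors' MATLAB implementation and does not reproduce their scoring: the edge rule used here (link two genes when their longest common substring reaches a hypothetical `min_len` threshold), the function names, and the toy sequences are all assumptions made for illustration only.

```python
from itertools import combinations


def longest_common_substring(a: str, b: str) -> int:
    """Return the length of the longest contiguous substring shared by a and b.

    Classic dynamic programme, O(len(a) * len(b)) time, O(len(b)) memory.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the match that ended at the previous character pair.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def build_network(genes: dict[str, str], min_len: int) -> dict[str, set[str]]:
    """Build an undirected graph as an adjacency map.

    ASSUMPTION: two genes are connected when they share a substring of at
    least `min_len` characters. The published method may weight or filter
    edges differently.
    """
    graph: dict[str, set[str]] = {name: set() for name in genes}
    for g1, g2 in combinations(genes, 2):
        if longest_common_substring(genes[g1], genes[g2]) >= min_len:
            graph[g1].add(g2)
            graph[g2].add(g1)
    return graph


# Hypothetical toy sequences, not real genes.
toy_genes = {
    "geneA": "ATGGCCATTGTAATGGGCCGC",
    "geneB": "TTTGCCATTGTAATCCC",
    "geneC": "ATGAAACGCATTAGCACC",
}

network = build_network(toy_genes, min_len=8)
# Node degree is the quantity the abstract relates to expression level.
degrees = {name: len(neighbours) for name, neighbours in network.items()}
print(degrees)
```

On a real genome the quadratic pairwise comparison would be too slow; a suffix-based index would be the usual substitute, but that choice is outside what this record states.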
format Online
Article
Text
id pubmed-7227238
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-72272382020-05-27 CSN: unsupervised approach for inferring biological networks based on the genome alone Galili, Maya Tuller, Tamir BMC Bioinformatics Methodology Article BACKGROUND: Most organisms cannot be cultivated, as they live in unique ecological conditions that cannot be mimicked in the lab. Understanding the functionality of those organisms’ genes and their interactions by performing large-scale measurements of transcription levels, protein-protein interactions or metabolism, is extremely difficult and, in some cases, impossible. Thus, efficient algorithms for deciphering genome functionality based only on the genomic sequences with no other experimental measurements are needed. RESULTS: In this study, we describe a novel algorithm that infers gene networks that we name Common Substring Network (CSN). The algorithm enables inferring novel regulatory relations among genes based only on the genomic sequence of a given organism and partial homolog/ortholog-based functional annotation. It can specifically infer the functional annotation of genes with unknown homology. This approach is based on the assumption that related genes, not necessarily homologs, tend to share sub-sequences, which may be related to common regulatory mechanisms, similar functionality of encoded proteins, common evolutionary history, and more. We demonstrate that CSNs, which are based on S. cerevisiae and E. coli genomes, have properties similar to ‘traditional’ biological networks inferred from experiments. Highly expressed genes tend to have higher degree nodes in the CSN, genes with similar protein functionality tend to be closer, and the CSN graph exhibits a power-law degree distribution. Also, we show how the CSN can be used for predicting gene interactions and functions. CONCLUSIONS: The reported results suggest that ‘silent’ code inside the transcript can help to predict central features of biological networks and gene function. 
This approach can help researchers to understand the genome of novel microorganisms, analyze metagenomic data, and can help to decipher new gene functions. AVAILABILITY: Our MATLAB implementation of CSN is available at https://www.cs.tau.ac.il/~tamirtul/CSN-Autogen BioMed Central 2020-05-15 /pmc/articles/PMC7227238/ /pubmed/32414319 http://dx.doi.org/10.1186/s12859-020-3479-9 Text en © The Author(s). 2020 Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Methodology Article
Galili, Maya
Tuller, Tamir
CSN: unsupervised approach for inferring biological networks based on the genome alone
title CSN: unsupervised approach for inferring biological networks based on the genome alone
title_full CSN: unsupervised approach for inferring biological networks based on the genome alone
title_fullStr CSN: unsupervised approach for inferring biological networks based on the genome alone
title_full_unstemmed CSN: unsupervised approach for inferring biological networks based on the genome alone
title_short CSN: unsupervised approach for inferring biological networks based on the genome alone
title_sort csn: unsupervised approach for inferring biological networks based on the genome alone
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7227238/
https://www.ncbi.nlm.nih.gov/pubmed/32414319
http://dx.doi.org/10.1186/s12859-020-3479-9
work_keys_str_mv AT galilimaya csnunsupervisedapproachforinferringbiologicalnetworksbasedonthegenomealone
AT tullertamir csnunsupervisedapproachforinferringbiologicalnetworksbasedonthegenomealone