
Gnocis: An integrated system for interactive and reproducible analysis and modelling of cis-regulatory elements in Python 3

Gene expression is regulated through cis-regulatory elements (CREs), among which are promoters, enhancers, Polycomb/Trithorax Response Elements (PREs), silencers and insulators. Computational prediction of CREs can be achieved using a variety of statistical and machine learning methods combined with...


Bibliographic Details
Main authors: Bredesen-Aa, Bjørn André; Rehmsmeier, Marc
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9462789/
https://www.ncbi.nlm.nih.gov/pubmed/36084008
http://dx.doi.org/10.1371/journal.pone.0274338
author Bredesen-Aa, Bjørn André
Rehmsmeier, Marc
collection PubMed
description Gene expression is regulated through cis-regulatory elements (CREs), among which are promoters, enhancers, Polycomb/Trithorax Response Elements (PREs), silencers and insulators. Computational prediction of CREs can be achieved using a variety of statistical and machine learning methods combined with different feature space formulations. Although Python packages for DNA sequence feature sets and for machine learning are available, no existing package facilitates the combination of DNA sequence feature sets with machine learning methods for the genome-wide prediction of candidate CREs. We here present Gnocis, a Python package that streamlines the analysis and the modelling of CRE sequences by providing extensible APIs and implementing the glue required for combining feature sets and models for genome-wide prediction. Gnocis implements a variety of base feature sets, including motif pair occurrence frequencies and the k-spectrum mismatch kernel. It integrates with Scikit-learn and TensorFlow for state-of-the-art machine learning. Gnocis additionally implements a broad suite of tools for the handling and preparation of sequence, region and curve data, which can be useful for general DNA bioinformatics in Python. We also present Deep-MOCCA, a neural network architecture inspired by SVM-MOCCA that achieves moderate to high generalization without prior motif knowledge. To demonstrate the use of Gnocis, we applied multiple machine learning methods to the modelling of D. melanogaster PREs, including a Convolutional Neural Network (CNN), making this the first study to model PREs with CNNs. The models are readily adapted to new CRE modelling problems and to other organisms. In order to produce a high-performance, compiled package for Python 3, we implemented Gnocis in Cython. Gnocis can be installed using the PyPI package manager by running ‘pip install gnocis’. The source code is available on GitHub, at https://github.com/bjornbredesen/gnocis.
format Online
Article
Text
id pubmed-9462789
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-9462789 2022-09-10 Gnocis: An integrated system for interactive and reproducible analysis and modelling of cis-regulatory elements in Python 3 Bredesen-Aa, Bjørn André Rehmsmeier, Marc PLoS One Research Article Public Library of Science 2022-09-09 /pmc/articles/PMC9462789/ /pubmed/36084008 http://dx.doi.org/10.1371/journal.pone.0274338 Text en © 2022 Bredesen-Aa, Rehmsmeier https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Gnocis: An integrated system for interactive and reproducible analysis and modelling of cis-regulatory elements in Python 3
topic Research Article
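
The description names the k-spectrum mismatch kernel as one of the base feature sets. The following is a minimal, generic sketch of that idea in plain Python. It is not the Gnocis implementation and does not use the Gnocis API; the function names (`mismatch_spectrum`, `mismatch_kernel`) and the defaults `k=3`, `m=1` are illustrative assumptions. Each possible k-mer is counted once for every window of the sequence that lies within `m` mismatches of it, and the kernel is the dot product of two such count vectors.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def mismatch_spectrum(seq, k=3, m=1):
    """Hypothetical helper: for every possible k-mer, count the windows
    of `seq` that are within `m` mismatches of it."""
    seq = seq.upper()
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter()
    for kmer in map("".join, product(ALPHABET, repeat=k)):
        counts[kmer] = sum(1 for w in windows if hamming(kmer, w) <= m)
    return counts


def mismatch_kernel(a, b, k=3, m=1):
    """Hypothetical helper: dot product of two mismatch spectra."""
    fa = mismatch_spectrum(a, k, m)
    fb = mismatch_spectrum(b, k, m)
    return sum(fa[kmer] * fb[kmer] for kmer in fa)
```

This brute-force version enumerates all 4^k k-mers, so it is only practical for small `k`; it is meant to show what the feature vector contains, not how a compiled package would compute it.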