Opfi: A Python package for identifying gene clusters in large genomics and metagenomics data sets

Gene clusters are sets of co-localized, often contiguous genes that together perform specific functions, many of which are relevant to biotechnology. There is a need for software tools that can extract candidate gene clusters from vast amounts of available genomic data. Therefore, we developed Opfi:...

Bibliographic Details
Main authors: Hill, Alexis M.; Rybarski, James R.; Hu, Kuang; Finkelstein, Ilya J.; Wilke, Claus O.
Format: Online Article Text
Language: English
Published: 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9017871/
https://www.ncbi.nlm.nih.gov/pubmed/35445164
http://dx.doi.org/10.21105/joss.03678
_version_ 1784688874285432832
author Hill, Alexis M.
Rybarski, James R.
Hu, Kuang
Finkelstein, Ilya J.
Wilke, Claus O.
author_facet Hill, Alexis M.
Rybarski, James R.
Hu, Kuang
Finkelstein, Ilya J.
Wilke, Claus O.
author_sort Hill, Alexis M.
collection PubMed
description Gene clusters are sets of co-localized, often contiguous genes that together perform specific functions, many of which are relevant to biotechnology. There is a need for software tools that can extract candidate gene clusters from vast amounts of available genomic data. Therefore, we developed Opfi: a modular pipeline for identification of arbitrary gene clusters in assembled genomic or metagenomic sequences. Opfi contains functions for annotation, deduplication, and visualization of putative gene clusters. It uses a customizable rule-based filtering approach to select candidate systems that adhere to user-defined criteria. Opfi is implemented in Python, and is available on the Python Package Index and on Bioconda (Grüning et al., 2018).
format Online
Article
Text
id pubmed-9017871
institution National Center for Biotechnology Information
language English
publishDate 2021
record_format MEDLINE/PubMed
spelling pubmed-9017871 2022-04-19 Opfi: A Python package for identifying gene clusters in large genomics and metagenomics data sets Hill, Alexis M. Rybarski, James R. Hu, Kuang Finkelstein, Ilya J. Wilke, Claus O. J Open Source Softw Article Gene clusters are sets of co-localized, often contiguous genes that together perform specific functions, many of which are relevant to biotechnology. There is a need for software tools that can extract candidate gene clusters from vast amounts of available genomic data. Therefore, we developed Opfi: a modular pipeline for identification of arbitrary gene clusters in assembled genomic or metagenomic sequences. Opfi contains functions for annotation, deduplication, and visualization of putative gene clusters. It uses a customizable rule-based filtering approach to select candidate systems that adhere to user-defined criteria. Opfi is implemented in Python, and is available on the Python Package Index and on Bioconda (Grüning et al., 2018). 2021 2021-10-27 /pmc/articles/PMC9017871/ /pubmed/35445164 http://dx.doi.org/10.21105/joss.03678 Text en https://creativecommons.org/licenses/by/4.0/ License: Authors of papers retain copyright and release the work under a Creative Commons Attribution 4.0 International License (CC BY 4.0, https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Hill, Alexis M.
Rybarski, James R.
Hu, Kuang
Finkelstein, Ilya J.
Wilke, Claus O.
Opfi: A Python package for identifying gene clusters in large genomics and metagenomics data sets
title Opfi: A Python package for identifying gene clusters in large genomics and metagenomics data sets
title_full Opfi: A Python package for identifying gene clusters in large genomics and metagenomics data sets
title_fullStr Opfi: A Python package for identifying gene clusters in large genomics and metagenomics data sets
title_full_unstemmed Opfi: A Python package for identifying gene clusters in large genomics and metagenomics data sets
title_short Opfi: A Python package for identifying gene clusters in large genomics and metagenomics data sets
title_sort opfi: a python package for identifying gene clusters in large genomics and metagenomics data sets
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9017871/
https://www.ncbi.nlm.nih.gov/pubmed/35445164
http://dx.doi.org/10.21105/joss.03678
work_keys_str_mv AT hillalexism opfiapythonpackageforidentifyinggeneclustersinlargegenomicsandmetagenomicsdatasets
AT rybarskijamesr opfiapythonpackageforidentifyinggeneclustersinlargegenomicsandmetagenomicsdatasets
AT hukuang opfiapythonpackageforidentifyinggeneclustersinlargegenomicsandmetagenomicsdatasets
AT finkelsteinilyaj opfiapythonpackageforidentifyinggeneclustersinlargegenomicsandmetagenomicsdatasets
AT wilkeclauso opfiapythonpackageforidentifyinggeneclustersinlargegenomicsandmetagenomicsdatasets