Genofunc: genome annotation and identification of genome features for automated pipelining analysis of virus whole genome sequences
BACKGROUND: Viral genomics and epidemiology have been increasingly important tools for analysing the spread of key pathogens affecting daily lives of individuals worldwide. With the rapidly expanding scale of pathogen genome sequencing efforts for epidemics and outbreaks efficient workflows in extra...
Main author: | Yu, Xiaoyu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2023 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10227794/ https://www.ncbi.nlm.nih.gov/pubmed/37254048 http://dx.doi.org/10.1186/s12859-023-05356-3 |
_version_ | 1785050846817419264 |
---|---|
author | Yu, Xiaoyu |
author_facet | Yu, Xiaoyu |
author_sort | Yu, Xiaoyu |
collection | PubMed |
description | BACKGROUND: Viral genomics and epidemiology have become increasingly important tools for analysing the spread of key pathogens affecting the daily lives of individuals worldwide. With the rapidly expanding scale of pathogen genome sequencing efforts for epidemics and outbreaks, efficient workflows for extracting genomic information are becoming increasingly important for answering key research questions. RESULTS: Here we present Genofunc, a toolkit offering a range of command-line-oriented functions for processing raw virus genome sequences into aligned and annotated data ready for analysis. The tool contains functions such as genome annotation and feature extraction for processing large genomic datasets, either manually or as part of a pipeline (such as Snakemake or Nextflow), ready for downstream phylogenetic analysis. Originally designed for a large-scale HIV sequencing project, Genofunc has been benchmarked against annotated sequence gene coordinates from the Los Alamos HIV database as validation, with downstream phylogenetic analysis results comparable to past literature as a case study. CONCLUSION: Genofunc is implemented fully in Python and licensed under the MIT license. Source code and documentation are available at: https://github.com/xiaoyu518/genofunc. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05356-3. |
format | Online Article Text |
id | pubmed-10227794 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-102277942023-05-31 Genofunc: genome annotation and identification of genome features for automated pipelining analysis of virus whole genome sequences Yu, Xiaoyu BMC Bioinformatics Software BACKGROUND: Viral genomics and epidemiology have become increasingly important tools for analysing the spread of key pathogens affecting the daily lives of individuals worldwide. With the rapidly expanding scale of pathogen genome sequencing efforts for epidemics and outbreaks, efficient workflows for extracting genomic information are becoming increasingly important for answering key research questions. RESULTS: Here we present Genofunc, a toolkit offering a range of command-line-oriented functions for processing raw virus genome sequences into aligned and annotated data ready for analysis. The tool contains functions such as genome annotation and feature extraction for processing large genomic datasets, either manually or as part of a pipeline (such as Snakemake or Nextflow), ready for downstream phylogenetic analysis. Originally designed for a large-scale HIV sequencing project, Genofunc has been benchmarked against annotated sequence gene coordinates from the Los Alamos HIV database as validation, with downstream phylogenetic analysis results comparable to past literature as a case study. CONCLUSION: Genofunc is implemented fully in Python and licensed under the MIT license. Source code and documentation are available at: https://github.com/xiaoyu518/genofunc. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05356-3. 
BioMed Central 2023-05-30 /pmc/articles/PMC10227794/ /pubmed/37254048 http://dx.doi.org/10.1186/s12859-023-05356-3 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Software Yu, Xiaoyu Genofunc: genome annotation and identification of genome features for automated pipelining analysis of virus whole genome sequences |
title | Genofunc: genome annotation and identification of genome features for automated pipelining analysis of virus whole genome sequences |
title_full | Genofunc: genome annotation and identification of genome features for automated pipelining analysis of virus whole genome sequences |
title_fullStr | Genofunc: genome annotation and identification of genome features for automated pipelining analysis of virus whole genome sequences |
title_full_unstemmed | Genofunc: genome annotation and identification of genome features for automated pipelining analysis of virus whole genome sequences |
title_short | Genofunc: genome annotation and identification of genome features for automated pipelining analysis of virus whole genome sequences |
title_sort | genofunc: genome annotation and identification of genome features for automated pipelining analysis of virus whole genome sequences |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10227794/ https://www.ncbi.nlm.nih.gov/pubmed/37254048 http://dx.doi.org/10.1186/s12859-023-05356-3 |
work_keys_str_mv | AT yuxiaoyu genofuncgenomeannotationandidentificationofgenomefeaturesforautomatedpipelininganalysisofviruswholegenomesequences |