getSequenceInfo: a suite of tools allowing to get genome sequence information from public repositories
BACKGROUND: The volume of biological sequence data is growing exponentially worldwide. Nucleotide sequence databases play an important role in providing meaningful genomic information on a variety of biological organisms. RESULTS: The getSequenceInfo software tool provides access to sequence information...
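Main Authors: | Moco, Vincent; Cazenave, Damien; Garnier, Maëlle; Pot, Matthieu; Marcelino, Isabel; Talarmin, Antoine; Guyomard-Rabenirina, Stéphanie; Breurec, Sébastien; Ferdinand, Séverine; Dereeper, Alexis; Reynaud, Yann; Couvin, David |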
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2022 |
Subjects: | Software |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9264741/ https://www.ncbi.nlm.nih.gov/pubmed/35804320 http://dx.doi.org/10.1186/s12859-022-04809-5 |
_version_ | 1784743028648312832 |
---|---|
author | Moco, Vincent; Cazenave, Damien; Garnier, Maëlle; Pot, Matthieu; Marcelino, Isabel; Talarmin, Antoine; Guyomard-Rabenirina, Stéphanie; Breurec, Sébastien; Ferdinand, Séverine; Dereeper, Alexis; Reynaud, Yann; Couvin, David |
author_facet | Moco, Vincent; Cazenave, Damien; Garnier, Maëlle; Pot, Matthieu; Marcelino, Isabel; Talarmin, Antoine; Guyomard-Rabenirina, Stéphanie; Breurec, Sébastien; Ferdinand, Séverine; Dereeper, Alexis; Reynaud, Yann; Couvin, David |
author_sort | Moco, Vincent |
collection | PubMed |
description | BACKGROUND: The volume of biological sequence data is growing exponentially worldwide. Nucleotide sequence databases play an important role in providing meaningful genomic information on a variety of biological organisms. RESULTS: The getSequenceInfo software tool provides access to sequence information from various public repositories (GenBank, RefSeq, and the European Nucleotide Archive). It is compatible with different operating systems (Linux, macOS, and Microsoft Windows) and can be used programmatically (command line) or through a graphical user interface. getSequenceInfo (gSeqI) v1.0 should help users obtain information on queried sequences that could be useful for specific studies (e.g. the country of origin/isolation or the release date of queried sequences). Queries can retrieve sequence data based on a given kingdom and species, or from a given date. The program separates chromosomes from plasmids (or other genetic elements/components) by placing each component in a given folder. The program also computes some basic statistics (such as the GC content of queried assemblies). An empirically designed nucleotide ratio is calculated from nucleotide information in order to tentatively provide a “NucleScore” for studied genome assemblies. Besides the main gSeqI tool, additional tools have been developed to perform various tasks related to sequence analysis. CONCLUSION: The aim of this study is to democratize the programmatic use of public repositories and to facilitate sequence data analysis from a pedagogical perspective. Output results are available in FASTA, FASTQ, Excel/TSV or HTML formats. The program is freely available at: https://github.com/karubiotools/getSequenceInfo. getSequenceInfo and supplementary tools are partly available through the recently released Galaxy KaruBioNet platform (http://calamar.univ-ag.fr/c3i/galaxy_karubionet.html). SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04809-5. |
format | Online Article Text |
id | pubmed-9264741 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9264741 2022-07-08 getSequenceInfo: a suite of tools allowing to get genome sequence information from public repositories Moco, Vincent; Cazenave, Damien; Garnier, Maëlle; Pot, Matthieu; Marcelino, Isabel; Talarmin, Antoine; Guyomard-Rabenirina, Stéphanie; Breurec, Sébastien; Ferdinand, Séverine; Dereeper, Alexis; Reynaud, Yann; Couvin, David BMC Bioinformatics Software BACKGROUND: The volume of biological sequence data is growing exponentially worldwide. Nucleotide sequence databases play an important role in providing meaningful genomic information on a variety of biological organisms. RESULTS: The getSequenceInfo software tool provides access to sequence information from various public repositories (GenBank, RefSeq, and the European Nucleotide Archive). It is compatible with different operating systems (Linux, macOS, and Microsoft Windows) and can be used programmatically (command line) or through a graphical user interface. getSequenceInfo (gSeqI) v1.0 should help users obtain information on queried sequences that could be useful for specific studies (e.g. the country of origin/isolation or the release date of queried sequences). Queries can retrieve sequence data based on a given kingdom and species, or from a given date. The program separates chromosomes from plasmids (or other genetic elements/components) by placing each component in a given folder. The program also computes some basic statistics (such as the GC content of queried assemblies). An empirically designed nucleotide ratio is calculated from nucleotide information in order to tentatively provide a “NucleScore” for studied genome assemblies. Besides the main gSeqI tool, additional tools have been developed to perform various tasks related to sequence analysis. CONCLUSION: The aim of this study is to democratize the programmatic use of public repositories and to facilitate sequence data analysis from a pedagogical perspective. Output results are available in FASTA, FASTQ, Excel/TSV or HTML formats. The program is freely available at: https://github.com/karubiotools/getSequenceInfo. getSequenceInfo and supplementary tools are partly available through the recently released Galaxy KaruBioNet platform (http://calamar.univ-ag.fr/c3i/galaxy_karubionet.html). SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04809-5. BioMed Central 2022-07-08 /pmc/articles/PMC9264741/ /pubmed/35804320 http://dx.doi.org/10.1186/s12859-022-04809-5 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Software Moco, Vincent Cazenave, Damien Garnier, Maëlle Pot, Matthieu Marcelino, Isabel Talarmin, Antoine Guyomard-Rabenirina, Stéphanie Breurec, Sébastien Ferdinand, Séverine Dereeper, Alexis Reynaud, Yann Couvin, David getSequenceInfo: a suite of tools allowing to get genome sequence information from public repositories |
title | getSequenceInfo: a suite of tools allowing to get genome sequence information from public repositories |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9264741/ https://www.ncbi.nlm.nih.gov/pubmed/35804320 http://dx.doi.org/10.1186/s12859-022-04809-5 |
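
The description above mentions that the program computes the GC content of queried assemblies. As a rough illustration only, the following Python sketch shows one common way to compute GC content per FASTA record. It is not the getSequenceInfo implementation: the function names `read_fasta` and `gc_content` are hypothetical, the choice to ignore ambiguous bases such as N is an assumption, and the "NucleScore" ratio is not reproduced because this record does not give its formula.

```python
from typing import Dict


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {record_id: sequence}.

    Hypothetical helper for illustration; the record ID is taken as the
    first whitespace-delimited token after '>'.
    """
    records: Dict[str, list] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def gc_content(sequence: str) -> float:
    """Return GC content as a percentage of unambiguous bases (A, C, G, T).

    Assumption: ambiguous symbols (e.g. N) are excluded from the
    denominator; other tools may count them differently.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        return 0.0
    return 100.0 * gc / total


if __name__ == "__main__":
    example = ">chr1 example chromosome\nATGC\nGG\n>plasmid1\nATAT\n"
    for name, seq in read_fasta(example).items():
        print(name, round(gc_content(seq), 2))
```

Keeping one entry per FASTA record mirrors the idea of treating chromosomes and plasmids as separate components, so each component can be given its own statistic.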