getSequenceInfo: a suite of tools allowing to get genome sequence information from public repositories

Bibliographic Details
Main authors: Moco, Vincent; Cazenave, Damien; Garnier, Maëlle; Pot, Matthieu; Marcelino, Isabel; Talarmin, Antoine; Guyomard-Rabenirina, Stéphanie; Breurec, Sébastien; Ferdinand, Séverine; Dereeper, Alexis; Reynaud, Yann; Couvin, David
Format: Online, Article, Text
Language: English
Published: BioMed Central, 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9264741/
https://www.ncbi.nlm.nih.gov/pubmed/35804320
http://dx.doi.org/10.1186/s12859-022-04809-5
Collection: PubMed
Abstract:
BACKGROUND: The volume of biological sequence data is growing rapidly, even exponentially, worldwide. Nucleotide sequence databases play an important role in providing meaningful genomic information on a wide variety of organisms.
RESULTS: The getSequenceInfo software tool provides access to sequence information from several public repositories (GenBank, RefSeq, and the European Nucleotide Archive). It runs on Linux, macOS, and Microsoft Windows, and can be used programmatically (from the command line) or through a graphical user interface. getSequenceInfo (gSeqI) v1.0 is intended to help users obtain information on queried sequences that may be useful for specific studies (e.g. the country of origin/isolation or the release date of the queried sequences). Sequence data can be retrieved for a given kingdom and species, or from a given date. The program separates chromosomes from plasmids (and other genetic elements/components) by placing each component in its own folder. It also computes some basic statistics, such as the GC content of queried assemblies. An empirically designed nucleotide ratio, calculated from nucleotide information, is used to tentatively assign a “NucleScore” to the genome assemblies studied. In addition to the main gSeqI tool, additional tools have been developed to perform various tasks related to sequence analysis.
CONCLUSION: The aim of this study is to democratize the programmatic use of public repositories and to facilitate sequence data analysis from a pedagogical perspective. Results can be output in FASTA, FASTQ, Excel/TSV, or HTML format. The program is freely available at: https://github.com/karubiotools/getSequenceInfo. Parts of getSequenceInfo and its supplementary tools are also available through the recently released Galaxy KaruBioNet platform (http://calamar.univ-ag.fr/c3i/galaxy_karubionet.html).
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04809-5.
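The abstract states that gSeqI computes the GC content of queried assemblies and sorts chromosomes and plasmids into separate groups. The Python sketch below illustrates those two ideas under stated assumptions; it is not the getSequenceInfo implementation. The function names (`parse_fasta`, `gc_content`, `split_components`) are hypothetical, GC content is assumed to be 100 × (G + C) / (A + C + G + T) with ambiguous symbols such as N ignored, and the rule "a FASTA header containing the word plasmid is a plasmid" is an assumed heuristic (the actual tool may classify components from repository metadata instead). The toy FASTA records in the example are invented, and the grouping is done in memory rather than by writing folders.

```python
from collections import defaultdict


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}, dropping the leading '>'."""
    records: dict[str, list[str]] = {}
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)  # multi-line sequences are concatenated
    return {h: "".join(parts) for h, parts in records.items()}


def gc_content(seq: str) -> float:
    """GC percentage over unambiguous bases: 100 * (G + C) / (A + C + G + T).

    Assumption: ambiguous symbols such as N are ignored; returns 0.0 when the
    sequence contains no A/C/G/T at all.
    """
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (s.count("G") + s.count("C")) / acgt


def split_components(records: dict[str, str]) -> dict[str, dict[str, str]]:
    """Group records as 'plasmid' or 'chromosome' (hypothetical header-keyword rule)."""
    groups: dict[str, dict[str, str]] = defaultdict(dict)
    for header, seq in records.items():
        kind = "plasmid" if "plasmid" in header.lower() else "chromosome"
        groups[kind][header] = seq
    return dict(groups)


if __name__ == "__main__":
    example = """>chr1 example chromosome
ATGCGC
GGCCAT
>p1 example plasmid pX
ATATGC
"""
    for kind, members in split_components(parse_fasta(example)).items():
        for header, seq in members.items():
            print(f"{kind}\t{header}\tGC={gc_content(seq):.1f}%")
```

Ignoring ambiguous bases keeps stretches of N (common in draft assemblies) from lowering the GC percentage; a tool that divides by total sequence length instead would report lower values for such assemblies.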
Record ID: pubmed-9264741
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Bioinformatics (Software article), BioMed Central, published 2022-07-08
Copyright: © The Author(s) 2022
License: Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Topic: Software