
KmerGO: A Tool to Identify Group-Specific Sequences With k-mers


Bibliographic Details
Main Authors: Wang, Ying; Chen, Qi; Deng, Chao; Zheng, Yiluan; Sun, Fengzhu
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7477287/
https://www.ncbi.nlm.nih.gov/pubmed/32983048
http://dx.doi.org/10.3389/fmicb.2020.02067
author Wang, Ying
Chen, Qi
Deng, Chao
Zheng, Yiluan
Sun, Fengzhu
collection PubMed
description Capturing group-specific sequences between two groups of genomic/metagenomic sequences is critical for the follow-up identification of single-nucleotide variants (SNVs), gene families, microbial species, or other elements associated with each group. A sequence that is present, or rich, in one group, but absent, or scarce, in another group is considered a “group-specific” sequence in our study. We developed a user-friendly tool, KmerGO, to identify group-specific sequences between two groups of genomic/metagenomic long sequences or high-throughput sequencing datasets. Compared with other tools, KmerGO captures group-specific k-mers (k up to 40 bp) with much lower computing-resource requirements and in much shorter running time. For a 1.05 TB dataset (.fasta), KmerGO takes about 21.5 h (including k-mer counting) to return assembled group-specific sequences on a regular stand-alone workstation with no more than 1 GB of memory. Furthermore, KmerGO can also be applied to capture trait-associated sequences for continuous traits. Using multi-process parallel computing, KmerGO is implemented with both a graphical user interface and a command line on Linux and Windows, free from any pre-installed supporting environments, packages, or complex configurations. The output group-specific k-mers or sequences from KmerGO can serve as inputs to other tools for the downstream discovery of biomarkers, such as genetic variants, species, or genes. KmerGO is available at https://github.com/ChnMasterOG/KmerGO.
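The definition of a “group-specific” sequence in the description above can be illustrated with a minimal Python sketch. This is not KmerGO's implementation: the record does not describe the tool's counting or statistical method, so the sketch swaps in a plain presence-fraction threshold. The function names and the `min_in` and `max_out` cutoffs are hypothetical, and reverse complements are ignored for brevity.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def presence_fraction(group, k):
    """Map each k-mer to the fraction of sequences in group that contain it."""
    counts = Counter()
    for seq in group:
        # A set counts each k-mer at most once per sequence.
        counts.update(set(kmers(seq, k)))
    return {kmer: n / len(group) for kmer, n in counts.items()}


def group_specific(group_a, group_b, k, min_in=0.8, max_out=0.2):
    """Return k-mers common in group_a and rare in group_b.

    min_in and max_out are illustrative thresholds (assumptions),
    not values taken from the KmerGO paper.
    """
    frac_a = presence_fraction(group_a, k)
    frac_b = presence_fraction(group_b, k)
    return sorted(
        kmer for kmer, frac in frac_a.items()
        if frac >= min_in and frac_b.get(kmer, 0.0) <= max_out
    )
```

Calling `group_specific(["ACGTAC", "TACGTT"], ["ACGGGG", "ACGCCC"], k=3)` keeps the 3-mers shared by both sequences of the first group and drops `ACG`, which also occurs throughout the second group.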
format Online
Article
Text
id pubmed-7477287
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7477287 2020-09-26 KmerGO: A Tool to Identify Group-Specific Sequences With k-mers Wang, Ying Chen, Qi Deng, Chao Zheng, Yiluan Sun, Fengzhu Front Microbiol Microbiology Frontiers Media S.A. 2020-08-25 /pmc/articles/PMC7477287/ /pubmed/32983048 http://dx.doi.org/10.3389/fmicb.2020.02067 Text en Copyright © 2020 Wang, Chen, Deng, Zheng and Sun.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title KmerGO: A Tool to Identify Group-Specific Sequences With k-mers
topic Microbiology