KmerGO: A Tool to Identify Group-Specific Sequences With k-mers
Main authors: | Wang, Ying; Chen, Qi; Deng, Chao; Zheng, Yiluan; Sun, Fengzhu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2020 |
Subjects: | Microbiology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7477287/ https://www.ncbi.nlm.nih.gov/pubmed/32983048 http://dx.doi.org/10.3389/fmicb.2020.02067 |
_version_ | 1783579867074265088 |
---|---|
author | Wang, Ying; Chen, Qi; Deng, Chao; Zheng, Yiluan; Sun, Fengzhu |
collection | PubMed |
description | Capturing group-specific sequences between two groups of genomic/metagenomic sequences is critical for the follow-up identification of single nucleotide variants (SNVs), gene families, microbial species, or other elements associated with each group. In this study, a sequence that is present, or rich, in one group but absent, or scarce, in the other group is considered a “group-specific” sequence. We developed a user-friendly tool, KmerGO, to identify group-specific sequences between two groups of genomic/metagenomic long sequences or high-throughput sequencing datasets. Compared with other tools, KmerGO captures group-specific k-mers (k up to 40 bp) with much lower computing-resource requirements and much shorter running time. For a 1.05 TB dataset (.fasta), KmerGO takes about 21.5 h (including k-mer counting) to return assembled group-specific sequences on a regular stand-alone workstation with no more than 1 GB of memory. Furthermore, KmerGO can be applied to capture sequences associated with continuous traits. Built on multi-process parallel computing, KmerGO provides both a graphical user interface and a command line on Linux and Windows, and requires no pre-installed supporting environments, packages, or complex configurations. The group-specific k-mers or sequences output by KmerGO can serve as inputs to other tools for the downstream discovery of biomarkers, such as genetic variants, species, or genes. KmerGO is available at https://github.com/ChnMasterOG/KmerGO. |
format | Online Article Text |
id | pubmed-7477287 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7477287 2020-09-26 KmerGO: A Tool to Identify Group-Specific Sequences With k-mers. Wang, Ying; Chen, Qi; Deng, Chao; Zheng, Yiluan; Sun, Fengzhu. Front Microbiol (Microbiology). Frontiers Media S.A., 2020-08-25. /pmc/articles/PMC7477287/ /pubmed/32983048 http://dx.doi.org/10.3389/fmicb.2020.02067 Text en. Copyright © 2020 Wang, Chen, Deng, Zheng and Sun. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | KmerGO: A Tool to Identify Group-Specific Sequences With k-mers |
topic | Microbiology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7477287/ https://www.ncbi.nlm.nih.gov/pubmed/32983048 http://dx.doi.org/10.3389/fmicb.2020.02067 |
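
The description calls a sequence "group-specific" when it is present, or rich, in one group but absent, or scarce, in the other. The Python sketch below illustrates only the presence/absence side of that definition on small in-memory strings. It is not KmerGO's implementation, which this record does not describe. The function names (`count_kmers`, `group_specific_kmers`), the canonical (strand-independent) k-mer convention, and the `min_present`/`max_other` thresholds are all illustrative assumptions.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical helpers for illustration only; not part of KmerGO.
_COMP = str.maketrans("ACGT", "TGCA")  # base complement table
_ACGT = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMP)[::-1]
    return min(kmer, rc)


def count_kmers(seq: str, k: int) -> Counter:
    """Count canonical k-mers in one sequence, skipping windows with non-ACGT bases."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= _ACGT:
            counts[canonical(kmer)] += 1
    return counts


def group_specific_kmers(group_a: List[str], group_b: List[str], k: int,
                         min_present: float = 0.8,
                         max_other: float = 0.2) -> Dict[str, List[str]]:
    """Call k-mers specific to group A or B with an assumed presence/absence rule.

    A k-mer is A-specific if it occurs in at least `min_present` (fraction) of
    A's samples and in at most `max_other` of B's samples; likewise for B.
    """
    if not group_a or not group_b:
        raise ValueError("both groups need at least one sample")
    sets_a = [set(count_kmers(s, k)) for s in group_a]
    sets_b = [set(count_kmers(s, k)) for s in group_b]
    all_kmers = set().union(*sets_a, *sets_b)

    def frac(kmer: str, sets: List[Set[str]]) -> float:
        # Fraction of samples in which the k-mer occurs at least once.
        return sum(kmer in s for s in sets) / len(sets)

    result: Dict[str, List[str]] = {"A": [], "B": []}
    for kmer in sorted(all_kmers):
        fa, fb = frac(kmer, sets_a), frac(kmer, sets_b)
        if fa >= min_present and fb <= max_other:
            result["A"].append(kmer)
        elif fb >= min_present and fa <= max_other:
            result["B"].append(kmer)
    return result
```

For example, `group_specific_kmers(["AAAC", "AAAC"], ["AAAG", "AAAG"], k=3)` returns `{"A": ["AAC"], "B": ["AAG"]}`; `AAA` occurs in every sample of both groups and is dropped. Capturing the "rich vs. scarce" case would need an abundance-based test instead of set membership, and datasets on the scale quoted above (1.05 TB) would likely need an external k-mer counter and disk-backed merging rather than Python sets.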