Whole-Genome k-mer Topic Modeling Associates Bacterial Families
Alignment-free k-mer-based algorithms in whole genome sequence comparisons remain an ongoing challenge. Here, we explore the possibility to use Topic Modeling for organism whole-genome comparisons. We analyzed 30 complete genomes from three bacterial families by topic modeling. For this, each genome...
Main authors: | Borrayo, Ernesto; May-Canche, Isaias; Paredes, Omar; Morales, J. Alejandro; Romo-Vázquez, Rebeca; Vélez-Pérez, Hugo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7074292/ https://www.ncbi.nlm.nih.gov/pubmed/32075081 http://dx.doi.org/10.3390/genes11020197 |
_version_ | 1783506800224501760 |
---|---|
author | Borrayo, Ernesto May-Canche, Isaias Paredes, Omar Morales, J. Alejandro Romo-Vázquez, Rebeca Vélez-Pérez, Hugo |
author_facet | Borrayo, Ernesto May-Canche, Isaias Paredes, Omar Morales, J. Alejandro Romo-Vázquez, Rebeca Vélez-Pérez, Hugo |
author_sort | Borrayo, Ernesto |
collection | PubMed |
description | Alignment-free k-mer-based algorithms in whole genome sequence comparisons remain an ongoing challenge. Here, we explore the possibility to use Topic Modeling for organism whole-genome comparisons. We analyzed 30 complete genomes from three bacterial families by topic modeling. For this, each genome was considered as a document and 13-mer nucleotide representations as words. Latent Dirichlet allocation was used as the probabilistic modeling of the corpus. We were able to identify the topic distribution among analyzed genomes, which is highly consistent with traditional hierarchical classification. It is possible that topic modeling may be applied to establish relationships between genome’s composition and biological phenomena. |
format | Online Article Text |
id | pubmed-7074292 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-70742922020-03-19 Whole-Genome k-mer Topic Modeling Associates Bacterial Families Borrayo, Ernesto May-Canche, Isaias Paredes, Omar Morales, J. Alejandro Romo-Vázquez, Rebeca Vélez-Pérez, Hugo Genes (Basel) Brief Report Alignment-free k-mer-based algorithms in whole genome sequence comparisons remain an ongoing challenge. Here, we explore the possibility to use Topic Modeling for organism whole-genome comparisons. We analyzed 30 complete genomes from three bacterial families by topic modeling. For this, each genome was considered as a document and 13-mer nucleotide representations as words. Latent Dirichlet allocation was used as the probabilistic modeling of the corpus. We were able to identify the topic distribution among analyzed genomes, which is highly consistent with traditional hierarchical classification. It is possible that topic modeling may be applied to establish relationships between genome’s composition and biological phenomena. MDPI 2020-02-14 /pmc/articles/PMC7074292/ /pubmed/32075081 http://dx.doi.org/10.3390/genes11020197 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Brief Report Borrayo, Ernesto May-Canche, Isaias Paredes, Omar Morales, J. Alejandro Romo-Vázquez, Rebeca Vélez-Pérez, Hugo Whole-Genome k-mer Topic Modeling Associates Bacterial Families |
title | Whole-Genome k-mer Topic Modeling Associates Bacterial Families |
title_full | Whole-Genome k-mer Topic Modeling Associates Bacterial Families |
title_fullStr | Whole-Genome k-mer Topic Modeling Associates Bacterial Families |
title_full_unstemmed | Whole-Genome k-mer Topic Modeling Associates Bacterial Families |
title_short | Whole-Genome k-mer Topic Modeling Associates Bacterial Families |
title_sort | whole-genome k-mer topic modeling associates bacterial families |
topic | Brief Report |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7074292/ https://www.ncbi.nlm.nih.gov/pubmed/32075081 http://dx.doi.org/10.3390/genes11020197 |
work_keys_str_mv | AT borrayoernesto wholegenomekmertopicmodelingassociatesbacterialfamilies AT maycancheisaias wholegenomekmertopicmodelingassociatesbacterialfamilies AT paredesomar wholegenomekmertopicmodelingassociatesbacterialfamilies AT moralesjalejandro wholegenomekmertopicmodelingassociatesbacterialfamilies AT romovazquezrebeca wholegenomekmertopicmodelingassociatesbacterialfamilies AT velezperezhugo wholegenomekmertopicmodelingassociatesbacterialfamilies |
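
The description in this record treats each genome as a document and its 13-mer nucleotide strings as words. The sketch below illustrates that framing only; it is not the authors' pipeline, which this record does not specify. It assumes overlapping k-mers read from a single strand, and the function names (`kmer_words`, `document_term_counts`) and the toy sequences are hypothetical.

```python
from collections import Counter


def kmer_words(sequence, k=13):
    """Return the overlapping k-mers of a nucleotide sequence, in order."""
    seq = sequence.upper()
    # A sequence shorter than k yields an empty list.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def document_term_counts(genomes, k=13):
    """Build a sorted k-mer vocabulary and one count row per genome.

    `genomes` maps a genome name to its sequence string.
    """
    counts = {name: Counter(kmer_words(seq, k)) for name, seq in genomes.items()}
    vocabulary = sorted(set().union(*counts.values()))
    # Counter returns 0 for k-mers a genome does not contain.
    matrix = {name: [c[word] for word in vocabulary] for name, c in counts.items()}
    return vocabulary, matrix


# Toy sequences for illustration only (assumption: not real genomes).
toy_genomes = {
    "genome_a": "ACGTACGTACGTACGTA",
    "genome_b": "ACGTACGTACGTTTTTT",
}
vocabulary, matrix = document_term_counts(toy_genomes, k=13)
```

Each row of `matrix` is a bag-of-words count vector for one genome. A count matrix of this shape is the usual input to a latent Dirichlet allocation implementation, for example scikit-learn's `LatentDirichletAllocation`; whether the authors used that library is not stated here.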