Whole-Genome k-mer Topic Modeling Associates Bacterial Families

Alignment-free, k-mer-based whole-genome sequence comparison remains an ongoing challenge. Here, we explore the possibility of using topic modeling for whole-genome comparisons between organisms. We analyzed 30 complete genomes from three bacterial families by topic modeling. For this, each genome...

Bibliographic Details
Main authors: Borrayo, Ernesto, May-Canche, Isaias, Paredes, Omar, Morales, J. Alejandro, Romo-Vázquez, Rebeca, Vélez-Pérez, Hugo
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7074292/
https://www.ncbi.nlm.nih.gov/pubmed/32075081
http://dx.doi.org/10.3390/genes11020197
author Borrayo, Ernesto
May-Canche, Isaias
Paredes, Omar
Morales, J. Alejandro
Romo-Vázquez, Rebeca
Vélez-Pérez, Hugo
collection PubMed
description Alignment-free, k-mer-based whole-genome sequence comparison remains an ongoing challenge. Here, we explore the possibility of using topic modeling for whole-genome comparisons between organisms. We analyzed 30 complete genomes from three bacterial families by topic modeling. For this, each genome was treated as a document and its 13-mer nucleotide sequences as words. Latent Dirichlet allocation was used as the probabilistic model of the corpus. We were able to identify the topic distribution among the analyzed genomes, which is highly consistent with traditional hierarchical classification. Topic modeling may be applicable to establishing relationships between genome composition and biological phenomena.
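The document-and-word framing in the description can be illustrated with a minimal Python sketch that turns sequences into a k-mer count matrix, the kind of input a latent Dirichlet allocation implementation expects. This is not the authors' pipeline: the function names `kmer_words` and `build_corpus`, the toy sequences, the use of overlapping windows, and the skipping of windows with ambiguous bases are all assumptions made for illustration, and strand handling is ignored.

```python
from collections import Counter

def kmer_words(sequence, k=13):
    """Yield overlapping k-mers ("words") from one sequence.

    Assumption: windows containing anything other than A, C, G, T are skipped.
    """
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            yield word

def build_corpus(genomes, k=13):
    """Return (vocabulary, matrix): one k-mer count row per genome ("document")."""
    counts = {name: Counter(kmer_words(seq, k)) for name, seq in genomes.items()}
    vocabulary = sorted(set().union(*counts.values()))
    matrix = {name: [c[w] for w in vocabulary] for name, c in counts.items()}
    return vocabulary, matrix

# Hypothetical toy sequences, far shorter than a real bacterial genome.
genomes = {
    "genome_a": "ACGTACGTACGTACGTA",
    "genome_b": "ACGTACGTACGTTTTTT",
}
vocabulary, matrix = build_corpus(genomes, k=13)
```

Each row of `matrix` could then be passed to a topic-model implementation (for example, scikit-learn's `LatentDirichletAllocation`) to estimate per-genome topic distributions. The record does not say which software the authors used.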
format Online
Article
Text
id pubmed-7074292
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7074292 2020-03-19 Whole-Genome k-mer Topic Modeling Associates Bacterial Families. Borrayo, Ernesto; May-Canche, Isaias; Paredes, Omar; Morales, J. Alejandro; Romo-Vázquez, Rebeca; Vélez-Pérez, Hugo. Genes (Basel), Brief Report. MDPI 2020-02-14 /pmc/articles/PMC7074292/ /pubmed/32075081 http://dx.doi.org/10.3390/genes11020197 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Whole-Genome k-mer Topic Modeling Associates Bacterial Families
topic Brief Report
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7074292/
https://www.ncbi.nlm.nih.gov/pubmed/32075081
http://dx.doi.org/10.3390/genes11020197