
Whole-Genome k-mer Topic Modeling Associates Bacterial Families



Bibliographic Details
Main authors: Borrayo, Ernesto, May-Canche, Isaias, Paredes, Omar, Morales, J. Alejandro, Romo-Vázquez, Rebeca, Vélez-Pérez, Hugo
Format: Online Article Text
Language: English
Published: MDPI 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7074292/
https://www.ncbi.nlm.nih.gov/pubmed/32075081
http://dx.doi.org/10.3390/genes11020197
Description
Summary: Alignment-free, k-mer-based algorithms for whole-genome sequence comparison remain an ongoing challenge. Here, we explore the possibility of using topic modeling to compare the whole genomes of organisms. We analyzed 30 complete genomes from three bacterial families by topic modeling. For this, each genome was treated as a document and its 13-mer nucleotide representations as words. Latent Dirichlet allocation (LDA) was used as the probabilistic model of the corpus. We were able to identify the topic distribution among the analyzed genomes, which is highly consistent with traditional hierarchical classification. Topic modeling could potentially be applied to establish relationships between genome composition and biological phenomena.
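To make the "genome as document, 13-mer as word" framing concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' pipeline: the summary does not say which LDA implementation or hyperparameters were used, so this swaps in a plain collapsed Gibbs sampler for LDA. The function names `kmer_document` and `lda_gibbs`, the priors `alpha` and `beta`, the iteration count, and the choice to skip k-mers containing ambiguity codes (such as `N`) are all hypothetical choices made for illustration.

```python
import random
from collections import Counter

VALID_BASES = frozenset("ACGT")


def kmer_document(sequence: str, k: int = 13) -> Counter:
    """Turn one genome into a bag of overlapping k-mer 'words' (13-mers by default)."""
    seq = sequence.upper()
    words = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Assumption: k-mers containing N or other ambiguity codes are skipped.
        if set(kmer) <= VALID_BASES:
            words[kmer] += 1
    return words


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)
    # Expand each bag of words into a flat list of vocabulary indices (tokens).
    tokens = [[index[w] for w, c in d.items() for _ in range(c)] for d in docs]

    ndk = [[0] * n_topics for _ in docs]          # topic counts per document
    nkw = [[0] * n_vocab for _ in range(n_topics)]  # word counts per topic
    nk = [0] * n_topics                            # total tokens per topic
    z = []                                         # topic assignment of every token

    # Random initial topic assignment for every token.
    for d, toks in enumerate(tokens):
        zd = []
        for w in toks:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, toks in enumerate(tokens):
            for i, w in enumerate(toks):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + n_vocab * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # Smoothed topic proportions (theta) for each genome; each row sums to 1.
    theta = []
    for d, toks in enumerate(tokens):
        total = len(toks) + n_topics * alpha
        theta.append([(ndk[d][t] + alpha) / total for t in range(n_topics)])
    return theta
```

A typical call would be `lda_gibbs([kmer_document(g) for g in genomes], n_topics=3)`, where `genomes` is a hypothetical list of nucleotide strings; the rows of the result could then be compared or clustered and set against the known family labels. The pure-Python loop is only meant to show the mechanics: a bacterial genome yields millions of 13-mer tokens over a space of 4^13 possible words, so a real analysis would rely on an optimized topic-modeling library rather than this sketch.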