
Robust Inference of Genetic Exchange Communities from Microbial Genomes Using TF-IDF

Bacteria and archaea can exchange genetic material across lineages through processes of lateral genetic transfer (LGT). Collectively, these exchange relationships can be modeled as a network and analyzed using concepts from graph theory. In particular, densely connected regions within an LGT network...


Bibliographic Details
Main Authors: Cong, Yingnan, Chan, Yao-ban, Phillips, Charles A., Langston, Michael A., Ragan, Mark A.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5243798/
https://www.ncbi.nlm.nih.gov/pubmed/28154557
http://dx.doi.org/10.3389/fmicb.2017.00021
_version_ 1782496579844833280
author Cong, Yingnan
Chan, Yao-ban
Phillips, Charles A.
Langston, Michael A.
Ragan, Mark A.
author_sort Cong, Yingnan
collection PubMed
description Bacteria and archaea can exchange genetic material across lineages through processes of lateral genetic transfer (LGT). Collectively, these exchange relationships can be modeled as a network and analyzed using concepts from graph theory. In particular, densely connected regions within an LGT network have been defined as genetic exchange communities (GECs). However, it has been problematic to construct networks in which edges solely represent LGT. Here we apply term frequency-inverse document frequency (TF-IDF), an alignment-free method originating from document analysis, to infer regions of lateral origin in bacterial genomes. We examine four empirical datasets of different size (number of genomes) and phyletic breadth, varying a key parameter (word length k) within bounds established in previous work. We map the inferred lateral regions to genes in recipient genomes, and construct networks in which the nodes are groups of genomes, and the edges natively represent LGT. We then extract maximum and maximal cliques (i.e., GECs) from these graphs, and identify nodes that belong to GECs across a wide range of k. Most surviving lateral transfer has happened within these GECs. Using Gene Ontology enrichment tests we demonstrate that biological processes associated with metabolism, regulation and transport are often over-represented among the genes affected by LGT within these communities. These enrichments are largely robust to change of k.
format Online
Article
Text
id pubmed-5243798
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-5243798 2017-02-02 Robust Inference of Genetic Exchange Communities from Microbial Genomes Using TF-IDF Cong, Yingnan Chan, Yao-ban Phillips, Charles A. Langston, Michael A. Ragan, Mark A. Front Microbiol Microbiology Frontiers Media S.A. 2017-01-19 /pmc/articles/PMC5243798/ /pubmed/28154557 http://dx.doi.org/10.3389/fmicb.2017.00021 Text en Copyright © 2017 Cong, Chan, Phillips, Langston and Ragan.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Robust Inference of Genetic Exchange Communities from Microbial Genomes Using TF-IDF
topic Microbiology