Robust Inference of Genetic Exchange Communities from Microbial Genomes Using TF-IDF
Main authors: | Cong, Yingnan; Chan, Yao-ban; Phillips, Charles A.; Langston, Michael A.; Ragan, Mark A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2017 |
Subjects: | Microbiology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5243798/ https://www.ncbi.nlm.nih.gov/pubmed/28154557 http://dx.doi.org/10.3389/fmicb.2017.00021 |
---|---|
author | Cong, Yingnan; Chan, Yao-ban; Phillips, Charles A.; Langston, Michael A.; Ragan, Mark A. |
collection | PubMed |
description | Bacteria and archaea can exchange genetic material across lineages through processes of lateral genetic transfer (LGT). Collectively, these exchange relationships can be modeled as a network and analyzed using concepts from graph theory. In particular, densely connected regions within an LGT network have been defined as genetic exchange communities (GECs). However, it has been problematic to construct networks in which edges solely represent LGT. Here we apply term frequency-inverse document frequency (TF-IDF), an alignment-free method originating from document analysis, to infer regions of lateral origin in bacterial genomes. We examine four empirical datasets of different size (number of genomes) and phyletic breadth, varying a key parameter (word length k) within bounds established in previous work. We map the inferred lateral regions to genes in recipient genomes, and construct networks in which the nodes are groups of genomes, and the edges natively represent LGT. We then extract maximum and maximal cliques (i.e., GECs) from these graphs, and identify nodes that belong to GECs across a wide range of k. Most surviving lateral transfer has happened within these GECs. Using Gene Ontology enrichment tests we demonstrate that biological processes associated with metabolism, regulation and transport are often over-represented among the genes affected by LGT within these communities. These enrichments are largely robust to change of k. |
format | Online Article Text |
id | pubmed-5243798 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
journal | Front Microbiol |
publication date | 2017-01-19 |
rights | Copyright © 2017 Cong, Chan, Phillips, Langston and Ragan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), http://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Robust Inference of Genetic Exchange Communities from Microbial Genomes Using TF-IDF |
topic | Microbiology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5243798/ https://www.ncbi.nlm.nih.gov/pubmed/28154557 http://dx.doi.org/10.3389/fmicb.2017.00021 |
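The description names two computational steps: weighting k-mers (words of length k) by TF-IDF across genomes, and extracting maximal and maximum cliques from the resulting graph. The Python below is a minimal sketch of both under stated assumptions. Each genome is treated as one document and each overlapping k-mer as a term, with idf = log(N/df). A query window is compared to whole genomes by cosine similarity. Cliques are enumerated with the standard Bron–Kerbosch algorithm with pivoting. The function names, the toy sequences, and the window-versus-genome comparison are hypothetical illustrations, not the authors' published pipeline.

```python
"""Illustrative sketch only: k-mer TF-IDF scoring and maximal-clique
enumeration. Names and toy data are hypothetical, not the published pipeline."""
import math
from collections import Counter


def kmers(seq: str, k: int) -> Counter:
    """Count overlapping words of length k, skipping any containing non-ACGT symbols."""
    seq = seq.upper()
    words = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(w for w in words if set(w) <= set("ACGT"))


def idf_weights(genomes: dict[str, str], k: int) -> dict[str, float]:
    """idf(t) = log(N / df(t)); each genome is one 'document'.

    A k-mer found in every genome gets idf 0 and so carries no signal."""
    df = Counter()
    for seq in genomes.values():
        df.update(set(kmers(seq, k)))  # count each k-mer once per genome
    return {t: math.log(len(genomes) / d) for t, d in df.items()}


def tfidf(seq: str, k: int, idf: dict[str, float]) -> dict[str, float]:
    """tf = relative k-mer frequency in seq, weighted by the corpus idf."""
    counts = kmers(seq, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: (n / total) * idf.get(t, 0.0) for t, n in counts.items()}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def maximal_cliques(adj: dict[str, set[str]]) -> list[frozenset[str]]:
    """Bron-Kerbosch with pivoting; adj must be symmetric with no self-loops."""
    found: list[frozenset[str]] = []

    def expand(r: set[str], p: set[str], x: set[str]) -> None:
        if not p and not x:
            found.append(frozenset(r))  # r cannot be extended: it is maximal
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


if __name__ == "__main__":
    genomes = {"g1": "AAAAAAAAAACCCCCCCCCC",
               "g2": "GGGGGGGGGGTTTTTTTTTT",
               "g3": "ACACACACACGTGTGTGTGT"}
    k = 3
    idf = idf_weights(genomes, k)
    window = tfidf("AAAAAAACCC", k, idf)  # hypothetical query window
    for name, seq in genomes.items():
        print(name, round(cosine(window, tfidf(seq, k, idf)), 3))
    adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
    cliques = maximal_cliques(adj)
    print("maximal:", [sorted(c) for c in cliques])
    print("maximum:", sorted(max(cliques, key=len)))
```

In the toy run, the window's k-mers occur only in g1, so it scores highest against g1 and zero against the others. Taking the largest of the maximal cliques gives a maximum clique. Exhaustive enumeration is exponential in the worst case, so this form suits small or sparse graphs.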