Exploring lateral genetic transfer among microbial genomes using TF-IDF
Many microbes can acquire genetic material from their environment and incorporate it into their genome, a process known as lateral genetic transfer (LGT). Computational approaches have been developed to detect genomic regions of lateral origin, but typically lack sensitivity, ability to distinguish...
Main Authors: | Cong, Yingnan; Chan, Yao-ban; Ragan, Mark A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group 2016 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4958990/ https://www.ncbi.nlm.nih.gov/pubmed/27452976 http://dx.doi.org/10.1038/srep29319 |
_version_ | 1782444353429438464 |
---|---|
author | Cong, Yingnan Chan, Yao-ban Ragan, Mark A. |
author_facet | Cong, Yingnan Chan, Yao-ban Ragan, Mark A. |
author_sort | Cong, Yingnan |
collection | PubMed |
description | Many microbes can acquire genetic material from their environment and incorporate it into their genome, a process known as lateral genetic transfer (LGT). Computational approaches have been developed to detect genomic regions of lateral origin, but typically lack sensitivity, ability to distinguish donor from recipient, and scalability to very large datasets. To address these issues we have introduced an alignment-free method based on ideas from document analysis, term frequency-inverse document frequency (TF-IDF). Here we examine the performance of TF-IDF on three empirical datasets: 27 genomes of Escherichia coli and Shigella, 110 genomes of enteric bacteria, and 143 genomes across 12 bacterial and three archaeal phyla. We investigate the effect of k-mer size, gap size and delineation of groups on the inference of genomic regions of lateral origin, finding an interplay among these parameters and sequence divergence. Because TF-IDF identifies donor groups and delineates regions of lateral origin within recipient genomes, aggregating these regions by gene enables us to explore, for the first time, the mosaic nature of lateral genes including the multiplicity of biological sources, ancestry of transfer and over-writing by subsequent transfers. We carry out Gene Ontology enrichment tests to investigate which biological processes are potentially affected by LGT. |
format | Online Article Text |
id | pubmed-4958990 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Nature Publishing Group |
record_format | MEDLINE/PubMed |
spelling | pubmed-49589902016-08-04 Exploring lateral genetic transfer among microbial genomes using TF-IDF Cong, Yingnan Chan, Yao-ban Ragan, Mark A. Sci Rep Article Many microbes can acquire genetic material from their environment and incorporate it into their genome, a process known as lateral genetic transfer (LGT). Computational approaches have been developed to detect genomic regions of lateral origin, but typically lack sensitivity, ability to distinguish donor from recipient, and scalability to very large datasets. To address these issues we have introduced an alignment-free method based on ideas from document analysis, term frequency-inverse document frequency (TF-IDF). Here we examine the performance of TF-IDF on three empirical datasets: 27 genomes of Escherichia coli and Shigella, 110 genomes of enteric bacteria, and 143 genomes across 12 bacterial and three archaeal phyla. We investigate the effect of k-mer size, gap size and delineation of groups on the inference of genomic regions of lateral origin, finding an interplay among these parameters and sequence divergence. Because TF-IDF identifies donor groups and delineates regions of lateral origin within recipient genomes, aggregating these regions by gene enables us to explore, for the first time, the mosaic nature of lateral genes including the multiplicity of biological sources, ancestry of transfer and over-writing by subsequent transfers. We carry out Gene Ontology enrichment tests to investigate which biological processes are potentially affected by LGT. Nature Publishing Group 2016-07-25 /pmc/articles/PMC4958990/ /pubmed/27452976 http://dx.doi.org/10.1038/srep29319 Text en Copyright © 2016, Macmillan Publishers Limited http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License. 
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ |
spellingShingle | Article Cong, Yingnan Chan, Yao-ban Ragan, Mark A. Exploring lateral genetic transfer among microbial genomes using TF-IDF |
title | Exploring lateral genetic transfer among microbial genomes using TF-IDF |
title_full | Exploring lateral genetic transfer among microbial genomes using TF-IDF |
title_fullStr | Exploring lateral genetic transfer among microbial genomes using TF-IDF |
title_full_unstemmed | Exploring lateral genetic transfer among microbial genomes using TF-IDF |
title_short | Exploring lateral genetic transfer among microbial genomes using TF-IDF |
title_sort | exploring lateral genetic transfer among microbial genomes using tf-idf |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4958990/ https://www.ncbi.nlm.nih.gov/pubmed/27452976 http://dx.doi.org/10.1038/srep29319 |
work_keys_str_mv | AT congyingnan exploringlateralgenetictransferamongmicrobialgenomesusingtfidf AT chanyaoban exploringlateralgenetictransferamongmicrobialgenomesusingtfidf AT raganmarka exploringlateralgenetictransferamongmicrobialgenomesusingtfidf |