
Exploring lateral genetic transfer among microbial genomes using TF-IDF

Many microbes can acquire genetic material from their environment and incorporate it into their genome, a process known as lateral genetic transfer (LGT). Computational approaches have been developed to detect genomic regions of lateral origin, but typically lack sensitivity, ability to distinguish...

Bibliographic Details
Main Authors: Cong, Yingnan, Chan, Yao-ban, Ragan, Mark A.
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4958990/
https://www.ncbi.nlm.nih.gov/pubmed/27452976
http://dx.doi.org/10.1038/srep29319
author Cong, Yingnan
Chan, Yao-ban
Ragan, Mark A.
author_sort Cong, Yingnan
collection PubMed
description Many microbes can acquire genetic material from their environment and incorporate it into their genome, a process known as lateral genetic transfer (LGT). Computational approaches have been developed to detect genomic regions of lateral origin, but typically lack sensitivity, ability to distinguish donor from recipient, and scalability to very large datasets. To address these issues we have introduced an alignment-free method based on ideas from document analysis, term frequency-inverse document frequency (TF-IDF). Here we examine the performance of TF-IDF on three empirical datasets: 27 genomes of Escherichia coli and Shigella, 110 genomes of enteric bacteria, and 143 genomes across 12 bacterial and three archaeal phyla. We investigate the effect of k-mer size, gap size and delineation of groups on the inference of genomic regions of lateral origin, finding an interplay among these parameters and sequence divergence. Because TF-IDF identifies donor groups and delineates regions of lateral origin within recipient genomes, aggregating these regions by gene enables us to explore, for the first time, the mosaic nature of lateral genes including the multiplicity of biological sources, ancestry of transfer and over-writing by subsequent transfers. We carry out Gene Ontology enrichment tests to investigate which biological processes are potentially affected by LGT.
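The description above names TF-IDF over k-mers but does not give a formula. As a rough illustration only, the sketch below applies the textbook document-analysis form of TF-IDF to k-mers, treating each group of genomes as one "document". It is an assumption-laden toy, not the authors' scoring scheme: the function names (`kmers`, `tfidf_by_group`), the input layout, and the log-ratio IDF are all hypothetical choices, and gap size and region delineation are not modelled.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Return all overlapping k-mers of a sequence (empty list if len(seq) < k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf_by_group(groups, k):
    """Score k-mers per group with textbook TF-IDF.

    groups: dict mapping a group name to a list of sequences (hypothetical layout).
    Each group is treated as a single 'document'; this is an assumption made
    for illustration, not the published method.
    """
    # Term counts: how often each k-mer occurs within each group.
    counts = {
        name: Counter(km for seq in seqs for km in kmers(seq, k))
        for name, seqs in groups.items()
    }

    # Document frequency: in how many groups each k-mer occurs at least once.
    df = Counter()
    for group_counts in counts.values():
        df.update(group_counts.keys())

    n_groups = len(groups)
    scores = {}
    for name, group_counts in counts.items():
        total = sum(group_counts.values())
        scores[name] = {
            km: (n / total) * math.log(n_groups / df[km])
            for km, n in group_counts.items()
        }
    return scores


# Toy input: two groups with one short sequence each.
toy_groups = {"group_A": ["ACGTACGT"], "group_B": ["ACGTTTTT"]}
toy_scores = tfidf_by_group(toy_groups, k=4)
```

Under this formula a k-mer present in every group scores zero, while a k-mer confined to one group scores highest there, which is the intuition behind using such scores to point at a candidate source group.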
format Online
Article
Text
id pubmed-4958990
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-4958990 2016-08-04 Exploring lateral genetic transfer among microbial genomes using TF-IDF Cong, Yingnan Chan, Yao-ban Ragan, Mark A. Sci Rep Article Nature Publishing Group 2016-07-25 /pmc/articles/PMC4958990/ /pubmed/27452976 http://dx.doi.org/10.1038/srep29319 Text en Copyright © 2016, Macmillan Publishers Limited http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
title Exploring lateral genetic transfer among microbial genomes using TF-IDF
topic Article