
A novel alignment-free method for detection of lateral genetic transfer based on TF-IDF



Bibliographic Details
Main authors: Cong, Yingnan; Chan, Yao-ban; Ragan, Mark A.
Format: Online, Article, Text
Language: English
Published: Nature Publishing Group, 2016
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4958984/
https://www.ncbi.nlm.nih.gov/pubmed/27453035
http://dx.doi.org/10.1038/srep30308
Collection: PubMed
Abstract: Lateral genetic transfer (LGT) plays an important role in the evolution of microbes. Existing computational methods for detecting genomic regions of putative lateral origin scale poorly to large datasets. Here, we propose a novel method based on TF-IDF (Term Frequency-Inverse Document Frequency) statistics to detect not only regions of lateral origin, but also their origin and direction of transfer, in sets of hierarchically structured nucleotide or protein sequences. This approach is based on the frequency distributions of k-mers in the sequences. If a set of contiguous k-mers appears sufficiently more frequently in another phyletic group than in its own, we infer that they were transferred from that other group into their own group. We performed rigorous tests of TF-IDF using simulated and empirical datasets. With the simulated data, we tested our method under different settings for sequence length, substitution rate (between groups, within groups and post-LGT), deletion rate, length of the transferred region and k-mer size k, and found that it detects LGT events with high precision and recall. Our method performs better than an established method, ALFY, which has high recall but low precision. It is also efficient, with runtime increasing approximately linearly with sequence length.
Record ID: pubmed-4958984
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Sci Rep, 2016-07-25
Copyright © 2016, The Author(s). This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
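The k-mer comparison summarized in the abstract can be illustrated with a small Python sketch. It is a simplified, hypothetical reconstruction for illustration only, not the authors' implementation. Assumptions: each phyletic group is treated as a single "document", the IDF term is smoothed as log(1 + N/df), and the function names (`kmers`, `tfidf_by_group`, `detect_lgt`) and the parameters `ratio` and `min_run` are introduced here. A query k-mer is flagged when its score in some other group exceeds `ratio` times its score in the query's own group. Runs of at least `min_run` contiguous flagged k-mers with the same donor are then reported, with the direction of transfer taken as donor to own group.

```python
from collections import Counter
from math import log

# Hypothetical sketch of group-wise k-mer TF-IDF comparison for LGT;
# NOT the authors' implementation. Names and thresholds are assumptions.


def kmers(seq, k):
    """Overlapping k-mers of seq, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def tfidf_by_group(groups, k):
    """Score every k-mer in every group, treating each group as one document.

    TF  = count of the k-mer in the group / total k-mers in the group
    IDF = log(1 + N / df), N = number of groups, df = number of groups
          containing the k-mer (assumed smoothing, so k-mers shared by
          all groups keep a non-zero weight).
    """
    counts = {g: Counter(km for s in seqs for km in kmers(s, k))
              for g, seqs in groups.items()}
    n_groups = len(groups)
    # Iterating a Counter yields each distinct k-mer once per group.
    df = Counter(km for c in counts.values() for km in c)
    scores = {}
    for g, c in counts.items():
        total = sum(c.values())
        scores[g] = {km: (n / total) * log(1 + n_groups / df[km])
                     for km, n in c.items()}
    return scores


def detect_lgt(query, own, groups, k, ratio=2.0, min_run=2):
    """Report runs of contiguous query k-mers that score higher elsewhere.

    Returns (start, end, donor) tuples in query coordinates (end exclusive);
    the inferred direction of transfer is donor -> own.
    """
    scores = tfidf_by_group(groups, k)
    flags = []  # per k-mer position: donor group name, or None
    for km in kmers(query, k):
        own_score = scores[own].get(km, 0.0)
        donor, best = None, 0.0
        for g, sc in scores.items():
            if g != own and sc.get(km, 0.0) > best:
                donor, best = g, sc[km]
        flags.append(donor if best > ratio * own_score else None)

    # Merge consecutive positions flagged with the same donor into regions.
    regions, i = [], 0
    while i < len(flags):
        if flags[i] is None:
            i += 1
            continue
        j = i
        while j + 1 < len(flags) and flags[j + 1] == flags[i]:
            j += 1
        if j - i + 1 >= min_run:
            # The last k-mer starts at j, so the region ends at j + k.
            regions.append((i, j + k, flags[i]))
        i = j + 1
    return regions


# Toy data: the query in group "A" carries a "GCTT" segment common in "B".
groups = {
    "A": ["AAAAAAAAAA", "AAAAGCTTAAAA"],
    "B": ["GCTTGCTT", "GCTTGCTT"],
}
print(detect_lgt(groups["A"][1], "A", groups, k=3))  # [(4, 8, 'B')]
```

Because the IDF factor is the same for a given k-mer in every group, this simplified comparison effectively reduces to comparing group-wise k-mer frequencies; the smoothing only prevents k-mers shared by all groups from being scored as zero.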