A novel alignment-free method for detection of lateral genetic transfer based on TF-IDF
Lateral genetic transfer (LGT) plays an important role in the evolution of microbes. Existing computational methods for detecting genomic regions of putative lateral origin scale poorly to large data. Here, we propose a novel method based on TF-IDF (Term Frequency-Inverse Document Frequency) statist...
Main authors: | Cong, Yingnan; Chan, Yao-ban; Ragan, Mark A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4958984/ https://www.ncbi.nlm.nih.gov/pubmed/27453035 http://dx.doi.org/10.1038/srep30308 |
_version_ | 1782444352089358336 |
---|---|
author | Cong, Yingnan; Chan, Yao-ban; Ragan, Mark A. |
collection | PubMed |
description | Lateral genetic transfer (LGT) plays an important role in the evolution of microbes. Existing computational methods for detecting genomic regions of putative lateral origin scale poorly to large data. Here, we propose a novel method based on TF-IDF (Term Frequency-Inverse Document Frequency) statistics to detect not only regions of lateral origin, but also their origin and direction of transfer, in sets of hierarchically structured nucleotide or protein sequences. This approach is based on the frequency distributions of k-mers in the sequences. If a set of contiguous k-mers appears sufficiently more frequently in another phyletic group than in its own, we infer that they have been transferred from the first group to the second. We performed rigorous tests of TF-IDF using simulated and empirical datasets. With the simulated data, we tested our method under different parameter settings for sequence length, substitution rate between and within groups and post-LGT, deletion rate, length of transferred region and k size, and found that we can detect LGT events with high precision and recall. Our method performs better than an established method, ALFY, which has high recall but low precision. Our method is efficient, with runtime increasing approximately linearly with sequence length. |
format | Online Article Text |
id | pubmed-4958984 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Nature Publishing Group |
record_format | MEDLINE/PubMed |
spelling | pubmed-4958984 2016-08-04 A novel alignment-free method for detection of lateral genetic transfer based on TF-IDF Cong, Yingnan; Chan, Yao-ban; Ragan, Mark A. Sci Rep Article Nature Publishing Group 2016-07-25 /pmc/articles/PMC4958984/ /pubmed/27453035 http://dx.doi.org/10.1038/srep30308 Text en Copyright © 2016, The Author(s) http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ |
title | A novel alignment-free method for detection of lateral genetic transfer based on TF-IDF |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4958984/ https://www.ncbi.nlm.nih.gov/pubmed/27453035 http://dx.doi.org/10.1038/srep30308 |
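
The description above explains the core idea in words: compare how often the k-mers of a sequence occur in its own phyletic group and in other groups, and treat a contiguous run of k-mers that is markedly more frequent elsewhere as a candidate transferred region. The Python sketch below illustrates that idea only. It is not the authors' implementation: it replaces the TF-IDF weighting with a plain difference of per-group k-mer frequencies, and the function names and the parameters `min_gap` and `min_run` are hypothetical choices made for this example.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the overlapping k-mers of seq, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def group_frequencies(groups, k):
    """Map each group to {k-mer: fraction of its sequences containing it}."""
    freqs = {}
    for name, seqs in groups.items():
        counts = defaultdict(int)
        for s in seqs:
            for km in set(kmers(s, k)):  # count each sequence at most once
                counts[km] += 1
        freqs[name] = {km: c / len(seqs) for km, c in counts.items()}
    return freqs


def candidate_regions(query, own_group, groups, k=4, min_gap=0.5, min_run=3):
    """Return (start, end, donor) half-open spans of query.

    A k-mer is flagged when its frequency in some other group exceeds
    its frequency in own_group by at least min_gap (an assumed cutoff).
    Runs of at least min_run consecutive k-mers flagged with the same
    donor group are merged into one region.
    """
    freqs = group_frequencies(groups, k)
    own = freqs[own_group]
    labels = []
    for km in kmers(query, k):
        best_donor, best_freq = None, 0.0
        for name, table in freqs.items():
            if name == own_group:
                continue
            f = table.get(km, 0.0)
            if f > best_freq:
                best_donor, best_freq = name, f
        flagged = best_donor is not None and best_freq - own.get(km, 0.0) >= min_gap
        labels.append(best_donor if flagged else None)

    regions = []
    i = 0
    while i < len(labels):
        donor = labels[i]
        j = i
        while j < len(labels) and labels[j] == donor:
            j += 1
        if donor is not None and j - i >= min_run:
            # the last flagged k-mer starts at j - 1 and covers k characters
            regions.append((i, j + k - 1, donor))
        i = j
    return regions


if __name__ == "__main__":
    query = "AAAACGTACGGTAAAA"
    groups = {
        "A": ["AAAAAAAAAAAAAAAA", "AAAAAAAAAAAAAAAA", query],
        "B": ["TTCGTACGGTTT", "GGCGTACGGTGG"],
    }
    for start, end, donor in candidate_regions(query, "A", groups):
        print(start, end, donor, query[start:end])
```

In this toy input, the segment `CGTACGGT` occurs in every sequence of group B but only in the query within group A, so the sketch reports it as a candidate region with B as the putative donor. Scanning each sequence once per group keeps the work roughly proportional to total sequence length, in the spirit of the near-linear runtime noted in the description.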