The statistical power of k-mer based aggregative statistics for alignment-free detection of horizontal gene transfer
Alignment-based database search and sequence comparison are commonly used to detect horizontal gene transfer (HGT). However, with the rapid increase of sequencing depth, hundreds of thousands of contigs are routinely assembled from metagenomics studies, which challenges alignment-based HGT analysis...
Main authors: | Huang, Guan-Da; Liu, Xue-Mei; Huang, Tian-Lai; Xia, Li C. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | KeAi Publishing 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6723412/ https://www.ncbi.nlm.nih.gov/pubmed/31508512 http://dx.doi.org/10.1016/j.synbio.2019.08.001 |
_version_ | 1783448761765199872 |
---|---|
author | Huang, Guan-Da; Liu, Xue-Mei; Huang, Tian-Lai; Xia, Li C. |
author_sort | Huang, Guan-Da |
collection | PubMed |
description | Alignment-based database search and sequence comparison are commonly used to detect horizontal gene transfer (HGT). However, with the rapid increase of sequencing depth, hundreds of thousands of contigs are routinely assembled from metagenomics studies, which challenges alignment-based HGT analysis by overwhelming the known reference sequences. Detecting HGT by k-mer statistics thus becomes an attractive alternative. These alignment-free statistics have demonstrated high performance and efficiency in whole-genome and transcriptome comparisons. To adapt k-mer statistics for HGT detection, we developed two aggregative statistics [Formula: see text] and [Formula: see text], which subsample metagenome contigs by their representative regions and summarize the regional [Formula: see text] and [Formula: see text] metrics by their upper bounds. We systematically studied the aggregative statistics' power at different k-mer sizes using simulations. Our analysis showed that, in general, the power of [Formula: see text] and [Formula: see text] increases with sequencing coverage and reaches a maximum of >80% at k = 6, with a 5% Type-I error rate and a coverage ratio >0.2x. The statistical power of [Formula: see text] and [Formula: see text] was evaluated with realistic simulations of HGT mechanism, sequencing depth, read length, and base error. We expect these statistics to be useful distance metrics for identifying HGT in metagenomic studies. |
format | Online Article Text |
id | pubmed-6723412 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | KeAi Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-67234122019-09-10 The statistical power of k-mer based aggregative statistics for alignment-free detection of horizontal gene transfer Huang, Guan-Da Liu, Xue-Mei Huang, Tian-Lai Xia, Li- C. Synth Syst Biotechnol Article Alignment-based database search and sequence comparison are commonly used to detect horizontal gene transfer (HGT). However, with the rapid increase of sequencing depth, hundreds of thousands of contigs are routinely assembled from metagenomics studies, which challenges alignment-based HGT analysis by overwhelming the known reference sequences. Detecting HGT by k-mer statistics thus becomes an attractive alternative. These alignment-free statistics have been demonstrated in high performance and efficiency in whole-genome and transcriptome comparisons. To adapt k-mer statistics for HGT detection, we developed two aggregative statistics [Formula: see text] and [Formula: see text] , which subsample metagenome contigs by their representative regions, and summarize the regional [Formula: see text] and [Formula: see text] metrics by their upper bounds. We systematically studied the aggregative statistics’ power at different k-mer size using simulations. Our analysis showed that, in general, the power of [Formula: see text] and [Formula: see text] increases with sequencing coverage, and reaches a maximum power >80% at k = 6, with 5% Type-I error and the coverage ratio >0.2x. The statistical power of [Formula: see text] and [Formula: see text] was evaluated with realistic simulations of HGT mechanism, sequencing depth, read length, and base error. We expect these statistics to be useful distance metrics for identifying HGT in metagenomic studies. KeAi Publishing 2019-08-31 /pmc/articles/PMC6723412/ /pubmed/31508512 http://dx.doi.org/10.1016/j.synbio.2019.08.001 Text en © 2019 Production and hosting by Elsevier B.V. on behalf of KeAi Communications Co., Ltd. 
http://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Article Huang, Guan-Da Liu, Xue-Mei Huang, Tian-Lai Xia, Li- C. The statistical power of k-mer based aggregative statistics for alignment-free detection of horizontal gene transfer |
title | The statistical power of k-mer based aggregative statistics for alignment-free detection of horizontal gene transfer |
title_sort | statistical power of k-mer based aggregative statistics for alignment-free detection of horizontal gene transfer |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6723412/ https://www.ncbi.nlm.nih.gov/pubmed/31508512 http://dx.doi.org/10.1016/j.synbio.2019.08.001 |
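The description field above outlines a workflow: score regions of a contig with a k-mer statistic, summarize the regional scores by an upper bound, and estimate power at a fixed Type-I error rate by simulation. The record replaces the actual formulas with "[Formula: see text]", so the following is only a minimal sketch under stated assumptions: it uses the plain D2 statistic (the inner product of two k-mer count vectors) as a stand-in for the regional metric, takes the maximum over sliding windows as the "upper bound", and treats higher scores as evidence against the null. The function names and the `window` and `step` parameters are hypothetical and do not come from the paper.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count all overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Plain D2 statistic: inner product of the two k-mer count vectors.

    Assumption: used here as a stand-in for the regional metric,
    whose formula is elided in the record.
    """
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    return sum(n * counts_b[word] for word, n in counts_a.items())


def aggregate_max(contig, reference, k, window, step):
    """Score each sliding window of the contig against the reference
    and summarize the regional scores by their maximum.

    Assumption: 'upper bound' is read as a simple maximum over windows.
    """
    last_start = max(len(contig) - window, 0)
    starts = range(0, last_start + 1, step)
    return max(d2(contig[s:s + window], reference, k) for s in starts)


def empirical_power(null_scores, alt_scores, alpha=0.05):
    """Estimate power from simulated scores.

    The rejection threshold is the empirical (1 - alpha) quantile of the
    null scores; power is the fraction of alternative scores above it.
    """
    ranked = sorted(null_scores)
    idx = min(int((1 - alpha) * len(ranked)), len(ranked) - 1)
    threshold = ranked[idx]
    return sum(score > threshold for score in alt_scores) / len(alt_scores)
```

In a simulation study, `null_scores` would come from contigs generated without transfer and `alt_scores` from contigs carrying a simulated transfer; repeating the estimate for several values of `k` gives a power curve of the kind the description summarizes.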