Meta-Prism 2.0: Enabling algorithm and web server for ultra-fast, memory-efficient, and accurate analysis among millions of microbial community samples
BACKGROUND: Microbial community samples have been accumulating faster than ever, with hundreds of thousands of samples being sequenced each year. Mining such a huge amount of multisource heterogeneous data is becoming an increasingly difficult challenge, so efficient and accurate compare a...
Main authors: | Kang, Kai; Chong, Hui; Ning, Kang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2022 |
Subjects: | Technical Note |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9334027/ https://www.ncbi.nlm.nih.gov/pubmed/35902093 http://dx.doi.org/10.1093/gigascience/giac073 |
_version_ | 1784759008463159296 |
---|---|
author | Kang, Kai; Chong, Hui; Ning, Kang |
author_facet | Kang, Kai; Chong, Hui; Ning, Kang |
author_sort | Kang, Kai |
collection | PubMed |
description | BACKGROUND: Microbial community samples have been accumulating faster than ever, with hundreds of thousands of samples being sequenced each year. Mining such a huge amount of multisource heterogeneous data is becoming an increasingly difficult challenge, so efficient and accurate comparison and search of samples are urgently needed: faced with millions of samples in the data repository, traditional sample comparison and search approaches fall short in speed and accuracy. FINDINGS: Here we propose Meta-Prism 2.0, a microbial community sample analysis method that pushes time and memory efficiency to a new limit without compromising accuracy. Based on a sparse data structure, a time-saving instruction pipeline, and SIMD optimization, Meta-Prism 2.0 enables ultra-fast, memory-efficient, flexible, and accurate search among millions of samples. Meta-Prism 2.0 was tested on several data sets, the largest containing 1 million samples. Results show that Meta-Prism 2.0’s comparison speed of 0.00001 s per sample pair and memory requirement of 8 GB for searching against 1 million samples make it one of the most efficient sample analysis methods. Additionally, Meta-Prism 2.0 can achieve accuracy comparable with or better than other contemporary methods. Furthermore, Meta-Prism 2.0 can precisely identify the original biome for samples, thus enabling sample source tracking. Finally, we have provided a web server for fast search of microbial community samples online. CONCLUSIONS: In summary, Meta-Prism 2.0 has turned the resource-intensive sample search scheme into an effective procedure that researchers could conduct every day, even on a laptop, for insightful sample search, similarity analysis, and knowledge discovery. Meta-Prism 2.0 can be accessed at https://github.com/HUST-NingKang-Lab/Meta-Prism-2.0, and the web server can be accessed at https://hust-ningkang-lab.github.io/Meta-Prism-2.0/. |
format | Online Article Text |
id | pubmed-9334027 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9334027 2022-07-29 Meta-Prism 2.0: Enabling algorithm and web server for ultra-fast, memory-efficient, and accurate analysis among millions of microbial community samples Kang, Kai; Chong, Hui; Ning, Kang Gigascience Technical Note BACKGROUND: Microbial community samples have been accumulating faster than ever, with hundreds of thousands of samples being sequenced each year. Mining such a huge amount of multisource heterogeneous data is becoming an increasingly difficult challenge, so efficient and accurate comparison and search of samples are urgently needed: faced with millions of samples in the data repository, traditional sample comparison and search approaches fall short in speed and accuracy. FINDINGS: Here we propose Meta-Prism 2.0, a microbial community sample analysis method that pushes time and memory efficiency to a new limit without compromising accuracy. Based on a sparse data structure, a time-saving instruction pipeline, and SIMD optimization, Meta-Prism 2.0 enables ultra-fast, memory-efficient, flexible, and accurate search among millions of samples. Meta-Prism 2.0 was tested on several data sets, the largest containing 1 million samples. Results show that Meta-Prism 2.0’s comparison speed of 0.00001 s per sample pair and memory requirement of 8 GB for searching against 1 million samples make it one of the most efficient sample analysis methods. Additionally, Meta-Prism 2.0 can achieve accuracy comparable with or better than other contemporary methods. Furthermore, Meta-Prism 2.0 can precisely identify the original biome for samples, thus enabling sample source tracking. Finally, we have provided a web server for fast search of microbial community samples online. CONCLUSIONS: In summary, Meta-Prism 2.0 has turned the resource-intensive sample search scheme into an effective procedure that researchers could conduct every day, even on a laptop, for insightful sample search, similarity analysis, and knowledge discovery. Meta-Prism 2.0 can be accessed at https://github.com/HUST-NingKang-Lab/Meta-Prism-2.0, and the web server can be accessed at https://hust-ningkang-lab.github.io/Meta-Prism-2.0/. Oxford University Press 2022-07-28 /pmc/articles/PMC9334027/ /pubmed/35902093 http://dx.doi.org/10.1093/gigascience/giac073 Text en © The Author(s) 2022. Published by Oxford University Press GigaScience. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Technical Note Kang, Kai Chong, Hui Ning, Kang Meta-Prism 2.0: Enabling algorithm and web server for ultra-fast, memory-efficient, and accurate analysis among millions of microbial community samples |
title | Meta-Prism 2.0: Enabling algorithm and web server for ultra-fast, memory-efficient, and accurate analysis among millions of microbial community samples |
title_full | Meta-Prism 2.0: Enabling algorithm and web server for ultra-fast, memory-efficient, and accurate analysis among millions of microbial community samples |
title_fullStr | Meta-Prism 2.0: Enabling algorithm and web server for ultra-fast, memory-efficient, and accurate analysis among millions of microbial community samples |
title_full_unstemmed | Meta-Prism 2.0: Enabling algorithm and web server for ultra-fast, memory-efficient, and accurate analysis among millions of microbial community samples |
title_short | Meta-Prism 2.0: Enabling algorithm and web server for ultra-fast, memory-efficient, and accurate analysis among millions of microbial community samples |
title_sort | meta-prism 2.0: enabling algorithm and web server for ultra-fast, memory-efficient, and accurate analysis among millions of microbial community samples |
topic | Technical Note |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9334027/ https://www.ncbi.nlm.nih.gov/pubmed/35902093 http://dx.doi.org/10.1093/gigascience/giac073 |
work_keys_str_mv | AT kangkai metaprism20enablingalgorithmandwebserverforultrafastmemoryefficientandaccurateanalysisamongmillionsofmicrobialcommunitysamples AT chonghui metaprism20enablingalgorithmandwebserverforultrafastmemoryefficientandaccurateanalysisamongmillionsofmicrobialcommunitysamples AT ningkang metaprism20enablingalgorithmandwebserverforultrafastmemoryefficientandaccurateanalysisamongmillionsofmicrobialcommunitysamples |