Cargando…

Meta-Prism 2.0: Enabling algorithm and web server for ultra-fast, memory-efficient, and accurate analysis among millions of microbial community samples

BACKGROUND: Microbial community samples have been accumulating at a speed faster than ever, with hundreds of thousands of samples been sequenced each year. Mining such a huge amount of multisource heterogeneous data is becoming an increasingly difficult challenge, so efficient and accurate compare a...

Descripción completa

Detalles Bibliográficos
Autores principales: Kang, Kai, Chong, Hui, Ning, Kang
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Oxford University Press 2022
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9334027/
https://www.ncbi.nlm.nih.gov/pubmed/35902093
http://dx.doi.org/10.1093/gigascience/giac073
_version_ 1784759008463159296
author Kang, Kai
Chong, Hui
Ning, Kang
author_facet Kang, Kai
Chong, Hui
Ning, Kang
author_sort Kang, Kai
collection PubMed
description BACKGROUND: Microbial community samples have been accumulating at a speed faster than ever, with hundreds of thousands of samples been sequenced each year. Mining such a huge amount of multisource heterogeneous data is becoming an increasingly difficult challenge, so efficient and accurate compare and search of samples is in urgent need: faced with millions of samples in the data repository, traditional sample comparison and search approaches fall short in speed and accuracy. FINDINGS: Here we proposed Meta-Prism 2.0, a microbial community sample analysis method that has pushed the time and memory efficiency to a new limit without compromising accuracy. Based on sparse data structure, time-saving instruction pipeline, and SIMD optimization, Meta-Prism 2.0 has enabled ultra-fast, memory-efficient, flexible, and accurate search among millions of samples. Meta-Prism 2.0 was put to test on several data sets, with the largest containing 1 million samples. Results show that Meta-Prism 2.0’s 0.00001-s per sample pair compare speed and 8-GB memory needs for searching against 1 million samples have made it one of the most efficient sample analysis methods. Additionally, Meta-Prism 2.0 can achieve accuracy comparable with or better than other contemporary methods. Third, Meta-Prism 2.0 can precisely identify the original biome for samples, thus enabling sample source tracking. Finally, we have provided a web server for fast search of microbial community samples online. CONCLUSIONS: In summary, Meta-Prism 2.0 has changed the resource-intensive sample search scheme to an effective procedure, which could be conducted by researchers every day even on a laptop, for insightful sample search, similarity analysis, and knowledge discovery. Meta-Prism 2.0 can be accessed at https://github.com/HUST-NingKang-Lab/Meta-Prism-2.0, and the web server can be accessed at https://hust-ningkang-lab.github.io/Meta-Prism-2.0/.
format Online
Article
Text
id pubmed-9334027
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-93340272022-07-29 Meta-Prism 2.0: Enabling algorithm and web server for ultra-fast, memory-efficient, and accurate analysis among millions of microbial community samples Kang, Kai Chong, Hui Ning, Kang Gigascience Technical Note BACKGROUND: Microbial community samples have been accumulating at a speed faster than ever, with hundreds of thousands of samples been sequenced each year. Mining such a huge amount of multisource heterogeneous data is becoming an increasingly difficult challenge, so efficient and accurate compare and search of samples is in urgent need: faced with millions of samples in the data repository, traditional sample comparison and search approaches fall short in speed and accuracy. FINDINGS: Here we proposed Meta-Prism 2.0, a microbial community sample analysis method that has pushed the time and memory efficiency to a new limit without compromising accuracy. Based on sparse data structure, time-saving instruction pipeline, and SIMD optimization, Meta-Prism 2.0 has enabled ultra-fast, memory-efficient, flexible, and accurate search among millions of samples. Meta-Prism 2.0 was put to test on several data sets, with the largest containing 1 million samples. Results show that Meta-Prism 2.0’s 0.00001-s per sample pair compare speed and 8-GB memory needs for searching against 1 million samples have made it one of the most efficient sample analysis methods. Additionally, Meta-Prism 2.0 can achieve accuracy comparable with or better than other contemporary methods. Third, Meta-Prism 2.0 can precisely identify the original biome for samples, thus enabling sample source tracking. Finally, we have provided a web server for fast search of microbial community samples online. CONCLUSIONS: In summary, Meta-Prism 2.0 has changed the resource-intensive sample search scheme to an effective procedure, which could be conducted by researchers every day even on a laptop, for insightful sample search, similarity analysis, and knowledge discovery. Meta-Prism 2.0 can be accessed at https://github.com/HUST-NingKang-Lab/Meta-Prism-2.0, and the web server can be accessed at https://hust-ningkang-lab.github.io/Meta-Prism-2.0/. Oxford University Press 2022-07-28 /pmc/articles/PMC9334027/ /pubmed/35902093 http://dx.doi.org/10.1093/gigascience/giac073 Text en © The Author(s) 2022. Published by Oxford University Press GigaScience. https://creativecommons.org/licenses/by/4.0/This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Technical Note
Kang, Kai
Chong, Hui
Ning, Kang
Meta-Prism 2.0: Enabling algorithm and web server for ultra-fast, memory-efficient, and accurate analysis among millions of microbial community samples
title Meta-Prism 2.0: Enabling algorithm and web server for ultra-fast, memory-efficient, and accurate analysis among millions of microbial community samples
title_full Meta-Prism 2.0: Enabling algorithm and web server for ultra-fast, memory-efficient, and accurate analysis among millions of microbial community samples
title_fullStr Meta-Prism 2.0: Enabling algorithm and web server for ultra-fast, memory-efficient, and accurate analysis among millions of microbial community samples
title_full_unstemmed Meta-Prism 2.0: Enabling algorithm and web server for ultra-fast, memory-efficient, and accurate analysis among millions of microbial community samples
title_short Meta-Prism 2.0: Enabling algorithm and web server for ultra-fast, memory-efficient, and accurate analysis among millions of microbial community samples
title_sort meta-prism 2.0: enabling algorithm and web server for ultra-fast, memory-efficient, and accurate analysis among millions of microbial community samples
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9334027/
https://www.ncbi.nlm.nih.gov/pubmed/35902093
http://dx.doi.org/10.1093/gigascience/giac073
work_keys_str_mv AT kangkai metaprism20enablingalgorithmandwebserverforultrafastmemoryefficientandaccurateanalysisamongmillionsofmicrobialcommunitysamples
AT chonghui metaprism20enablingalgorithmandwebserverforultrafastmemoryefficientandaccurateanalysisamongmillionsofmicrobialcommunitysamples
AT ningkang metaprism20enablingalgorithmandwebserverforultrafastmemoryefficientandaccurateanalysisamongmillionsofmicrobialcommunitysamples