To Petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics

Bibliographic Details
Main authors: Elworth, R A Leo; Wang, Qi; Kota, Pavan K; Barberan, C J; Coleman, Benjamin; Balaji, Advait; Gupta, Gaurav; Baraniuk, Richard G; Shrivastava, Anshumali; Treangen, Todd J
Format: Online Article Text
Language: English
Published: Oxford University Press, 2020
Subjects: Survey and Summary
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7261164/
https://www.ncbi.nlm.nih.gov/pubmed/32338745
http://dx.doi.org/10.1093/nar/gkaa265
Collection: PubMed
Abstract: As computational biologists continue to be inundated by ever increasing amounts of metagenomic data, the need for data analysis approaches that keep up with the pace of sequence archives has remained a challenge. In recent years, the accelerated pace of genomic data availability has been accompanied by the application of a wide array of highly efficient approaches from other fields to the field of metagenomics. For instance, sketching algorithms such as MinHash have seen a rapid and widespread adoption. These techniques handle increasingly large datasets with minimal sacrifices in quality for tasks such as sequence similarity calculations. Here, we briefly review the fundamentals of the most impactful probabilistic and signal processing algorithms. We also highlight more recent advances to augment previous reviews in these areas that have taken a broader approach. We then explore the application of these techniques to metagenomics, discuss their pros and cons, and speculate on their future directions.
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Nucleic Acids Res (Survey and Summary)
Dates: 2020-04-27 (published online); 2020-06-04 (issue)
Rights: © The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
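
The abstract cites MinHash sketching for sequence similarity calculations. As a rough illustration only, the following is a minimal sketch of the classic MinHash idea applied to DNA: each sequence is reduced to its set of k-mers, and for each of `num_hashes` seeded hash functions only the minimum hash value is kept. The fraction of positions where two sketches agree estimates the Jaccard similarity of the two k-mer sets. The function names, the seed-prefixed BLAKE2b hash family, k = 21 and 128 hash functions are assumptions chosen for illustration, not the method of the review or of any specific tool; production tools such as Mash instead use a single hash function with a bottom-k sketch and canonical (reverse-complement-aware) k-mers, which this sketch omits.

```python
import hashlib
import random


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of distinct length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def seeded_hash(kmer: str, seed: int) -> int:
    """Map a k-mer to a 64-bit integer using the seed-th hash function.

    Assumption: prefixing the seed and hashing with BLAKE2b is a simple
    stand-in for a family of independent hash functions.
    """
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def minhash_sketch(seq: str, k: int = 21, num_hashes: int = 128) -> list[int]:
    """For each hash function, keep the minimum hash value over all k-mers."""
    kmer_set = kmers(seq, k)
    if not kmer_set:
        raise ValueError("sequence is shorter than k")
    return [min(seeded_hash(km, s) for km in kmer_set) for s in range(num_hashes)]


def estimate_jaccard(sketch_a: list[int], sketch_b: list[int]) -> float:
    """Fraction of hash functions whose minima agree; estimates |A & B| / |A | B|."""
    if len(sketch_a) != len(sketch_b):
        raise ValueError("sketches must use the same number of hash functions")
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


def exact_jaccard(seq_a: str, seq_b: str, k: int = 21) -> float:
    """Exact Jaccard similarity of the two k-mer sets, for comparison."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    rng = random.Random(42)
    genome = "".join(rng.choice("ACGT") for _ in range(1000))
    # Introduce 10 point substitutions (about 1%) to mimic a related strain.
    mutated = list(genome)
    for pos in rng.sample(range(len(genome)), 10):
        mutated[pos] = rng.choice("ACGT".replace(genome[pos], ""))
    variant = "".join(mutated)

    est = estimate_jaccard(minhash_sketch(genome), minhash_sketch(variant))
    print(f"MinHash estimate: {est:.3f}  exact: {exact_jaccard(genome, variant):.3f}")
```

Running it prints the sketch-based estimate next to the exact k-mer Jaccard value. Each position of the sketch agrees with probability equal to the true Jaccard similarity J, so with 128 hash functions the standard error is roughly sqrt(J(1 - J) / 128), and the two numbers typically agree to within a few hundredths while the sketches stay a fixed size regardless of sequence length.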