To Petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics
As computational biologists continue to be inundated by ever increasing amounts of metagenomic data, the need for data analysis approaches that keep up with the pace of sequence archives has remained a challenge. In recent years, the accelerated pace of genomic data availability has been accompanied...
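Main Authors: | Elworth, R A Leo; Wang, Qi; Kota, Pavan K; Barberan, C J; Coleman, Benjamin; Balaji, Advait; Gupta, Gaurav; Baraniuk, Richard G; Shrivastava, Anshumali; Treangen, Todd J |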
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | Survey and Summary |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7261164/ https://www.ncbi.nlm.nih.gov/pubmed/32338745 http://dx.doi.org/10.1093/nar/gkaa265 |
_version_ | 1783540455095402496 |
---|---|
author | Elworth, R A Leo Wang, Qi Kota, Pavan K Barberan, C J Coleman, Benjamin Balaji, Advait Gupta, Gaurav Baraniuk, Richard G Shrivastava, Anshumali Treangen, Todd J |
author_facet | Elworth, R A Leo Wang, Qi Kota, Pavan K Barberan, C J Coleman, Benjamin Balaji, Advait Gupta, Gaurav Baraniuk, Richard G Shrivastava, Anshumali Treangen, Todd J |
author_sort | Elworth, R A Leo |
collection | PubMed |
description | As computational biologists continue to be inundated by ever increasing amounts of metagenomic data, the need for data analysis approaches that keep up with the pace of sequence archives has remained a challenge. In recent years, the accelerated pace of genomic data availability has been accompanied by the application of a wide array of highly efficient approaches from other fields to the field of metagenomics. For instance, sketching algorithms such as MinHash have seen a rapid and widespread adoption. These techniques handle increasingly large datasets with minimal sacrifices in quality for tasks such as sequence similarity calculations. Here, we briefly review the fundamentals of the most impactful probabilistic and signal processing algorithms. We also highlight more recent advances to augment previous reviews in these areas that have taken a broader approach. We then explore the application of these techniques to metagenomics, discuss their pros and cons, and speculate on their future directions. |
format | Online Article Text |
id | pubmed-7261164 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-72611642020-06-03 To Petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics Elworth, R A Leo Wang, Qi Kota, Pavan K Barberan, C J Coleman, Benjamin Balaji, Advait Gupta, Gaurav Baraniuk, Richard G Shrivastava, Anshumali Treangen, Todd J Nucleic Acids Res Survey and Summary As computational biologists continue to be inundated by ever increasing amounts of metagenomic data, the need for data analysis approaches that keep up with the pace of sequence archives has remained a challenge. In recent years, the accelerated pace of genomic data availability has been accompanied by the application of a wide array of highly efficient approaches from other fields to the field of metagenomics. For instance, sketching algorithms such as MinHash have seen a rapid and widespread adoption. These techniques handle increasingly large datasets with minimal sacrifices in quality for tasks such as sequence similarity calculations. Here, we briefly review the fundamentals of the most impactful probabilistic and signal processing algorithms. We also highlight more recent advances to augment previous reviews in these areas that have taken a broader approach. We then explore the application of these techniques to metagenomics, discuss their pros and cons, and speculate on their future directions. Oxford University Press 2020-06-04 2020-04-27 /pmc/articles/PMC7261164/ /pubmed/32338745 http://dx.doi.org/10.1093/nar/gkaa265 Text en © The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Survey and Summary Elworth, R A Leo Wang, Qi Kota, Pavan K Barberan, C J Coleman, Benjamin Balaji, Advait Gupta, Gaurav Baraniuk, Richard G Shrivastava, Anshumali Treangen, Todd J To Petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics |
title | To Petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics |
title_full | To Petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics |
title_fullStr | To Petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics |
title_full_unstemmed | To Petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics |
title_short | To Petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics |
title_sort | to petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics |
topic | Survey and Summary |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7261164/ https://www.ncbi.nlm.nih.gov/pubmed/32338745 http://dx.doi.org/10.1093/nar/gkaa265 |