
Literature on Applied Machine Learning in Metagenomic Classification: A Scoping Review


Bibliographic Details
Main authors: Tonkovic, Petar, Kalajdziski, Slobodan, Zdravevski, Eftim, Lameski, Petre, Corizzo, Roberto, Pires, Ivan Miguel, Garcia, Nuno M., Loncar-Turukalo, Tatjana, Trajkovik, Vladimir
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7763105/
https://www.ncbi.nlm.nih.gov/pubmed/33316921
http://dx.doi.org/10.3390/biology9120453
author Tonkovic, Petar
Kalajdziski, Slobodan
Zdravevski, Eftim
Lameski, Petre
Corizzo, Roberto
Pires, Ivan Miguel
Garcia, Nuno M.
Loncar-Turukalo, Tatjana
Trajkovik, Vladimir
author_sort Tonkovic, Petar
collection PubMed
description SIMPLE SUMMARY: Technological advancements have led to modern DNA sequencing methods capable of generating large amounts of data describing the microorganisms that live in samples taken from the environment. Metagenomics, the field that studies the different genomes within these samples, is becoming increasingly popular because it has many real-world applications, such as the discovery of new antibiotics, personalized medicine, and forensics. From a computer science point of view, it is interesting to see how these large volumes of data can be processed efficiently to accurately identify (classify) the microorganisms from the input DNA data. This scoping review aims to give an insight into the existing state-of-the-art computational methods for processing metagenomic data through the prism of machine learning, data science, and big data. We provide an overview of the state-of-the-art metagenomic classification methods, as well as the challenges researchers face when tackling this complex problem. The end goal of this review is to help researchers stay up to date with current trends and identify opportunities for further research and improvements.
ABSTRACT: Applied machine learning in bioinformatics is growing as computer science slowly invades all research spheres. With the arrival of modern next-generation DNA sequencing algorithms, metagenomics is becoming an increasingly interesting research field as it finds countless practical applications exploiting the vast amounts of generated data. This study aims to scope the scientific literature in the field of metagenomic classification in the time interval 2008–2019 and provide an evolutionary timeline of data processing and machine learning in this field. This study follows the scoping review methodology and PRISMA guidelines to identify and process the available literature. Natural Language Processing (NLP) is deployed to ensure an efficient and exhaustive search of the literature corpus of three large digital libraries: IEEE, PubMed, and Springer. The search is based on keywords and properties looked up using the digital libraries' search engines. The scoping review results reveal an increasing number of research papers related to metagenomic classification over the past decade. The research is mainly focused on metagenomic classifiers, identifying scope-specific metrics for model evaluation, data set sanitization, and dimensionality reduction. Of these subproblems, data preprocessing is the least researched and has considerable potential for improvement.
format Online
Article
Text
id pubmed-7763105
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7763105 2020-12-27 Literature on Applied Machine Learning in Metagenomic Classification: A Scoping Review Biology (Basel) Review MDPI 2020-12-09 /pmc/articles/PMC7763105/ /pubmed/33316921 http://dx.doi.org/10.3390/biology9120453 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Literature on Applied Machine Learning in Metagenomic Classification: A Scoping Review
topic Review
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7763105/
https://www.ncbi.nlm.nih.gov/pubmed/33316921
http://dx.doi.org/10.3390/biology9120453