A systematic review of the application of machine learning in the detection and classification of transposable elements

BACKGROUND: Transposable elements (TEs) constitute the most common repeated sequences in eukaryotic genomes. Recent studies demonstrated their deep impact on species diversity, adaptation to the environment and diseases. Although there are many conventional bioinformatics algorithms for detecting an...

Bibliographic Details
Main authors: Orozco-Arias, Simon, Isaza, Gustavo, Guyot, Romain, Tabares-Soto, Reinel
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2019
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6967008/
https://www.ncbi.nlm.nih.gov/pubmed/31976169
http://dx.doi.org/10.7717/peerj.8311
_version_ 1783488863025496064
author Orozco-Arias, Simon
Isaza, Gustavo
Guyot, Romain
Tabares-Soto, Reinel
author_sort Orozco-Arias, Simon
collection PubMed
description BACKGROUND: Transposable elements (TEs) constitute the most common repeated sequences in eukaryotic genomes. Recent studies demonstrated their deep impact on species diversity, adaptation to the environment and diseases. Although there are many conventional bioinformatics algorithms for detecting and classifying TEs, none have achieved reliable results on different types of TEs. Machine learning (ML) techniques can automatically extract hidden patterns and novel information from labeled or non-labeled data and have been applied to solving several scientific problems. METHODOLOGY: We followed the Systematic Literature Review (SLR) process, applying the six stages of the review protocol from it, but added a previous stage, which aims to detect the need for a review. Then search equations were formulated and executed in several literature databases. Relevant publications were scanned and used to extract evidence to answer research questions. RESULTS: Several ML approaches have already been tested on other bioinformatics problems with promising results, yet there are few algorithms and architectures available in literature focused specifically on TEs, despite representing the majority of the nuclear DNA of many organisms. Only 35 articles were found and categorized as relevant in TE or related fields. CONCLUSIONS: ML is a powerful tool that can be used to address many problems. Although ML techniques have been used widely in other biological tasks, their utilization in TE analyses is still limited. Following the SLR, it was possible to notice that the use of ML for TE analyses (detection and classification) is an open problem, and this new field of research is growing in interest.
format Online
Article
Text
id pubmed-6967008
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-6967008 2020-01-23 PeerJ Bioinformatics PeerJ Inc.
2019-12-18 /pmc/articles/PMC6967008/ /pubmed/31976169 http://dx.doi.org/10.7717/peerj.8311 Text en © 2019 Orozco-Arias et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title A systematic review of the application of machine learning in the detection and classification of transposable elements
title_sort systematic review of the application of machine learning in the detection and classification of transposable elements
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6967008/
https://www.ncbi.nlm.nih.gov/pubmed/31976169
http://dx.doi.org/10.7717/peerj.8311