K-mer-based machine learning method to classify LTR-retrotransposons in plant genomes
Every day more plant genomes are available in public databases and additional massive sequencing projects (i.e., that aim to sequence thousands of individuals) are formulated and released. Nevertheless, there are not enough automatic tools to analyze this large amount of genomic information. LTR ret...
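Main authors: | Orozco-Arias, Simon; Candamil-Cortés, Mariana S.; Jaimes, Paula A.; Piña, Johan S.; Tabares-Soto, Reinel; Guyot, Romain; Isaza, Gustavo |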
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2021 |
Subjects: | Bioinformatics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8140598/ https://www.ncbi.nlm.nih.gov/pubmed/34055489 http://dx.doi.org/10.7717/peerj.11456 |
_version_ | 1783696219354169344 |
---|---|
author | Orozco-Arias, Simon; Candamil-Cortés, Mariana S.; Jaimes, Paula A.; Piña, Johan S.; Tabares-Soto, Reinel; Guyot, Romain; Isaza, Gustavo |
author_facet | Orozco-Arias, Simon; Candamil-Cortés, Mariana S.; Jaimes, Paula A.; Piña, Johan S.; Tabares-Soto, Reinel; Guyot, Romain; Isaza, Gustavo |
author_sort | Orozco-Arias, Simon |
collection | PubMed |
description | Every day more plant genomes become available in public databases, and additional massive sequencing projects (i.e., projects that aim to sequence thousands of individuals) are formulated and released. Nevertheless, there are not enough automatic tools to analyze this large amount of genomic information. LTR retrotransposons are the most frequent repetitive sequences in plant genomes; however, their detection and classification are commonly performed using semi-automatic and time-consuming programs. Despite the availability of several bioinformatic tools that follow different approaches to detect and classify them, none of these tools can individually obtain accurate results. Here, we used Machine Learning algorithms based on k-mer counts to distinguish LTR retrotransposons from other genomic sequences and to classify them into lineages/families with an F1-Score of 95%, contributing to the development of an alignment-free and automatic method to analyze these sequences. |
format | Online Article Text |
id | pubmed-8140598 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-81405982021-05-27 K-mer-based machine learning method to classify LTR-retrotransposons in plant genomes Orozco-Arias, Simon Candamil-Cortés, Mariana S. Jaimes, Paula A. Piña, Johan S. Tabares-Soto, Reinel Guyot, Romain Isaza, Gustavo PeerJ Bioinformatics Every day more plant genomes are available in public databases and additional massive sequencing projects (i.e., that aim to sequence thousands of individuals) are formulated and released. Nevertheless, there are not enough automatic tools to analyze this large amount of genomic information. LTR retrotransposons are the most frequent repetitive sequences in plant genomes; however, their detection and classification are commonly performed using semi-automatic and time-consuming programs. Despite the availability of several bioinformatic tools that follow different approaches to detect and classify them, none of these tools can individually obtain accurate results. Here, we used Machine Learning algorithms based on k-mer counts to classify LTR retrotransposons from other genomic sequences and into lineages/families with an F1-Score of 95%, contributing to develop a free-alignment and automatic method to analyze these sequences. PeerJ Inc. 2021-05-19 /pmc/articles/PMC8140598/ /pubmed/34055489 http://dx.doi.org/10.7717/peerj.11456 Text en © 2021 Orozco-Arias et al. https://creativecommons.org/licenses/by/4.0/This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
title | K-mer-based machine learning method to classify LTR-retrotransposons in plant genomes |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8140598/ https://www.ncbi.nlm.nih.gov/pubmed/34055489 http://dx.doi.org/10.7717/peerj.11456 |