Automatic curation of LTR retrotransposon libraries from plant genomes through machine learning

Bibliographic Details
Main authors: Orozco-Arias, Simon; Candamil-Cortes, Mariana S.; Jaimes, Paula A.; Valencia-Castrillon, Estiven; Tabares-Soto, Reinel; Isaza, Gustavo; Guyot, Romain
Format: Online Article Text
Language: English
Published: De Gruyter, 2022
Subjects: Workshop
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9521825/
https://www.ncbi.nlm.nih.gov/pubmed/35822734
http://dx.doi.org/10.1515/jib-2021-0036
Collection: PubMed
Description: Transposable elements are mobile sequences that can move and insert themselves into chromosomes; they activate under internal or external stimuli, giving the organism the ability to adapt to its environment. Annotating transposable elements in genomic data is currently considered a crucial task for understanding key aspects of organisms such as phenotype variability, species evolution, and genome size, among others. Because of the way they replicate, LTR retrotransposons are the most common transposable elements in plants, in some cases accounting for up to 80% of the genome. To annotate these elements, a reference library is usually created and then curated to eliminate TE fragments and false positives; the curated library is then used to annotate the genome by homology. However, curation can take weeks, requires extensive manual work, and involves running multiple time-consuming bioinformatics tools. Here, we propose a machine learning-based approach that performs this process automatically on plant genomes, achieving an F1-score of up to 91.18%. The approach was tested on four plant species, reaching an F1-score of up to 93.6% (Oryza granulata) in only 22.61 s, whereas the bioinformatics methods took approximately 6 h. This speed-up shows that the ML-based approach is efficient and could be used in large-scale sequencing projects.
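The abstract reports performance as an F1-score. As a minimal sketch, assuming curation is framed as binary classification of library entries (label 1 = keep as a genuine LTR retrotransposon, label 0 = discard as a fragment or false positive), the metric can be computed as below. The function name and the toy labels are hypothetical illustrations, not the authors' code or data.

```python
# Minimal sketch: F1-score for a binary "keep vs. discard" curation task.
# Assumption (not from the article): each library entry is labeled 1 if it
# should be kept and 0 if it should be removed (fragment or false positive).

def f1_score(y_true: list[int], y_pred: list[int]) -> float:
    """Return the F1-score (harmonic mean of precision and recall) for label 1."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # Precision or recall is zero (or undefined); report 0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels for eight library entries (not data from the article).
truth = [1, 1, 1, 0, 0, 1, 0, 1]
predicted = [1, 1, 0, 0, 1, 1, 0, 1]
print(round(f1_score(truth, predicted), 4))
```

Here 4 true positives, 1 false positive, and 1 false negative give precision = recall = 0.8, so F1 = 0.8. Because F1 balances precision (how many kept entries are genuine elements) and recall (how many genuine elements are kept), it penalizes both leaving fragments in the library and discarding real elements.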
ID: pubmed-9521825
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: J Integr Bioinform (Workshop section), De Gruyter, published 2022-07-12
License: © 2022 the author(s), published by De Gruyter, Berlin/Boston. This work is licensed under the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).