Automatic curation of LTR retrotransposon libraries from plant genomes through machine learning
Transposable elements are mobile sequences that can move and insert themselves into chromosomes, activating under internal or external stimuli, giving the organism the ability to adapt to the environment. Annotating transposable elements in genomic data is currently considered a crucial task to unde...
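Main authors: | Orozco-Arias, Simon; Candamil-Cortes, Mariana S.; Jaimes, Paula A.; Valencia-Castrillon, Estiven; Tabares-Soto, Reinel; Isaza, Gustavo; Guyot, Romain |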
---|---|
Format: | Online Article Text |
Language: | English |
Published: | De Gruyter 2022 |
Subjects: | Workshop |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9521825/ https://www.ncbi.nlm.nih.gov/pubmed/35822734 http://dx.doi.org/10.1515/jib-2021-0036 |
_version_ | 1784799926177234944 |
---|---|
author | Orozco-Arias, Simon Candamil-Cortes, Mariana S. Jaimes, Paula A. Valencia-Castrillon, Estiven Tabares-Soto, Reinel Isaza, Gustavo Guyot, Romain |
author_sort | Orozco-Arias, Simon |
collection | PubMed |
description | Transposable elements are mobile sequences that can move and insert themselves into chromosomes, activating under internal or external stimuli, giving the organism the ability to adapt to the environment. Annotating transposable elements in genomic data is currently considered a crucial task to understand key aspects of organisms such as phenotype variability, species evolution, and genome size, among others. Because of the way they replicate, LTR retrotransposons are the most common transposable elements in plants, accounting in some cases for up to 80% of all DNA information. To annotate these elements, a reference library is usually created and then curated to eliminate TE fragments and false positives; the curated library is then used to annotate the elements in the genome by homology. However, the curation process can take weeks and requires extensive manual work and the execution of multiple time-consuming bioinformatics tools. Here, we propose a machine learning-based approach to perform this process automatically on plant genomes, obtaining up to 91.18% F1-score. This approach was tested with four plant species, obtaining up to 93.6% F1-score (Oryza granulata) in only 22.61 s, where bioinformatics methods took approximately 6 h. This acceleration demonstrates that the ML-based approach is efficient and could be used in massive sequencing projects. |
format | Online Article Text |
id | pubmed-9521825 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | De Gruyter |
record_format | MEDLINE/PubMed |
spelling | pubmed-9521825 2022-10-26 J Integr Bioinform Workshop De Gruyter 2022-07-12 /pmc/articles/PMC9521825/ /pubmed/35822734 http://dx.doi.org/10.1515/jib-2021-0036 Text en © 2022 the author(s), published by De Gruyter, Berlin/Boston https://creativecommons.org/licenses/by/4.0/ This work is licensed under the Creative Commons Attribution 4.0 International License. |
title | Automatic curation of LTR retrotransposon libraries from plant genomes through machine learning |
topic | Workshop |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9521825/ https://www.ncbi.nlm.nih.gov/pubmed/35822734 http://dx.doi.org/10.1515/jib-2021-0036 |
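
The two figures of merit quoted in the description, the F1-score and the runtime gap between 22.61 s and approximately 6 h, can be made concrete with a small sketch. This is illustrative only and is not the authors' code: the function names `f1_score` and `speedup` are hypothetical, the counts passed to `f1_score` are invented for the example, and "approximately 6 h" is taken as exactly 6 × 3600 s, so the resulting ratio is only a rough estimate.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, computed from raw counts."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def speedup(baseline_seconds: float, new_seconds: float) -> float:
    """How many times faster the new method is than the baseline."""
    return baseline_seconds / new_seconds


# Runtimes reported in the description; 6 h is approximate in the source.
ML_SECONDS = 22.61
BIOINFORMATICS_SECONDS = 6 * 3600  # assumption: exactly 6 hours

ratio = speedup(BIOINFORMATICS_SECONDS, ML_SECONDS)
print(f"Estimated speedup: about {ratio:.0f}x")

# Hypothetical counts, chosen only to show the formula at work.
example_f1 = f1_score(tp=90, fp=10, fn=10)
print(f"Example F1: {example_f1:.2f}")
```

Under these assumptions the ML-based curation comes out several hundred times faster than the conventional pipeline, which is the sense in which the description calls the approach suitable for massive sequencing projects.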