
Inpactor2: a software based on deep learning to identify and classify LTR-retrotransposons in plant genomes

Bibliographic Details
Main authors: Orozco-Arias, Simon, Humberto Lopez-Murillo, Luis, Candamil-Cortés, Mariana S, Arias, Maradey, Jaimes, Paula A, Rossi Paschoal, Alexandre, Tabares-Soto, Reinel, Isaza, Gustavo, Guyot, Romain
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects: Problem Solving Protocol
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9851300/
https://www.ncbi.nlm.nih.gov/pubmed/36502372
http://dx.doi.org/10.1093/bib/bbac511
collection PubMed
description LTR-retrotransposons are the most abundant repeat sequences in plant genomes and play an important role in evolution and biodiversity. Their characterization is of great importance for understanding their dynamics. However, the identification and classification of these elements remains a challenge today. Moreover, current software can be relatively slow (from hours to days), sometimes involves a lot of manual work and does not reach satisfactory levels of precision and sensitivity. Here we present Inpactor2, an accurate and fast application that creates LTR-retrotransposon reference libraries in a very short time. Inpactor2 takes an assembled genome as input and follows a hybrid approach (deep learning and structure-based) to detect elements, filter partial sequences and finally classify intact sequences into superfamilies and, as very few tools do, into lineages. This tool takes advantage of multi-core and GPU architectures to decrease execution times. Using the rice genome, Inpactor2 showed a run time of 5 minutes (faster than other tools) and has the best accuracy and F1-Score of the tools tested here, also having the second best accuracy and specificity only surpassed by EDTA, but achieving 28% higher sensitivity. For large genomes, Inpactor2 is up to seven times faster than other available bioinformatics tools.
format Online
Article
Text
id pubmed-9851300
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9851300 2023-01-20 Brief Bioinform Problem Solving Protocol Oxford University Press 2022-12-10 /pmc/articles/PMC9851300/ /pubmed/36502372 http://dx.doi.org/10.1093/bib/bbac511 Text en © The Author(s) 2022. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Inpactor2: a software based on deep learning to identify and classify LTR-retrotransposons in plant genomes
topic Problem Solving Protocol