InpactorDB: A Classified Lineage-Level Plant LTR Retrotransposon Reference Library for Free-Alignment Methods Based on Machine Learning

Long terminal repeat (LTR) retrotransposons are mobile elements that constitute the major fraction of most plant genomes. The identification and annotation of these elements via bioinformatics approaches represent a major challenge in the era of massive plant genome sequencing. In addition to their...

Bibliographic Details
Main Authors: Orozco-Arias, Simon; Jaimes, Paula A.; Candamil, Mariana S.; Jiménez-Varón, Cristian Felipe; Tabares-Soto, Reinel; Isaza, Gustavo; Guyot, Romain
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7910972/
https://www.ncbi.nlm.nih.gov/pubmed/33525408
http://dx.doi.org/10.3390/genes12020190
_version_ 1783656237401899008
author Orozco-Arias, Simon
Jaimes, Paula A.
Candamil, Mariana S.
Jiménez-Varón, Cristian Felipe
Tabares-Soto, Reinel
Isaza, Gustavo
Guyot, Romain
author_sort Orozco-Arias, Simon
collection PubMed
description Long terminal repeat (LTR) retrotransposons are mobile elements that constitute the major fraction of most plant genomes. The identification and annotation of these elements via bioinformatics approaches represent a major challenge in the era of massive plant genome sequencing. In addition to their involvement in genome size variation, LTR retrotransposons are also associated with the function and structure of different chromosomal regions and can alter the function of coding regions, among others. Several sequence databases of plant LTR retrotransposons are available for public access, such as PGSB and RepetDB, or restricted access such as Repbase. Although these databases are useful to identify LTR-RTs in new genomes by similarity, the elements of these databases are not fully classified to the lineage (also called family) level. Here, we present InpactorDB, a semi-curated dataset composed of 130,439 elements from 195 plant genomes (belonging to 108 plant species) classified to the lineage level. This dataset has been used to train two deep neural networks (i.e., one fully connected and one convolutional) for the rapid classification of these elements. In lineage-level classification approaches, we obtain up to 98% performance, indicated by the F1-score, precision and recall scores.
format Online
Article
Text
id pubmed-7910972
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-79109722021-02-28 InpactorDB: A Classified Lineage-Level Plant LTR Retrotransposon Reference Library for Free-Alignment Methods Based on Machine Learning Orozco-Arias, Simon Jaimes, Paula A. Candamil, Mariana S. Jiménez-Varón, Cristian Felipe Tabares-Soto, Reinel Isaza, Gustavo Guyot, Romain Genes (Basel) Article Long terminal repeat (LTR) retrotransposons are mobile elements that constitute the major fraction of most plant genomes. The identification and annotation of these elements via bioinformatics approaches represent a major challenge in the era of massive plant genome sequencing. In addition to their involvement in genome size variation, LTR retrotransposons are also associated with the function and structure of different chromosomal regions and can alter the function of coding regions, among others. Several sequence databases of plant LTR retrotransposons are available for public access, such as PGSB and RepetDB, or restricted access such as Repbase. Although these databases are useful to identify LTR-RTs in new genomes by similarity, the elements of these databases are not fully classified to the lineage (also called family) level. Here, we present InpactorDB, a semi-curated dataset composed of 130,439 elements from 195 plant genomes (belonging to 108 plant species) classified to the lineage level. This dataset has been used to train two deep neural networks (i.e., one fully connected and one convolutional) for the rapid classification of these elements. In lineage-level classification approaches, we obtain up to 98% performance, indicated by the F1-score, precision and recall scores. MDPI 2021-01-28 /pmc/articles/PMC7910972/ /pubmed/33525408 http://dx.doi.org/10.3390/genes12020190 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Orozco-Arias, Simon
Jaimes, Paula A.
Candamil, Mariana S.
Jiménez-Varón, Cristian Felipe
Tabares-Soto, Reinel
Isaza, Gustavo
Guyot, Romain
InpactorDB: A Classified Lineage-Level Plant LTR Retrotransposon Reference Library for Free-Alignment Methods Based on Machine Learning
title InpactorDB: A Classified Lineage-Level Plant LTR Retrotransposon Reference Library for Free-Alignment Methods Based on Machine Learning
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7910972/
https://www.ncbi.nlm.nih.gov/pubmed/33525408
http://dx.doi.org/10.3390/genes12020190