GeneMark-ETP: Automatic Gene Finding in Eukaryotic Genomes in Consistency with Extrinsic Data

New large-scale initiatives, such as the Earth BioGenome Project, require efficient automatic tools for eukaryotic genome annotation. A new automatic tool, GeneMark-ETP, presented here, finds genes by integration of genomic-, transcriptomic- and protein-derived evidence. GeneMark-ETP first identifie...

Bibliographic Details
Main Authors: Bruna, Tomas, Lomsadze, Alexandre, Borodovsky, Mark
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9882169/
https://www.ncbi.nlm.nih.gov/pubmed/36711453
http://dx.doi.org/10.1101/2023.01.13.524024
_version_ 1784879248661544960
author Bruna, Tomas
Lomsadze, Alexandre
Borodovsky, Mark
author_facet Bruna, Tomas
Lomsadze, Alexandre
Borodovsky, Mark
author_sort Bruna, Tomas
collection PubMed
description New large-scale initiatives, such as the Earth BioGenome Project, require efficient automatic tools for eukaryotic genome annotation. A new automatic tool, GeneMark-ETP, presented here, finds genes by integrating genomic-, transcriptomic- and protein-derived evidence. GeneMark-ETP first identifies genomic loci where extrinsic data is sufficient for gene prediction with ‘high confidence’ and then proceeds to find the remaining genes across the whole genome. The initial parameters of the statistical model are estimated on a training set made from the high-confidence genes. Subsequently, the model parameters are iteratively updated in cycles of gene prediction and parameter re-estimation. Upon reaching convergence, GeneMark-ETP makes the final prediction of the whole complement of genes. The algorithm was developed with a focus on large plant and animal genomes. GeneMark-ETP compared favorably with gene finders that use a single type of extrinsic evidence, delivered either by short RNA reads (GeneMark-ET) or by homologous proteins mapped to the genome (GeneMark-EP+). These outcomes could be expected. Moreover, comparisons were made with pipelines that use both transcript- and protein-derived extrinsic evidence. For these experiments we chose TSEBRA, which combines BRAKER1 and BRAKER2, as well as MAKER2. The results demonstrated that GeneMark-ETP delivered state-of-the-art gene prediction accuracy, with a large margin of improvement in large eukaryotic genomes.
format Online
Article
Text
id pubmed-9882169
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-9882169 2023-01-28 GeneMark-ETP: Automatic Gene Finding in Eukaryotic Genomes in Consistency with Extrinsic Data Bruna, Tomas Lomsadze, Alexandre Borodovsky, Mark bioRxiv Article
Cold Spring Harbor Laboratory 2023-08-07 /pmc/articles/PMC9882169/ /pubmed/36711453 http://dx.doi.org/10.1101/2023.01.13.524024 Text en https://creativecommons.org/licenses/by-nd/4.0/ This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, and only so long as attribution is given to the creator. The license allows for commercial use.
spellingShingle Article
Bruna, Tomas
Lomsadze, Alexandre
Borodovsky, Mark
GeneMark-ETP: Automatic Gene Finding in Eukaryotic Genomes in Consistency with Extrinsic Data
title GeneMark-ETP: Automatic Gene Finding in Eukaryotic Genomes in Consistency with Extrinsic Data
title_full GeneMark-ETP: Automatic Gene Finding in Eukaryotic Genomes in Consistency with Extrinsic Data
title_fullStr GeneMark-ETP: Automatic Gene Finding in Eukaryotic Genomes in Consistency with Extrinsic Data
title_full_unstemmed GeneMark-ETP: Automatic Gene Finding in Eukaryotic Genomes in Consistency with Extrinsic Data
title_short GeneMark-ETP: Automatic Gene Finding in Eukaryotic Genomes in Consistency with Extrinsic Data
title_sort genemark-etp: automatic gene finding in eukaryotic genomes in consistency with extrinsic data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9882169/
https://www.ncbi.nlm.nih.gov/pubmed/36711453
http://dx.doi.org/10.1101/2023.01.13.524024
work_keys_str_mv AT brunatomas genemarketpautomaticgenefindingineukaryoticgenomesinconsistencywithextrinsicdata
AT lomsadzealexandre genemarketpautomaticgenefindingineukaryoticgenomesinconsistencywithextrinsicdata
AT borodovskymark genemarketpautomaticgenefindingineukaryoticgenomesinconsistencywithextrinsicdata