GeneMark-ETP: Automatic Gene Finding in Eukaryotic Genomes in Consistency with Extrinsic Data
New large-scale initiatives, such as the Earth BioGenome Project, require efficient automatic tools for eukaryotic genome annotation. A new automatic tool, GeneMark-ETP, presented here, finds genes by integrating genomic-, transcriptomic- and protein-derived evidence. GeneMark-ETP first identifie...
Main Authors: | Bruna, Tomas; Lomsadze, Alexandre; Borodovsky, Mark |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9882169/ https://www.ncbi.nlm.nih.gov/pubmed/36711453 http://dx.doi.org/10.1101/2023.01.13.524024 |
_version_ | 1784879248661544960 |
---|---|
author | Bruna, Tomas; Lomsadze, Alexandre; Borodovsky, Mark |
author_facet | Bruna, Tomas; Lomsadze, Alexandre; Borodovsky, Mark |
author_sort | Bruna, Tomas |
collection | PubMed |
description | New large-scale initiatives, such as the Earth BioGenome Project, require efficient automatic tools for eukaryotic genome annotation. A new automatic tool, GeneMark-ETP, presented here, finds genes by integrating genomic-, transcriptomic- and protein-derived evidence. GeneMark-ETP first identifies genomic loci where extrinsic data are sufficient for gene prediction with ‘high confidence’ and then proceeds to find the remaining genes across the whole genome. The initial parameters of the statistical model are estimated on a training set made from the high-confidence genes. Subsequently, the model parameters are iteratively updated in cycles of gene prediction and parameter re-estimation. Upon reaching convergence, GeneMark-ETP makes the final prediction of the whole complement of genes. The algorithm was developed with a focus on large plant and animal genomes. GeneMark-ETP compared favorably with gene finders that use a single type of extrinsic evidence, derived either from short RNA reads (GeneMark-ET) or from homologous proteins mapped to the genome (GeneMark-EP+). These outcomes could be expected. Moreover, comparisons were made with pipelines that use both transcript- and protein-derived extrinsic evidence. For these experiments we chose TSEBRA, which combines BRAKER1 and BRAKER2, as well as MAKER2. The results demonstrated that GeneMark-ETP delivered state-of-the-art gene prediction accuracy, with a large margin of improvement in large eukaryotic genomes. |
format | Online Article Text |
id | pubmed-9882169 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-98821692023-01-28 GeneMark-ETP: Automatic Gene Finding in Eukaryotic Genomes in Consistency with Extrinsic Data Bruna, Tomas Lomsadze, Alexandre Borodovsky, Mark bioRxiv Article New large scale initiatives, such as the Earth BioGenome Project, require efficient automatic tools for eukaryotic genome annotation. A new automatic tool, GeneMark-ETP, presented here, finds genes by integration of genomic-, transcriptomic- and protein-derived evidence. GeneMark-ETP first identifies genomic loci where extrinsic data is sufficient for gene prediction with ‘high confidence’ and then proceeds with finding the remaining genes across the whole genome. The initial set of parameters of the statistical model is estimated on the training set made from the high confidence genes. Subsequently, the model parameters are iteratively updated in the cycles of gene prediction and parameter re-estimation. Upon reaching convergence GeneMark-ETP makes the final prediction of the whole complement of genes. The algorithm development was made with a focus on large plant and animal genomes. GeneMark-ETP performance was compared favorably with the ones of the gene finders using a single type of extrinsic evidence delivered by either short RNA reads (GeneMark-ET), or by mapped to genome homologous proteins (GeneMark-EP+). These outcomes could be expected. Moreover, comparisons were made with the pipelines utilizing both transcript- and protein-derived extrinsic evidence. For these experiments we have chosen TSEBRA, combining BRAKER1 and BRAKER2, as well as MAKER2. The results demonstrated that GeneMark-ETP delivered state-of-the-art gene prediction accuracy with a large margin of improvement in large eukaryotic genomes. 
Cold Spring Harbor Laboratory 2023-08-07 /pmc/articles/PMC9882169/ /pubmed/36711453 http://dx.doi.org/10.1101/2023.01.13.524024 Text en https://creativecommons.org/licenses/by-nd/4.0/ This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, and only so long as attribution is given to the creator. The license allows for commercial use. |
spellingShingle | Article Bruna, Tomas Lomsadze, Alexandre Borodovsky, Mark GeneMark-ETP: Automatic Gene Finding in Eukaryotic Genomes in Consistency with Extrinsic Data |
title | GeneMark-ETP: Automatic Gene Finding in Eukaryotic Genomes in Consistency with Extrinsic Data |
title_full | GeneMark-ETP: Automatic Gene Finding in Eukaryotic Genomes in Consistency with Extrinsic Data |
title_fullStr | GeneMark-ETP: Automatic Gene Finding in Eukaryotic Genomes in Consistency with Extrinsic Data |
title_full_unstemmed | GeneMark-ETP: Automatic Gene Finding in Eukaryotic Genomes in Consistency with Extrinsic Data |
title_short | GeneMark-ETP: Automatic Gene Finding in Eukaryotic Genomes in Consistency with Extrinsic Data |
title_sort | genemark-etp: automatic gene finding in eukaryotic genomes in consistency with extrinsic data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9882169/ https://www.ncbi.nlm.nih.gov/pubmed/36711453 http://dx.doi.org/10.1101/2023.01.13.524024 |
work_keys_str_mv | AT brunatomas genemarketpautomaticgenefindingineukaryoticgenomesinconsistencywithextrinsicdata AT lomsadzealexandre genemarketpautomaticgenefindingineukaryoticgenomesinconsistencywithextrinsicdata AT borodovskymark genemarketpautomaticgenefindingineukaryoticgenomesinconsistencywithextrinsicdata |
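
The description above outlines a training loop: estimate initial parameters from a high-confidence set, then alternate prediction and parameter re-estimation until the predictions stop changing. The following is a minimal sketch of that general pattern only, not GeneMark-ETP itself. Everything in it is a hypothetical stand-in: a single numeric feature per genomic segment replaces the real evidence, two class means replace the statistical model, and the names `estimate`, `predict`, and `iterative_training` are invented for illustration. It also assumes the high-confidence set contains at least one segment of each class.

```python
from statistics import mean

def estimate(features, labels):
    """Toy parameter estimation: mean feature value per class (assumed model)."""
    coding = [f for f, lab in zip(features, labels) if lab]
    noncoding = [f for f, lab in zip(features, labels) if not lab]
    return mean(coding), mean(noncoding)

def predict(features, params, fixed):
    """Label each segment by its nearest class mean.

    Segments listed in `fixed` (the high-confidence set) keep their labels.
    """
    coding_mean, noncoding_mean = params
    labels = []
    for i, f in enumerate(features):
        if i in fixed:
            labels.append(fixed[i])
        else:
            labels.append(abs(f - coding_mean) < abs(f - noncoding_mean))
    return labels

def iterative_training(features, fixed, max_iter=20):
    """Alternate prediction and re-estimation until the labels stop changing."""
    # Initial parameters come from the high-confidence subset only.
    idx = sorted(fixed)
    params = estimate([features[i] for i in idx], [fixed[i] for i in idx])
    labels = predict(features, params, fixed)
    for _ in range(max_iter):
        params = estimate(features, labels)          # re-estimate on all segments
        new_labels = predict(features, params, fixed)
        if new_labels == labels:                     # convergence: no label changed
            break
        labels = new_labels
    return params, labels

# Hypothetical data: one feature per segment; segments 0 and 3 are high confidence.
features = [0.9, 0.8, 0.2, 0.1, 0.7, 0.3, 0.55]
fixed = {0: True, 3: False}
params, labels = iterative_training(features, fixed)
print(params, labels)
```

Keeping the high-confidence labels fixed in every round is a design choice of this sketch: it guarantees that both classes stay non-empty during re-estimation and mirrors the idea that predictions should remain consistent with the strongest extrinsic evidence.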