GeneMark-ETP: Automatic Gene Finding in Eukaryotic Genomes in Consistency with Extrinsic Data
Main authors: | , , |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9882169/ https://www.ncbi.nlm.nih.gov/pubmed/36711453 http://dx.doi.org/10.1101/2023.01.13.524024 |
Summary: | New large-scale initiatives, such as the Earth BioGenome Project, require efficient automatic tools for eukaryotic genome annotation. GeneMark-ETP, a new automatic tool presented here, finds genes by integrating genomic, transcriptomic, and protein-derived evidence. GeneMark-ETP first identifies genomic loci where extrinsic data are sufficient to predict genes with ‘high confidence’ and then proceeds to find the remaining genes across the whole genome. The initial parameters of the statistical model are estimated on a training set made of the high-confidence genes. The model parameters are then iteratively updated in cycles of gene prediction and parameter re-estimation. Upon reaching convergence, GeneMark-ETP makes the final prediction of the whole complement of genes. The algorithm was developed with a focus on large plant and animal genomes. GeneMark-ETP compared favorably with gene finders that use a single type of extrinsic evidence, either short RNA reads (GeneMark-ET) or homologous proteins mapped to the genome (GeneMark-EP+), as could be expected. Comparisons were also made with pipelines that use both transcript- and protein-derived extrinsic evidence: TSEBRA, which combines BRAKER1 and BRAKER2, and MAKER2. The results demonstrated that GeneMark-ETP delivered state-of-the-art gene prediction accuracy, with a large margin of improvement in large eukaryotic genomes. |
---|
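
The training scheme outlined in the summary (seed the parameters from high-confidence genes, then alternate prediction and re-estimation until nothing changes) can be illustrated with a toy sketch. This is not GeneMark-ETP's code or model: the record does not describe the statistical model, so the one-number "coding score" per locus, the two-mean classifier, and the names `estimate`, `predict`, and `iterative_training` are all assumptions made for illustration only.

```python
from statistics import mean


def estimate(features, labels):
    """Re-estimate the toy parameters: mean score of gene and non-gene loci."""
    genes = [x for x, is_gene in zip(features, labels) if is_gene]
    others = [x for x, is_gene in zip(features, labels) if not is_gene]
    if not genes or not others:
        raise ValueError("both classes need at least one locus")
    return mean(genes), mean(others)


def predict(features, params, high_confidence):
    """Label each locus by the nearer class mean; seed loci always stay genes."""
    gene_mean, other_mean = params
    labels = [abs(x - gene_mean) < abs(x - other_mean) for x in features]
    for i in high_confidence:
        labels[i] = True
    return labels


def iterative_training(features, high_confidence, max_cycles=50):
    """Alternate prediction and re-estimation until the labels stop changing."""
    # Initial parameters come from the high-confidence set only (assumed seed).
    labels = [i in high_confidence for i in range(len(features))]
    params = estimate(features, labels)
    for cycle in range(1, max_cycles + 1):
        new_labels = predict(features, params, high_confidence)
        if new_labels == labels:  # convergence: predictions are stable
            return labels, params, cycle
        labels = new_labels
        params = estimate(features, labels)  # re-estimation step
    return labels, params, max_cycles


# Hypothetical scores for eight candidate loci; loci 0 and 1 are the seeds.
scores = [0.9, 0.85, 0.8, 0.75, 0.2, 0.1, 0.15, 0.3]
final_labels, final_params, cycles = iterative_training(scores, {0, 1})
print(final_labels, cycles)
```

The loop stops as soon as a prediction pass reproduces the previous labels, which is one simple way to define convergence; a real gene finder would use a far richer model and stopping rule.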