
A benchmark study of ab initio gene prediction methods in diverse eukaryotic organisms

BACKGROUND: The draft genome assemblies produced by new sequencing technologies present important challenges for automatic gene prediction pipelines, leading to less accurate gene models. New benchmark methods are needed to evaluate the accuracy of gene prediction methods in the face of incomplete g...


Bibliographic Details
Main authors: Scalzitti, Nicolas, Jeannin-Girardon, Anne, Collet, Pierre, Poch, Olivier, Thompson, Julie D.
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7147072/
https://www.ncbi.nlm.nih.gov/pubmed/32272892
http://dx.doi.org/10.1186/s12864-020-6707-9
author Scalzitti, Nicolas
Jeannin-Girardon, Anne
Collet, Pierre
Poch, Olivier
Thompson, Julie D.
collection PubMed
description BACKGROUND: The draft genome assemblies produced by new sequencing technologies present important challenges for automatic gene prediction pipelines, leading to less accurate gene models. New benchmark methods are needed to evaluate the accuracy of gene prediction methods in the face of incomplete genome assemblies, low genome coverage and quality, complex gene structures, or a lack of suitable sequences for evidence-based annotations. RESULTS: We describe the construction of a new benchmark, called G3PO (benchmark for Gene and Protein Prediction PrOgrams), designed to represent many of the typical challenges faced by current genome annotation projects. The benchmark is based on a carefully validated and curated set of real eukaryotic genes from 147 phylogenetically disperse organisms, and a number of test sets are defined to evaluate the effects of different features, including genome sequence quality, gene structure complexity, protein length, etc. We used the benchmark to perform an independent comparative analysis of the most widely used ab initio gene prediction programs and identified the main strengths and weaknesses of the programs. More importantly, we highlight a number of features that could be exploited in order to improve the accuracy of current prediction tools. CONCLUSIONS: The experiments showed that ab initio gene structure prediction is a very challenging task, which should be further investigated. We believe that the baseline results associated with the complex gene test sets in G3PO provide useful guidelines for future studies.
format Online
Article
Text
id pubmed-7147072
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7147072 2020-04-18 A benchmark study of ab initio gene prediction methods in diverse eukaryotic organisms Scalzitti, Nicolas Jeannin-Girardon, Anne Collet, Pierre Poch, Olivier Thompson, Julie D. BMC Genomics Research Article BioMed Central 2020-04-09 /pmc/articles/PMC7147072/ /pubmed/32272892 http://dx.doi.org/10.1186/s12864-020-6707-9 Text en
© The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title A benchmark study of ab initio gene prediction methods in diverse eukaryotic organisms
topic Research Article