
GINGER: an integrated method for high-accuracy prediction of gene structure in higher eukaryotes at the gene and exon level

The prediction of gene structure within the genome sequence is the starting point of genome analysis, and its accuracy has a significant impact on the quality of subsequent analyses. Gene structure prediction is roughly divided into RNA-Seq-based methods, ab initio-based methods, homology-based meth...


Bibliographic Details
Main authors: Taniguchi, Takeaki, Okuno, Miki, Shinoda, Takahiro, Kobayashi, Fumiya, Takahashi, Kazuki, Yuasa, Hideaki, Nakamura, Yuta, Tanaka, Hiroyuki, Kajitani, Rei, Itoh, Takehiko
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10439787/
https://www.ncbi.nlm.nih.gov/pubmed/37478310
http://dx.doi.org/10.1093/dnares/dsad017
_version_ 1785093027765682176
author Taniguchi, Takeaki
Okuno, Miki
Shinoda, Takahiro
Kobayashi, Fumiya
Takahashi, Kazuki
Yuasa, Hideaki
Nakamura, Yuta
Tanaka, Hiroyuki
Kajitani, Rei
Itoh, Takehiko
author_facet Taniguchi, Takeaki
Okuno, Miki
Shinoda, Takahiro
Kobayashi, Fumiya
Takahashi, Kazuki
Yuasa, Hideaki
Nakamura, Yuta
Tanaka, Hiroyuki
Kajitani, Rei
Itoh, Takehiko
author_sort Taniguchi, Takeaki
collection PubMed
description The prediction of gene structure within the genome sequence is the starting point of genome analysis, and its accuracy has a significant impact on the quality of subsequent analyses. Gene structure prediction is roughly divided into RNA-Seq-based methods, ab initio-based methods, homology-based methods, and the integration of individual prediction methods. Integrated methods are mainstream in recent genome projects because they improve prediction accuracy by combining or taking the best individual prediction findings; however, adequate prediction accuracy for eukaryotic species has not yet been achieved. Therefore, we developed an integrated tool, GINGER, that solves various issues related to gene structure prediction in higher eukaryotes. By handling artefacts in alignments of RNA and protein sequences, reconstructing gene structures via dynamic programming with appropriately weighted and scored exon/intron/intergenic regions, and applying different prediction processes and filtering criteria to multi-exon and single-exon genes, we achieved a significant improvement in accuracy compared to the existing integration methods. The feature of GINGER is its high prediction accuracy at the gene and exon levels, which is pronounced for species with more complex gene architectures. GINGER is implemented using Nextflow, which allows for the efficient and effective use of computing resources.
format Online
Article
Text
id pubmed-10439787
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-104397872023-08-20 GINGER: an integrated method for high-accuracy prediction of gene structure in higher eukaryotes at the gene and exon level Taniguchi, Takeaki Okuno, Miki Shinoda, Takahiro Kobayashi, Fumiya Takahashi, Kazuki Yuasa, Hideaki Nakamura, Yuta Tanaka, Hiroyuki Kajitani, Rei Itoh, Takehiko DNA Res Research Article The prediction of gene structure within the genome sequence is the starting point of genome analysis, and its accuracy has a significant impact on the quality of subsequent analyses. Gene structure prediction is roughly divided into RNA-Seq-based methods, ab initio-based methods, homology-based methods, and the integration of individual prediction methods. Integrated methods are mainstream in recent genome projects because they improve prediction accuracy by combining or taking the best individual prediction findings; however, adequate prediction accuracy for eukaryotic species has not yet been achieved. Therefore, we developed an integrated tool, GINGER, that solves various issues related to gene structure prediction in higher eukaryotes. By handling artefacts in alignments of RNA and protein sequences, reconstructing gene structures via dynamic programming with appropriately weighted and scored exon/intron/intergenic regions, and applying different prediction processes and filtering criteria to multi-exon and single-exon genes, we achieved a significant improvement in accuracy compared to the existing integration methods. The feature of GINGER is its high prediction accuracy at the gene and exon levels, which is pronounced for species with more complex gene architectures. GINGER is implemented using Nextflow, which allows for the efficient and effective use of computing resources. Oxford University Press 2023-07-21 /pmc/articles/PMC10439787/ /pubmed/37478310 http://dx.doi.org/10.1093/dnares/dsad017 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of Kazusa DNA Research Institute. 
https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Research Article
Taniguchi, Takeaki
Okuno, Miki
Shinoda, Takahiro
Kobayashi, Fumiya
Takahashi, Kazuki
Yuasa, Hideaki
Nakamura, Yuta
Tanaka, Hiroyuki
Kajitani, Rei
Itoh, Takehiko
GINGER: an integrated method for high-accuracy prediction of gene structure in higher eukaryotes at the gene and exon level
title GINGER: an integrated method for high-accuracy prediction of gene structure in higher eukaryotes at the gene and exon level
title_full GINGER: an integrated method for high-accuracy prediction of gene structure in higher eukaryotes at the gene and exon level
title_fullStr GINGER: an integrated method for high-accuracy prediction of gene structure in higher eukaryotes at the gene and exon level
title_full_unstemmed GINGER: an integrated method for high-accuracy prediction of gene structure in higher eukaryotes at the gene and exon level
title_short GINGER: an integrated method for high-accuracy prediction of gene structure in higher eukaryotes at the gene and exon level
title_sort ginger: an integrated method for high-accuracy prediction of gene structure in higher eukaryotes at the gene and exon level
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10439787/
https://www.ncbi.nlm.nih.gov/pubmed/37478310
http://dx.doi.org/10.1093/dnares/dsad017