Ab initio gene prediction for protein-coding regions

MOTIVATION: Ab initio gene prediction in nonmodel organisms is a difficult task. While many ab initio methods have been developed, their average accuracy over long segments of a genome, and especially when assessed over a wide range of species, generally yields results with sensitivity and specifici...

Bibliographic Details
Main Authors: Baker, Lonnie; David, Charles; Jacobs, Donald J
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10448985/
https://www.ncbi.nlm.nih.gov/pubmed/37638212
http://dx.doi.org/10.1093/bioadv/vbad105
author Baker, Lonnie
David, Charles
Jacobs, Donald J
collection PubMed
description MOTIVATION: Ab initio gene prediction in nonmodel organisms is a difficult task. While many ab initio methods have been developed, their average accuracy over long segments of a genome, and especially when assessed over a wide range of species, generally yields results with sensitivity and specificity levels in the low 60% range. A common weakness of most methods is the tendency to learn patterns that are species-specific to varying degrees. The need exists for methods to extract genetic features that can distinguish coding and noncoding regions that are not sensitive to specific organism characteristics. RESULTS: A new method based on a neural network (NN) that uses a collection of sensors to create input features is presented. It is shown that accurate predictions are achieved even when trained on organisms that are significantly different phylogenetically than test organisms. A consensus prediction algorithm for a CoDing Sequence (CDS) is subsequently applied to the first nucleotide level of NN predictions that boosts accuracy through a data-driven procedure that optimizes a CDS/non-CDS threshold. An aggregate accuracy benchmark at the nucleotide level shows that this new approach performs better than existing ab initio methods, while requiring significantly less training data. AVAILABILITY AND IMPLEMENTATION: https://github.com/BioMolecularPhysicsGroup-UNCC/MachineLearning.
format Online
Article
Text
id pubmed-10448985
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10448985 2023-08-25 Ab initio gene prediction for protein-coding regions Baker, Lonnie; David, Charles; Jacobs, Donald J Bioinform Adv Original Article Oxford University Press 2023-08-10 /pmc/articles/PMC10448985/ /pubmed/37638212 http://dx.doi.org/10.1093/bioadv/vbad105 Text en © The Author(s) 2023. Published by Oxford University Press.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Ab initio gene prediction for protein-coding regions
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10448985/
https://www.ncbi.nlm.nih.gov/pubmed/37638212
http://dx.doi.org/10.1093/bioadv/vbad105