Ab initio gene prediction for protein-coding regions
MOTIVATION: Ab initio gene prediction in nonmodel organisms is a difficult task. While many ab initio methods have been developed, their average accuracy over long segments of a genome, and especially when assessed over a wide range of species, generally yields results with sensitivity and specifici...
Main Authors: | Baker, Lonnie; David, Charles; Jacobs, Donald J |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10448985/ https://www.ncbi.nlm.nih.gov/pubmed/37638212 http://dx.doi.org/10.1093/bioadv/vbad105 |
_version_ | 1785094847954157568 |
---|---|
author | Baker, Lonnie David, Charles Jacobs, Donald J |
author_facet | Baker, Lonnie David, Charles Jacobs, Donald J |
author_sort | Baker, Lonnie |
collection | PubMed |
description | MOTIVATION: Ab initio gene prediction in nonmodel organisms is a difficult task. While many ab initio methods have been developed, their average accuracy over long segments of a genome, and especially when assessed over a wide range of species, generally yields results with sensitivity and specificity levels in the low 60% range. A common weakness of most methods is the tendency to learn patterns that are species-specific to varying degrees. The need exists for methods to extract genetic features that can distinguish coding and noncoding regions that are not sensitive to specific organism characteristics. RESULTS: A new method based on a neural network (NN) that uses a collection of sensors to create input features is presented. It is shown that accurate predictions are achieved even when trained on organisms that are significantly different phylogenetically than test organisms. A consensus prediction algorithm for a CoDing Sequence (CDS) is subsequently applied to the first nucleotide level of NN predictions that boosts accuracy through a data-driven procedure that optimizes a CDS/non-CDS threshold. An aggregate accuracy benchmark at the nucleotide level shows that this new approach performs better than existing ab initio methods, while requiring significantly less training data. AVAILABILITY AND IMPLEMENTATION: https://github.com/BioMolecularPhysicsGroup-UNCC/MachineLearning. |
format | Online Article Text |
id | pubmed-10448985 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-104489852023-08-25 Ab initio gene prediction for protein-coding regions Baker, Lonnie David, Charles Jacobs, Donald J Bioinform Adv Original Article MOTIVATION: Ab initio gene prediction in nonmodel organisms is a difficult task. While many ab initio methods have been developed, their average accuracy over long segments of a genome, and especially when assessed over a wide range of species, generally yields results with sensitivity and specificity levels in the low 60% range. A common weakness of most methods is the tendency to learn patterns that are species-specific to varying degrees. The need exists for methods to extract genetic features that can distinguish coding and noncoding regions that are not sensitive to specific organism characteristics. RESULTS: A new method based on a neural network (NN) that uses a collection of sensors to create input features is presented. It is shown that accurate predictions are achieved even when trained on organisms that are significantly different phylogenetically than test organisms. A consensus prediction algorithm for a CoDing Sequence (CDS) is subsequently applied to the first nucleotide level of NN predictions that boosts accuracy through a data-driven procedure that optimizes a CDS/non-CDS threshold. An aggregate accuracy benchmark at the nucleotide level shows that this new approach performs better than existing ab initio methods, while requiring significantly less training data. AVAILABILITY AND IMPLEMENTATION: https://github.com/BioMolecularPhysicsGroup-UNCC/MachineLearning. Oxford University Press 2023-08-10 /pmc/articles/PMC10448985/ /pubmed/37638212 http://dx.doi.org/10.1093/bioadv/vbad105 Text en © The Author(s) 2023. Published by Oxford University Press. 
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Article Baker, Lonnie David, Charles Jacobs, Donald J Ab initio gene prediction for protein-coding regions |
title | Ab initio gene prediction for protein-coding regions |
title_full | Ab initio gene prediction for protein-coding regions |
title_fullStr | Ab initio gene prediction for protein-coding regions |
title_full_unstemmed | Ab initio gene prediction for protein-coding regions |
title_short | Ab initio gene prediction for protein-coding regions |
title_sort | ab initio gene prediction for protein-coding regions |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10448985/ https://www.ncbi.nlm.nih.gov/pubmed/37638212 http://dx.doi.org/10.1093/bioadv/vbad105 |
work_keys_str_mv | AT bakerlonnie abinitiogenepredictionforproteincodingregions AT davidcharles abinitiogenepredictionforproteincodingregions AT jacobsdonaldj abinitiogenepredictionforproteincodingregions |
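
The description above mentions nucleotide-level sensitivity and specificity and a data-driven procedure that optimizes a CDS/non-CDS threshold. The record does not say how that optimization is done, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes each nucleotide has a score in [0, 1] and a true label (1 = CDS, 0 = non-CDS), defines sensitivity as TP/(TP+FN) and specificity as TN/(TN+FP), and picks the threshold by Youden's J statistic, which is a stand-in criterion chosen here for illustration. All function names and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def confusion(scores: Sequence[float], labels: Sequence[int],
              threshold: float) -> Tuple[int, int, int, int]:
    """Count (TP, FP, TN, FN) when a nucleotide is called CDS if score >= threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_cds = score >= threshold
        if predicted_cds and label:
            tp += 1
        elif predicted_cds and not label:
            fp += 1
        elif not predicted_cds and label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float]:
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP) (assumed definitions)."""
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


def best_threshold(scores: Sequence[float], labels: Sequence[int],
                   candidates: List[float]) -> Tuple[float, float]:
    """Return the candidate threshold maximizing Youden's J = sens + spec - 1.

    Ties are resolved in favor of the first (lowest-index) candidate.
    """
    best_t, best_j = candidates[0], float("-inf")
    for t in candidates:
        sens, spec = sensitivity_specificity(*confusion(scores, labels, t))
        j = sens + spec - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Toy data (made up for illustration): per-nucleotide scores and true labels.
scores = [0.1, 0.2, 0.8, 0.9, 0.7, 0.3, 0.15, 0.85]
labels = [0, 0, 1, 1, 1, 0, 0, 1]
candidates = [i / 10 for i in range(1, 10)]

threshold, j = best_threshold(scores, labels, candidates)
print(f"chosen threshold = {threshold}, Youden's J = {j}")
```

In practice the threshold would be tuned on held-out data rather than on the sequences used for evaluation, and a different objective could be substituted for Youden's J without changing the structure of the sweep.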