Use of Average Mutual Information and Derived Measures to Find Coding Regions

One of the important steps in the annotation of genomes is the identification of regions in the genome which code for proteins. One of the tools used by most annotation approaches is the use of signals extracted from genomic regions that can be used to identify whether the region is a protein coding...

Bibliographic Details
Main authors: Newcomb, Garin; Sayood, Khalid
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8534840/
https://www.ncbi.nlm.nih.gov/pubmed/34682048
http://dx.doi.org/10.3390/e23101324
collection PubMed
description Identifying the regions of a genome that code for proteins is an important step in genome annotation. Most annotation approaches rely on signals extracted from a genomic region to decide whether that region is protein-coding. Motivated by the fact that these regions are information-bearing structures, we propose signals based on measures derived from the average mutual information for this task. We show that these signals can be used to identify coding and noncoding sequences with high accuracy. We also show that these signals are robust across species, phyla, and kingdoms and can therefore be used in species-agnostic genome annotation algorithms for identifying protein-coding regions. These, in turn, could be used for gene identification.
format Online
Article
Text
id pubmed-8534840
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8534840 2021-10-23 Entropy (Basel) Article MDPI 2021-10-11 /pmc/articles/PMC8534840/ /pubmed/34682048 http://dx.doi.org/10.3390/e23101324 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
topic Article