
An optimized approach for annotation of large eukaryotic genomic sequences using genetic algorithm

BACKGROUND: Detecting important functional and/or structural elements in a large eukaryotic genomic sequence and identifying their positions is an active research area. The gene is an important functional and structural unit of DNA. The computation of gene prediction is, therefore, very essent...


Bibliographic Details
Main Authors: Chowdhury, Biswanath; Garai, Arnav; Garai, Gautam
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5655831/
https://www.ncbi.nlm.nih.gov/pubmed/29065853
http://dx.doi.org/10.1186/s12859-017-1874-7
author Chowdhury, Biswanath
Garai, Arnav
Garai, Gautam
collection PubMed
description BACKGROUND: Detecting important functional and/or structural elements in a large eukaryotic genomic sequence and identifying their positions is an active research area. The gene is an important functional and structural unit of DNA. Computational gene prediction is, therefore, essential for detailed genome annotation. RESULTS: In this paper, we propose a new gene prediction technique based on a Genetic Algorithm (GA) to determine the optimal positions of the exons of a gene in a chromosome or genome. Correctly identifying coding and non-coding regions is difficult and computationally demanding. The proposed method, named Gene Prediction with Genetic Algorithm (GPGA), reduces this problem by searching for only one exon at a time instead of all exons along with their introns. This representation carries a significant advantage in that it breaks the entire gene-finding problem into a number of smaller sub-problems, thereby reducing the computational complexity. We tested the performance of GPGA on existing benchmark datasets and compared the results with well-known and relevant techniques. The comparison shows that the proposed method performs better than or comparably to these techniques. We also used GPGA to annotate human chromosome 21 (HS21) using cross-species comparisons with the mouse orthologs. CONCLUSION: GPGA predicted true genes with better accuracy than other well-known approaches. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-017-1874-7) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5655831
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5655831 2017-10-31 An optimized approach for annotation of large eukaryotic genomic sequences using genetic algorithm Chowdhury, Biswanath Garai, Arnav Garai, Gautam BMC Bioinformatics Methodology Article BioMed Central 2017-10-24 /pmc/articles/PMC5655831/ /pubmed/29065853 http://dx.doi.org/10.1186/s12859-017-1874-7 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title An optimized approach for annotation of large eukaryotic genomic sequences using genetic algorithm
topic Methodology Article
work_keys_str_mv AT chowdhurybiswanath anoptimizedapproachforannotationoflargeeukaryoticgenomicsequencesusinggeneticalgorithm
AT garaiarnav anoptimizedapproachforannotationoflargeeukaryoticgenomicsequencesusinggeneticalgorithm
AT garaigautam anoptimizedapproachforannotationoflargeeukaryoticgenomicsequencesusinggeneticalgorithm
AT chowdhurybiswanath optimizedapproachforannotationoflargeeukaryoticgenomicsequencesusinggeneticalgorithm
AT garaiarnav optimizedapproachforannotationoflargeeukaryoticgenomicsequencesusinggeneticalgorithm
AT garaigautam optimizedapproachforannotationoflargeeukaryoticgenomicsequencesusinggeneticalgorithm