n-Gram characterization of genomic islands in bacterial genomes

The paper presents a novel, n-gram-based method for analysis of bacterial genome segments known as genomic islands (GIs). Identification of GIs in bacterial genomes is an important task since many of them represent inserts that may contribute to bacterial evolution and pathogenesis. In order to char...

Bibliographic Details
Main authors: Pavlović-Lažetić, Gordana M., Mitić, Nenad S., Beljanski, Miloš V.
Format: Online Article Text
Language: English
Published: Elsevier Ireland Ltd. 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7185697/
https://www.ncbi.nlm.nih.gov/pubmed/19101056
http://dx.doi.org/10.1016/j.cmpb.2008.10.014
_version_ 1783526810527465472
author Pavlović-Lažetić, Gordana M.
Mitić, Nenad S.
Beljanski, Miloš V.
collection PubMed
description The paper presents a novel, n-gram-based method for analysis of bacterial genome segments known as genomic islands (GIs). Identification of GIs in bacterial genomes is an important task since many of them represent inserts that may contribute to bacterial evolution and pathogenesis. In order to characterize GIs and distinguish them from the rest of the genome, binary classification of islands based on n-gram frequency distributions has been performed. It consists of testing the agreement of the islands' n-gram frequency distributions with those of the complete genome and the backbone sequence. In addition, a statistic based on the maximal order Markov model is used to identify significantly overrepresented and underrepresented n-grams in islands. The results may be used as a basis for Zipf-like analysis suggesting that some of the n-grams are overrepresented in a subset of islands and underrepresented in the backbone, or vice versa, thus complementing the binary classification. The method is applied to strain-specific regions in the Escherichia coli O157:H7 EDL933 genome (O-islands), resulting in two groups of O-islands with different n-gram characteristics. It refines a characterization based on other compositional features such as G + C content and codon usage, and may help in identification of GIs, and also in research and development of adequate drugs targeting virulence genes in them.
format Online
Article
Text
id pubmed-7185697
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Elsevier Ireland Ltd.
record_format MEDLINE/PubMed
spelling pubmed-7185697
2020-04-28
n-Gram characterization of genomic islands in bacterial genomes
Pavlović-Lažetić, Gordana M.
Mitić, Nenad S.
Beljanski, Miloš V.
Comput Methods Programs Biomed
Article
Elsevier Ireland Ltd.
2009-03
2008-12-19
/pmc/articles/PMC7185697/
/pubmed/19101056
http://dx.doi.org/10.1016/j.cmpb.2008.10.014
Text
en
Copyright © 2008 Elsevier Ireland Ltd. All rights reserved. Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.
title n-Gram characterization of genomic islands in bacterial genomes
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7185697/
https://www.ncbi.nlm.nih.gov/pubmed/19101056
http://dx.doi.org/10.1016/j.cmpb.2008.10.014