Context-driven discovery of gene cassettes in mobile integrons using a computational grammar

Bibliographic details
Main authors: Tsafnat, Guy; Coiera, Enrico; Partridge, Sally R; Schaeffer, Jaron; Iredell, Jon R
Format: Text
Language: English
Published: BioMed Central, 2009
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3087341/
https://www.ncbi.nlm.nih.gov/pubmed/19735578
http://dx.doi.org/10.1186/1471-2105-10-281
Collection: PubMed

Abstract

BACKGROUND: Gene discovery algorithms typically examine sequence data for low-level patterns. A novel method for computationally discovering higher-order DNA structures using a context-sensitive grammar is presented. The algorithm was applied to the discovery of gene cassettes associated with integrons. The discovery and annotation of antibiotic resistance genes in such cassettes are essential for effective monitoring of antibiotic resistance patterns and for the formulation of public health antibiotic prescription policies.

RESULTS: Using the method, we discovered two new putative gene cassettes from 276 integron features and 978 GenBank sequences. The system achieved κ = 0.972 annotation agreement with an expert gold standard of 300 sequences. In rediscovery experiments, we deleted 789,196 cassette instances over 2030 experiments and correctly relabelled 85.6% (α ≥ 95%, E ≤ 1%, mean sensitivity = 0.86, specificity = 1, F-score = 0.93), with no false positives. Error analysis showed that in 72,338 of the missed deletions, two adjacent deleted cassettes were labelled as a single cassette; accounting for these raised performance to 94.8% (mean sensitivity = 0.92, specificity = 1, F-score = 0.96).

CONCLUSION: Using grammars, we were able to represent heuristic background knowledge about large and complex structures in DNA. Importantly, we were also able to use the context embedded in the model to discover new putative antibiotic resistance gene cassettes. The method is complementary to existing automatic annotation systems that operate at the sequence level.

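The headline figures in the RESULTS paragraph can be cross-checked with a little arithmetic. The Python sketch below is an illustration added here, not code from the paper. It assumes that precision is 1 (the abstract reports no false positives), that "F-score" means the balanced F1 score (harmonic mean of precision and sensitivity), and that the 72,338 merged-pair cases are counted as correctly relabelled in the 94.8% figure. The helper names f_score and relabel_rate are hypothetical.

```python
# Illustrative cross-check of the abstract's figures (not the authors' code).
# Assumptions: precision = 1 (no false positives reported), F-score = F1,
# and merged adjacent-cassette cases count as correct in the adjusted rate.

def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall (sensitivity)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relabel_rate(deleted: int, relabelled: int) -> float:
    """Fraction of deleted cassette instances that were correctly relabelled."""
    return relabelled / deleted


# Figures quoted in the abstract.
DELETED = 789_196       # cassette instances deleted over 2030 experiments
INITIAL_RATE = 0.856    # fraction correctly relabelled
MERGED_PAIRS = 72_338   # missed deletions where two adjacent cassettes were labelled as one

# Reconstruct the count of relabelled instances from the rounded percentage.
relabelled = round(DELETED * INITIAL_RATE)

# Assumption: merged pairs are treated as correctly relabelled.
adjusted = relabel_rate(DELETED, relabelled + MERGED_PAIRS)

print(f"relabelled instances (approx.): {relabelled:,}")
print(f"adjusted relabel rate: {adjusted:.1%}")
print(f"F1 from mean sensitivity 0.86: {f_score(1.0, 0.86):.2f}")
print(f"F1 from mean sensitivity 0.92: {f_score(1.0, 0.92):.2f}")
```

Under these assumptions the adjusted rate comes out at 94.8%, matching the abstract, and the F1 values are 0.92 and 0.96. The first differs from the reported 0.93 in the second decimal, which may reflect rounding of the quoted sensitivity and the fact that the reported F-score is a mean over experiments.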
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Bioinformatics (Methodology Article)
Publication date: 2009-09-08
Copyright ©2009 Tsafnat et al; licensee BioMed Central Ltd.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.