Evaluation of genomic island predictors using a comparative genomics approach

BACKGROUND: Genomic islands (GIs) are clusters of genes in prokaryotic genomes of probable horizontal origin. GIs are disproportionately associated with microbial adaptations of medical or environmental interest. Recently, multiple programs for automated detection of GIs have been developed that uti...

Bibliographic Details
Main Authors: Langille, Morgan GI; Hsiao, William WL; Brinkman, Fiona SL
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2518932/
https://www.ncbi.nlm.nih.gov/pubmed/18680607
http://dx.doi.org/10.1186/1471-2105-9-329
author Langille, Morgan GI
Hsiao, William WL
Brinkman, Fiona SL
collection PubMed
description BACKGROUND: Genomic islands (GIs) are clusters of genes in prokaryotic genomes of probable horizontal origin. GIs are disproportionately associated with microbial adaptations of medical or environmental interest. Recently, multiple programs for automated detection of GIs have been developed that utilize sequence composition characteristics, such as G+C ratio and dinucleotide bias. To robustly evaluate the accuracy of such methods, we propose that a dataset of GIs be constructed using criteria that are independent of sequence composition-based analysis approaches. RESULTS: We developed a comparative genomics approach (IslandPick) that identifies both very probable islands and non-island regions. The approach involves 1) flexible, automated selection of comparative genomes for each query genome, using a distance function that picks appropriate genomes for identification of GIs, 2) identification of regions unique to the query genome, compared with the chosen genomes (positive dataset) and 3) identification of regions conserved across all genomes (negative dataset). Using our constructed datasets, we investigated the accuracy of several sequence composition-based GI prediction tools. CONCLUSION: Our results indicate that AlienHunter has the highest recall, but the lowest measured precision, while SIGI-HMM is the most precise method. SIGI-HMM and IslandPath/DIMOB have comparable overall highest accuracy. Our comparative genomics approach, IslandPick, was the most accurate, compared with a curated list of GIs, indicating that we have constructed suitable datasets. This represents the first evaluation, using diverse and independent datasets that were not artificially constructed, of the accuracy of several sequence composition-based GI predictors. The caveats associated with this analysis and proposals for optimal island prediction are discussed.
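The description above reports precision, recall and accuracy for predictors scored against a positive dataset (probable islands) and a negative dataset (conserved regions). The following is a minimal Python sketch of how such metrics could be computed from genomic intervals. It is not the authors' implementation: the per-base counting scheme, the inclusive `(start, end)` coordinate convention, and the function names `merge`, `overlap_length` and `evaluate` are all assumptions made for illustration.

```python
def merge(intervals):
    """Merge overlapping or adjacent inclusive (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_length(intervals):
    """Number of positions covered by an interval set."""
    return sum(end - start + 1 for start, end in merge(intervals))


def overlap_length(a, b):
    """Number of positions covered by both interval sets."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo <= hi:
            total += hi - lo + 1
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def evaluate(predicted, positives, negatives):
    """Per-base precision, recall and accuracy (assumed scoring scheme).

    Bases outside both reference datasets are ignored, because their
    true island status is unknown.
    """
    tp = overlap_length(predicted, positives)  # predicted, in positive set
    fp = overlap_length(predicted, negatives)  # predicted, in negative set
    fn = total_length(positives) - tp          # positive bases missed
    tn = total_length(negatives) - fp          # negative bases left alone
    scored = tp + fp + fn + tn
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / scored if scored else 0.0,
    }


# Hypothetical coordinates, for illustration only.
metrics = evaluate(
    predicted=[(1, 100), (301, 320)],
    positives=[(51, 150)],
    negatives=[(301, 400)],
)
```

Under this scheme a tool that predicts large regions can reach high recall while losing precision, because more of its predicted bases fall inside the negative dataset.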
format Text
id pubmed-2518932
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2518932 2008-08-22 Evaluation of genomic island predictors using a comparative genomics approach Langille, Morgan GI Hsiao, William WL Brinkman, Fiona SL BMC Bioinformatics Research Article BACKGROUND: Genomic islands (GIs) are clusters of genes in prokaryotic genomes of probable horizontal origin. GIs are disproportionately associated with microbial adaptations of medical or environmental interest. Recently, multiple programs for automated detection of GIs have been developed that utilize sequence composition characteristics, such as G+C ratio and dinucleotide bias. To robustly evaluate the accuracy of such methods, we propose that a dataset of GIs be constructed using criteria that are independent of sequence composition-based analysis approaches. RESULTS: We developed a comparative genomics approach (IslandPick) that identifies both very probable islands and non-island regions. The approach involves 1) flexible, automated selection of comparative genomes for each query genome, using a distance function that picks appropriate genomes for identification of GIs, 2) identification of regions unique to the query genome, compared with the chosen genomes (positive dataset) and 3) identification of regions conserved across all genomes (negative dataset). Using our constructed datasets, we investigated the accuracy of several sequence composition-based GI prediction tools. CONCLUSION: Our results indicate that AlienHunter has the highest recall, but the lowest measured precision, while SIGI-HMM is the most precise method. SIGI-HMM and IslandPath/DIMOB have comparable overall highest accuracy. Our comparative genomics approach, IslandPick, was the most accurate, compared with a curated list of GIs, indicating that we have constructed suitable datasets. This represents the first evaluation, using diverse and independent datasets that were not artificially constructed, of the accuracy of several sequence composition-based GI predictors. The caveats associated with this analysis and proposals for optimal island prediction are discussed. BioMed Central 2008-08-05 /pmc/articles/PMC2518932/ /pubmed/18680607 http://dx.doi.org/10.1186/1471-2105-9-329 Text en Copyright © 2008 Langille et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Evaluation of genomic island predictors using a comparative genomics approach
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2518932/
https://www.ncbi.nlm.nih.gov/pubmed/18680607
http://dx.doi.org/10.1186/1471-2105-9-329