Pan-genome sequence analysis using Panseq: an online tool for the rapid analysis of core and accessory genomic regions

BACKGROUND: The pan-genome of a bacterial species consists of a core and an accessory gene pool. The accessory genome is thought to be an important source of genetic variability in bacterial populations and is gained through lateral gene transfer, allowing subpopulations of bacteria to better adapt...

Bibliographic Details
Main Authors: Laing, Chad; Buchanan, Cody; Taboada, Eduardo N; Zhang, Yongxiang; Kropinski, Andrew; Villegas, Andre; Thomas, James E; Gannon, Victor PJ
Format: Text
Language: English
Published: BioMed Central 2010
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2949892/
https://www.ncbi.nlm.nih.gov/pubmed/20843356
http://dx.doi.org/10.1186/1471-2105-11-461
author Laing, Chad
Buchanan, Cody
Taboada, Eduardo N
Zhang, Yongxiang
Kropinski, Andrew
Villegas, Andre
Thomas, James E
Gannon, Victor PJ
collection PubMed
description BACKGROUND: The pan-genome of a bacterial species consists of a core and an accessory gene pool. The accessory genome is thought to be an important source of genetic variability in bacterial populations and is gained through lateral gene transfer, allowing subpopulations of bacteria to better adapt to specific niches. Low-cost and high-throughput sequencing platforms have created an exponential increase in genome sequence data and an opportunity to study the pan-genomes of many bacterial species. In this study, we describe a new online pan-genome sequence analysis program, Panseq.
RESULTS: Panseq was used to identify Escherichia coli O157:H7 and E. coli K-12 genomic islands. Within a population of 60 E. coli O157:H7 strains, the existence of 65 accessory genomic regions identified by Panseq analysis was confirmed by PCR. The accessory genome and binary presence/absence data, and the core genome and single nucleotide polymorphisms (SNPs), of six Listeria monocytogenes strains were extracted with Panseq and hierarchically clustered and visualized. The nucleotide core and binary accessory data were also used to construct maximum parsimony (MP) trees, which were compared to the MP tree generated by multi-locus sequence typing (MLST). The topology of the accessory and core trees was identical but differed from the tree produced using seven MLST loci. The Loci Selector module found the most variable and discriminatory combinations of four loci within a 100-locus set among 10 strains in 1 s, compared to the 449 s required to exhaustively search all possible combinations; it also found the most discriminatory 20 loci from a 96-locus E. coli O157:H7 SNP dataset.
CONCLUSION: Panseq determines the core and accessory regions among a collection of genomic sequences based on user-defined parameters. It readily extracts regions unique to a genome or group of genomes, identifies SNPs within shared core genomic regions, constructs files for use in phylogeny programs based on both the presence/absence of accessory regions and SNPs within core regions, and produces a graphical overview of the output. Panseq also includes a loci selector that calculates the most variable and discriminatory loci among sets of accessory loci or core gene SNPs.
AVAILABILITY: Panseq is freely available online at http://76.70.11.198/panseq. Panseq is written in Perl.
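The contrast the abstract draws between the Loci Selector and an exhaustive search over all locus combinations can be illustrated with a small sketch. This is not Panseq's code (Panseq is written in Perl) and it is not confirmed to be the authors' algorithm: it assumes a simple greedy heuristic that repeatedly adds the locus separating the most strain pairs not yet separated. The function names and the toy presence/absence table `LOCI` are hypothetical.

```python
from itertools import combinations


def discriminated_pairs(locus_values):
    """Return the set of strain-index pairs that a single locus tells apart."""
    return {
        (i, j)
        for i, j in combinations(range(len(locus_values)), 2)
        if locus_values[i] != locus_values[j]
    }


def greedy_select(loci, k):
    """Assumed greedy heuristic: add, up to k times, the locus that separates
    the most strain pairs not already separated by the chosen loci."""
    pair_sets = {name: discriminated_pairs(vals) for name, vals in loci.items()}
    chosen, covered = [], set()
    for _ in range(min(k, len(loci))):
        # Ties are broken by locus name because candidates are visited in sorted order.
        best = max(
            (name for name in sorted(pair_sets) if name not in chosen),
            key=lambda name: len(pair_sets[name] - covered),
        )
        if not pair_sets[best] - covered:
            break  # no remaining locus adds any discrimination
        chosen.append(best)
        covered |= pair_sets[best]
    return chosen, len(covered)


def exhaustive_select(loci, k):
    """Score every k-locus combination; the number of combinations is C(n, k)."""
    pair_sets = {name: discriminated_pairs(vals) for name, vals in loci.items()}
    best_combo, best_score = (), -1
    for combo in combinations(sorted(pair_sets), k):
        score = len(set().union(*(pair_sets[name] for name in combo)))
        if score > best_score:
            best_combo, best_score = combo, score
    return list(best_combo), best_score


# Hypothetical toy data: presence (1) or absence (0) of four loci in four strains.
LOCI = {
    "A": [1, 1, 0, 0],
    "B": [1, 0, 1, 0],
    "C": [1, 1, 1, 0],
    "D": [1, 1, 1, 1],
}

print(greedy_select(LOCI, 2))
print(exhaustive_select(LOCI, 2))
```

On this toy table both approaches choose loci A and B, which together separate all six strain pairs. The greedy pass evaluates each locus at most k times, whereas the exhaustive search must score every combination, which is the kind of gap the 1 s versus 449 s figures in the abstract describe. A greedy heuristic of this sort is not guaranteed to find the optimum in general.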
format Text
id pubmed-2949892
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2949892 2010-10-06 Pan-genome sequence analysis using Panseq: an online tool for the rapid analysis of core and accessory genomic regions Laing, Chad Buchanan, Cody Taboada, Eduardo N Zhang, Yongxiang Kropinski, Andrew Villegas, Andre Thomas, James E Gannon, Victor PJ BMC Bioinformatics Software BioMed Central 2010-09-15 /pmc/articles/PMC2949892/ /pubmed/20843356 http://dx.doi.org/10.1186/1471-2105-11-461 Text en Copyright ©2010 Laing et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Pan-genome sequence analysis using Panseq: an online tool for the rapid analysis of core and accessory genomic regions
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2949892/
https://www.ncbi.nlm.nih.gov/pubmed/20843356
http://dx.doi.org/10.1186/1471-2105-11-461