
A gene-by-gene population genomics platform: de novo assembly, annotation and genealogical analysis of 108 representative Neisseria meningitidis genomes

BACKGROUND: Highly parallel, ‘second generation’ sequencing technologies have rapidly expanded the number of bacterial whole genome sequences available for study, permitting the emergence of the discipline of population genomics. Most of these data are publicly available as unassembled short-read...


Bibliographic Details
Main authors: Bratcher, Holly B, Corton, Craig, Jolley, Keith A, Parkhill, Julian, Maiden, Martin CJ
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4377854/
https://www.ncbi.nlm.nih.gov/pubmed/25523208
http://dx.doi.org/10.1186/1471-2164-15-1138
author Bratcher, Holly B
Corton, Craig
Jolley, Keith A
Parkhill, Julian
Maiden, Martin CJ
collection PubMed
description BACKGROUND: Highly parallel, ‘second generation’ sequencing technologies have rapidly expanded the number of bacterial whole genome sequences available for study, permitting the emergence of the discipline of population genomics. Most of these data are publicly available as unassembled short-read sequence files that require extensive processing before they can be used for analysis. The provision of data in a uniform format, which can be easily assessed for quality, linked to provenance and phenotype and used for analysis, is therefore necessary.
RESULTS: The performance of de novo short-read assembly followed by automatic annotation using the pubMLST.org Neisseria database was assessed and evaluated for 108 diverse, representative, and well-characterised Neisseria meningitidis isolates. High-quality sequences were obtained for >99% of known meningococcal genes among the de novo assembled genomes and four resequenced genomes, and less than 1% of reassembled genes had sequence discrepancies or misassembled sequences. A core genome of 1600 loci, present in at least 95% of the population, was determined using the Genome Comparator tool. Genealogical relationships compatible with, but at a higher resolution than, those identified by multilocus sequence typing were obtained with core genome comparisons and ribosomal protein gene analysis, which revealed a genomic structure for a number of previously described phenotypes. This unified system for cataloguing Neisseria genetic variation in the genome was implemented and used for multiple analyses, and the data are publicly available in the PubMLST Neisseria database.
CONCLUSIONS: The de novo assembly, combined with automated gene-by-gene annotation, generates high-quality draft genomes in which the majority of protein-encoding genes are present with high accuracy. The approach catalogues diversity efficiently, permits analyses of a single genome or multiple genome comparisons, and is a practical approach to interpreting WGS data for large bacterial population samples. The method generates novel insights into the biology of the meningococcus and improves our understanding of the whole population structure, not just disease-causing lineages.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-1138) contains supplementary material, which is available to authorized users.
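The core genome criterion stated in the abstract (loci present in at least 95% of the population) and the idea of gene-by-gene comparison can be illustrated with a small sketch. This is not the Genome Comparator implementation; the data layout, the function names `core_loci` and `allelic_distance`, the locus names, and the toy threshold of 0.75 are all assumptions made for illustration only.

```python
from typing import Dict, List, Optional

# Hypothetical allele table: isolate -> locus -> allele id.
# None means the locus was not found in that isolate's assembly.
Profile = Dict[str, Optional[int]]
Profiles = Dict[str, Profile]


def core_loci(profiles: Profiles, threshold: float = 0.95) -> List[str]:
    """Return loci present in at least `threshold` of the isolates."""
    n = len(profiles)
    if n == 0:
        return []
    all_loci = sorted({locus for p in profiles.values() for locus in p})
    core = []
    for locus in all_loci:
        present = sum(1 for p in profiles.values() if p.get(locus) is not None)
        if present / n >= threshold:
            core.append(locus)
    return core


def allelic_distance(a: Profile, b: Profile, loci: List[str]) -> int:
    """Count loci where both isolates have an allele and the alleles differ."""
    return sum(
        1
        for locus in loci
        if a.get(locus) is not None
        and b.get(locus) is not None
        and a[locus] != b[locus]
    )


# Toy data (invented): four isolates, three loci.
profiles: Profiles = {
    "iso1": {"locusA": 1, "locusB": 2, "locusC": None},
    "iso2": {"locusA": 1, "locusB": 3, "locusC": 5},
    "iso3": {"locusA": 2, "locusB": 2, "locusC": None},
    "iso4": {"locusA": 1, "locusB": None, "locusC": None},
}

# A lower threshold is used here only because the toy set has four isolates.
core = core_loci(profiles, threshold=0.75)
d12 = allelic_distance(profiles["iso1"], profiles["iso2"], core)
d14 = allelic_distance(profiles["iso1"], profiles["iso4"], core)
```

In this sketch a locus missing from either isolate is simply skipped when counting differences; other treatments of missing data are possible and the abstract does not say which one the authors used.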
format Online
Article
Text
id pubmed-4377854
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4377854 2015-03-31 A gene-by-gene population genomics platform: de novo assembly, annotation and genealogical analysis of 108 representative Neisseria meningitidis genomes Bratcher, Holly B Corton, Craig Jolley, Keith A Parkhill, Julian Maiden, Martin CJ BMC Genomics Methodology Article BioMed Central 2014-12-18 /pmc/articles/PMC4377854/ /pubmed/25523208 http://dx.doi.org/10.1186/1471-2164-15-1138 Text en © Bratcher et al.; licensee BioMed Central. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A gene-by-gene population genomics platform: de novo assembly, annotation and genealogical analysis of 108 representative Neisseria meningitidis genomes
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4377854/
https://www.ncbi.nlm.nih.gov/pubmed/25523208
http://dx.doi.org/10.1186/1471-2164-15-1138