
Staphylococcus aureus viewed from the perspective of 40,000+ genomes

Low-cost Illumina sequencing of clinically-important bacterial pathogens has generated thousands of publicly available genomic datasets. Analyzing these genomes and extracting relevant information for each pathogen and the associated clinical phenotypes requires not only resources and bioinformatic...


Bibliographic Details
Main Authors: Petit, Robert A., Read, Timothy D.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6046195/
https://www.ncbi.nlm.nih.gov/pubmed/30013858
http://dx.doi.org/10.7717/peerj.5261
_version_ 1783339786489036800
author Petit, Robert A.
Read, Timothy D.
author_facet Petit, Robert A.
Read, Timothy D.
author_sort Petit, Robert A.
collection PubMed
description Low-cost Illumina sequencing of clinically important bacterial pathogens has generated thousands of publicly available genomic datasets. Analyzing these genomes and extracting relevant information for each pathogen and the associated clinical phenotypes requires not only resources and bioinformatic skills but also organism-specific knowledge. In light of these issues, we created Staphopia, an analysis pipeline, database and application programming interface, focused on Staphylococcus aureus, a common colonizer of humans and a major antibiotic-resistant pathogen responsible for a wide spectrum of hospital and community-associated infections. Written in Python, Staphopia’s analysis pipeline consists of submodules running open-source tools. It accepts raw FASTQ reads as input, which undergo quality control filtration, error correction and reduction to a maximum of approximately 100× chromosome coverage. This downsampling significantly reduces total runtime without detrimentally affecting the results. The pipeline performs de novo assembly-based and mapping-based analyses. Automated gene calling and annotation are performed on the assembled contigs. Read-mapping is used to call variants (single nucleotide polymorphisms and insertions/deletions) against a reference S. aureus chromosome (N315, ST5). We ran the analysis pipeline on more than 43,000 S. aureus shotgun Illumina genome projects in the public European Nucleotide Archive database in November 2017. We found that only a quarter of known multi-locus sequence types (STs) were represented but the top 10 STs made up 70% of all genomes. Methicillin-resistant S. aureus (MRSA) accounted for 64% of all genomes. Using the Staphopia database we selected 380 high-quality genomes deposited with good metadata, each from a different multi-locus ST, as a non-redundant diversity set for studying S. aureus evolution.
In addition to answering basic science questions, Staphopia could serve as a potential platform for rapid clinical diagnostics of S. aureus isolates in the future. The system could also be adapted as a template for other organism-specific databases.
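The coverage cap mentioned in the description can be illustrated with a minimal sketch. This is not Staphopia's actual code: the function names, the random per-read sampling strategy, and the approximate 2.8 Mb genome size are assumptions chosen for illustration only. The idea is to estimate coverage as total sequenced bases divided by genome size, then keep only the fraction of reads needed to reach roughly 100×.

```python
import random

# Assumption: an approximate S. aureus chromosome size, for illustration only.
APPROX_GENOME_SIZE = 2_800_000


def keep_fraction(total_bases, genome_size, max_coverage=100.0):
    """Return the fraction of reads to keep so coverage is at most max_coverage."""
    coverage = total_bases / genome_size
    if coverage <= max_coverage:
        return 1.0  # already at or below the cap, keep everything
    return max_coverage / coverage


def subsample_reads(reads, genome_size, max_coverage=100.0, seed=0):
    """Randomly keep reads (hypothetical strategy) to approach the coverage cap."""
    total_bases = sum(len(read) for read in reads)
    fraction = keep_fraction(total_bases, genome_size, max_coverage)
    if fraction >= 1.0:
        return list(reads)
    rng = random.Random(seed)  # fixed seed makes the sketch reproducible
    return [read for read in reads if rng.random() < fraction]


if __name__ == "__main__":
    # 560 Mb of sequence over a 2.8 Mb genome is 200x, so half the reads are kept.
    print(keep_fraction(560_000_000, APPROX_GENOME_SIZE))
```

Sampling each read independently gives only approximately the target coverage; a real pipeline working on paired-end FASTQ files would also need to keep read pairs together.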
format Online
Article
Text
id pubmed-6046195
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-60461952018-07-16 Staphylococcus aureus viewed from the perspective of 40,000+ genomes Petit, Robert A. Read, Timothy D. PeerJ Bioinformatics PeerJ Inc. 2018-07-12 /pmc/articles/PMC6046195/ /pubmed/30013858 http://dx.doi.org/10.7717/peerj.5261 Text en © 2018 Petit and Read http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
spellingShingle Bioinformatics
Petit, Robert A.
Read, Timothy D.
Staphylococcus aureus viewed from the perspective of 40,000+ genomes
title Staphylococcus aureus viewed from the perspective of 40,000+ genomes
title_full Staphylococcus aureus viewed from the perspective of 40,000+ genomes
title_fullStr Staphylococcus aureus viewed from the perspective of 40,000+ genomes
title_full_unstemmed Staphylococcus aureus viewed from the perspective of 40,000+ genomes
title_short Staphylococcus aureus viewed from the perspective of 40,000+ genomes
title_sort staphylococcus aureus viewed from the perspective of 40,000+ genomes
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6046195/
https://www.ncbi.nlm.nih.gov/pubmed/30013858
http://dx.doi.org/10.7717/peerj.5261
work_keys_str_mv AT petitroberta staphylococcusaureusviewedfromtheperspectiveof40000genomes
AT readtimothyd staphylococcusaureusviewedfromtheperspectiveof40000genomes