PlasmidTron: assembling the cause of phenotypes and genotypes from NGS data

Bibliographic Details
Main authors: Page, Andrew J., Wailan, Alexander, Shao, Yan, Judge, Kim, Dougan, Gordon, Klemm, Elizabeth J., Thomson, Nicholas R., Keane, Jacqueline A.
Format: Online Article Text
Language: English
Published: Microbiology Society 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5885016/
https://www.ncbi.nlm.nih.gov/pubmed/29533742
http://dx.doi.org/10.1099/mgen.0.000164

Full description
Increasingly rich metadata are now being linked to samples that have been whole-genome sequenced. However, much of this information is ignored, because linking these metadata to genes, or regions of the genome, usually relies on knowing the gene sequence(s) responsible for the particular trait being measured and looking for their presence or absence in that genome. An example is the spread of antimicrobial resistance genes carried on mobile genetic elements (MGEs). Although it is possible to routinely identify the resistance gene, identifying the unknown MGE on which it is carried can be much more difficult if the starting point is short-read whole-genome sequence data, because MGEs are often full of repeats and so assemble poorly, leading to fragmented consensus sequences. Since mobile DNA, which can carry many clinically and ecologically important genes, has a different evolutionary history from the host, its distribution across the host population will, by definition, be independent of the host phylogeny. This phenomenon can be used in a genome-wide association study to identify both the genes associated with a specific trait and the DNA linked to those genes, for example the flanking sequence of the plasmid vector on which a gene is encoded, which follows the same pattern of distribution as the marker gene or sequence itself. We present PlasmidTron, which uses the phenotypic data normally available in bacterial population studies, such as antibiograms, virulence factors or geographical information, to identify traits that are likely to be present on DNA that can randomly reassort across defined bacterial populations. The same methodology can also be used with PlasmidTron to associate unknown genes or sequences (e.g. plasmid backbones) with a specific molecular signature or marker (e.g. the presence or absence of a resistance gene).
PlasmidTron uses a k-mer-based approach to identify reads associated with a phylogenetically unlinked phenotype. These reads are then assembled de novo to produce contigs quickly, in a way that scales to large datasets. PlasmidTron is written in Python 3 and is available under the open-source GNU GPL3 licence from https://github.com/sanger-pathogens/plasmidtron.
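The abstract describes the k-mer selection step only in outline. The following Python sketch illustrates one way such a step could work, under assumptions that are not taken from the paper: samples are split into trait and non-trait groups, k-mers are compared strand-independently (canonical k-mers), a k-mer is kept only if it occurs in every trait sample and in no non-trait sample, and a read is kept if it contains at least `min_hits` of those k-mers. The function names and the `min_hits` parameter are hypothetical and are not PlasmidTron's code or command-line options. The selected reads would then be passed to a de novo assembler, which is not shown.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical sketch of trait-specific k-mer selection; not PlasmidTron's code.
# Assumes upper-case A/C/G/T reads held in memory (toy scale only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def read_kmers(read: str, k: int) -> Set[str]:
    """All canonical k-mers in one read (empty if the read is shorter than k)."""
    return {canonical(read[i:i + k]) for i in range(len(read) - k + 1)}


def sample_kmers(reads: Iterable[str], k: int) -> Set[str]:
    """Union of canonical k-mers over every read of one sample."""
    kmers: Set[str] = set()
    for read in reads:
        kmers |= read_kmers(read, k)
    return kmers


def trait_specific_kmers(
    trait: Dict[str, List[str]], non_trait: Dict[str, List[str]], k: int
) -> Set[str]:
    """K-mers present in every trait sample and absent from every non-trait sample.

    Assumes at least one trait sample.
    """
    shared = set.intersection(*(sample_kmers(reads, k) for reads in trait.values()))
    for reads in non_trait.values():
        shared -= sample_kmers(reads, k)
    return shared


def filter_reads(
    reads: Iterable[str], selected: Set[str], k: int, min_hits: int = 1
) -> List[str]:
    """Keep reads that contain at least `min_hits` distinct selected k-mers."""
    return [read for read in reads if len(read_kmers(read, k) & selected) >= min_hits]


# Toy data: t2 carries the same "plasmid" read as t1, but on the opposite strand.
trait = {"t1": ["ACGGTA", "TTTTT"], "t2": ["TACCGT", "AAAAA"]}
non_trait = {"n1": ["TTTTT", "ACGGCC"]}

selected = trait_specific_kmers(trait, non_trait, k=4)
print(sorted(selected))                          # ['ACCG', 'GGTA']
print(filter_reads(trait["t1"], selected, k=4))  # ['ACGGTA']
```

In the toy data both trait samples share the k-mers ACGG, ACCG, GGTA and AAAA, but ACGG and AAAA also occur in the non-trait sample, so only ACCG and GGTA survive and only the "plasmid" reads are kept. Plain Python sets keep the example short but would not scale to real whole-genome read sets; in practice a dedicated k-mer counting tool would be needed, and a strict every/none rule may need relaxing to tolerate sequencing errors or uneven coverage.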
Journal: Microb Genom (Methods Paper), published 2018-03-12.
© 2018 The Authors. This is an open access article under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.