GGRaSP: a R-package for selecting representative genomes using Gaussian mixture models

MOTIVATION: The vast number of available sequenced bacterial genomes occasionally exceeds the facilities of comparative genomic methods or is dominated by a single outbreak strain, and thus a diverse and representative subset is required. Generation of the reduced subset currently requires a priori...

Bibliographic Details
Main Authors: Clarke, Thomas H, Brinkac, Lauren M, Sutton, Granger, Fouts, Derrick E
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects: Applications Notes
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6129299/
https://www.ncbi.nlm.nih.gov/pubmed/29668840
http://dx.doi.org/10.1093/bioinformatics/bty300
collection PubMed
description MOTIVATION: The vast number of available sequenced bacterial genomes occasionally exceeds the facilities of comparative genomic methods or is dominated by a single outbreak strain, and thus a diverse and representative subset is required. Generation of the reduced subset currently requires a priori supervised clustering and sequence-only selection of medoid genomic sequences, independent of any additional genome metrics or strain attributes. RESULTS: The Gaussian Genome Representative Selector with Prioritization (GGRaSP) R-package described below generates a reduced subset of genomes that prioritizes maintaining genomes of interest to the user as well as minimizing the loss of genetic variation. The package also allows for unsupervised clustering by modeling the genomic relationships using a Gaussian mixture model to select an appropriate cluster threshold. We demonstrate the capabilities of GGRaSP by generating a reduced list of 315 genomes from a genomic dataset of 4600 Escherichia coli genomes, prioritizing selection by type strain and by genome completeness. AVAILABILITY AND IMPLEMENTATION: GGRaSP is available at https://github.com/JCVenterInstitute/ggrasp/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
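The workflow summarized in the description (model the pairwise genome distances with a Gaussian mixture, derive a cluster threshold from it, then keep one representative per cluster while favoring genomes of interest) can be illustrated with a small Python sketch. This is not the GGRaSP implementation or its API: the toy distances, the function names, the two-component mixture, the single-linkage grouping, and the rank convention (lower rank wins, ties go to the medoid) are all assumptions made for illustration only.

```python
import math

def log_density(x, weight, mean, sd):
    """Log of weight * Normal(x | mean, sd)."""
    z = (x - mean) / sd
    return math.log(weight) - math.log(sd) - 0.5 * z * z - 0.5 * math.log(2 * math.pi)

def fit_two_gaussians(values, iters=100, min_sd=1e-3):
    """Fit a 1-D, two-component Gaussian mixture by EM; returns components sorted by mean."""
    lo, hi = min(values), max(values)
    spread = max((hi - lo) / 4, min_sd)
    comps = [(0.5, lo, spread), (0.5, hi, spread)]
    for _ in range(iters):
        # E-step: responsibility of the first component for each value.
        resp = []
        for x in values:
            a, b = (log_density(x, *c) for c in comps)
            m = max(a, b)
            ea, eb = math.exp(a - m), math.exp(b - m)
            resp.append(ea / (ea + eb))
        # M-step: re-estimate weight, mean and sd of each component.
        comps = []
        for r in (resp, [1 - p for p in resp]):
            total = max(sum(r), 1e-12)
            mean = sum(p * x for p, x in zip(r, values)) / total
            var = sum(p * (x - mean) ** 2 for p, x in zip(r, values)) / total
            comps.append((max(total / len(values), 1e-12), mean, max(math.sqrt(var), min_sd)))
    return sorted(comps, key=lambda c: c[1])

def crossover(low, high, steps=1000):
    """First scanned point between the two means where the high component is more likely."""
    for i in range(steps + 1):
        x = low[1] + (high[1] - low[1]) * i / steps
        if log_density(x, *high) > log_density(x, *low):
            return x
    return high[1]

def cluster(names, dist, threshold):
    """Single-linkage grouping: genomes closer than the threshold share a cluster."""
    clusters = []
    for name in names:
        linked = [c for c in clusters
                  if any(dist[frozenset((name, o))] < threshold for o in c)]
        merged = [name] + [o for c in linked for o in c]
        clusters = [c for c in clusters if c not in linked] + [merged]
    return clusters

def representative(members, dist, rank):
    """Best-ranked member; ties go to the medoid (smallest summed distance to the rest)."""
    def total(g):
        return sum(dist[frozenset((g, o))] for o in members if o != g)
    return min(members, key=lambda g: (rank.get(g, 1), total(g)))

# Hypothetical toy data: two tight groups of genomes separated by large distances.
names = ["A1", "A2", "A3", "B1", "B2", "B3"]
pairs = {
    ("A1", "A2"): 0.010, ("A1", "A3"): 0.020, ("A2", "A3"): 0.015,
    ("B1", "B2"): 0.012, ("B1", "B3"): 0.018, ("B2", "B3"): 0.022,
    ("A1", "B1"): 0.30, ("A1", "B2"): 0.31, ("A1", "B3"): 0.29,
    ("A2", "B1"): 0.30, ("A2", "B2"): 0.32, ("A2", "B3"): 0.30,
    ("A3", "B1"): 0.31, ("A3", "B2"): 0.30, ("A3", "B3"): 0.28,
}
dist = {frozenset(k): v for k, v in pairs.items()}
rank = {"B3": 0}  # assumed convention: B3 is a genome of interest (e.g. a type strain)

low, high = fit_two_gaussians(list(dist.values()))
threshold = crossover(low, high)
clusters = cluster(names, dist, threshold)
reps = sorted(representative(c, dist, rank) for c in clusters)
print(reps)
```

In this toy case the threshold falls between the within-group and between-group distances, so two clusters are recovered. The unranked cluster is represented by its medoid, while the prioritized genome is kept for the other cluster even though it is not that cluster's medoid.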
id pubmed-6129299
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6129299 2018-09-12 Bioinformatics Applications Notes Oxford University Press 2018-09-01 2018-04-14 /pmc/articles/PMC6129299/ /pubmed/29668840 http://dx.doi.org/10.1093/bioinformatics/bty300 Text en © The Author(s) 2018. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Applications Notes