SNP-Associations and Phenotype Predictions from Hundreds of Microbial Genomes without Genome Alignments

Bibliographic Details
Main Author: Hall, Barry G.
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3938750/
https://www.ncbi.nlm.nih.gov/pubmed/24587377
http://dx.doi.org/10.1371/journal.pone.0090490
_version_ 1782305645797572608
author Hall, Barry G.
author_facet Hall, Barry G.
author_sort Hall, Barry G.
collection PubMed
description SNP-association studies are a starting point for identifying genes that may be responsible for specific phenotypes, such as disease traits. The vast bulk of tools for SNP-association studies are directed toward SNPs in the human genome, and I am unaware of any tools designed specifically for such studies in bacterial or viral genomes. The PPFS (Predict Phenotypes From SNPs) package described here is an add-on to kSNP, a program that can identify SNPs in a data set of hundreds of microbial genomes. PPFS identifies those SNPs that are non-randomly associated with a phenotype based on the χ² probability, then uses those diagnostic SNPs for two distinct, but related, purposes: (1) to predict the phenotypes of strains whose phenotypes are unknown, and (2) to identify those diagnostic SNPs that are most likely to be causally related to the phenotype. In the example illustrated here, from a set of 68 E. coli genomes, for 67 of which the pathogenicity phenotype was known, there were 418,500 SNPs. Using the phenotypes of 36 of those strains, PPFS identified 207 diagnostic SNPs. The diagnostic SNPs predicted the phenotypes of all of the genomes with 97% accuracy. It then identified 97 SNPs whose probability of being causally related to the pathogenic phenotype was >0.999. In a second example, from a set of 116 E. coli genome sequences, using the phenotypes of 65 strains PPFS identified 101 SNPs that predicted the source host (human or non-human) with 90% accuracy.
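The abstract's first step, selecting SNPs that are non-randomly associated with a phenotype by a χ² probability, can be illustrated with a minimal sketch. This is not the PPFS implementation: the function names, the 2x2 presence/absence layout, the input dictionaries, and the 0.01 cutoff are all assumptions made for illustration, and the real package may treat alleles and thresholds differently.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 degree of freedom).

    a: strains with the allele and with the phenotype
    b: strains with the allele, without the phenotype
    c: strains without the allele, with the phenotype
    d: strains with neither
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        # No variation in the allele or the phenotype: nothing to test.
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 d.o.f.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


def diagnostic_snps(alleles, phenotypes, alpha=0.01):
    """Return IDs of SNPs whose allele is associated with the phenotype.

    alleles:    {snp_id: {strain: bool}}  (hypothetical input layout)
    phenotypes: {strain: bool}, for strains of known phenotype only
    alpha:      illustrative significance cutoff, not taken from the paper
    """
    selected = []
    for snp_id, calls in alleles.items():
        a = b = c = d = 0
        for strain, has_phenotype in phenotypes.items():
            has_allele = calls.get(strain, False)
            if has_allele and has_phenotype:
                a += 1
            elif has_allele:
                b += 1
            elif has_phenotype:
                c += 1
            else:
                d += 1
        _, p = chi_square_2x2(a, b, c, d)
        if p < alpha:
            selected.append(snp_id)
    return selected
```

The sketch applies no continuity correction and no multiple-testing adjustment. With hundreds of thousands of SNPs, as in the 418,500-SNP example, a real analysis would need to account for the number of tests.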
format Online
Article
Text
id pubmed-3938750
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3938750 2014-03-04 SNP-Associations and Phenotype Predictions from Hundreds of Microbial Genomes without Genome Alignments Hall, Barry G. PLoS One Research Article
Public Library of Science 2014-02-28 /pmc/articles/PMC3938750/ /pubmed/24587377 http://dx.doi.org/10.1371/journal.pone.0090490 Text en © 2014 Barry G http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Hall, Barry G.
SNP-Associations and Phenotype Predictions from Hundreds of Microbial Genomes without Genome Alignments
title SNP-Associations and Phenotype Predictions from Hundreds of Microbial Genomes without Genome Alignments
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3938750/
https://www.ncbi.nlm.nih.gov/pubmed/24587377
http://dx.doi.org/10.1371/journal.pone.0090490
work_keys_str_mv AT hallbarryg snpassociationsandphenotypepredictionsfromhundredsofmicrobialgenomeswithoutgenomealignments