Imputation-Based Population Genetics Analysis of Plasmodium falciparum Malaria Parasites

Whole-genome sequencing technologies are being increasingly applied to Plasmodium falciparum clinical isolates to identify genetic determinants of malaria pathogenesis. However, genome-wide discovery methods, such as haplotype scans for signatures of natural selection, are hindered by missing genoty...

Bibliographic Details
Main Authors: Samad, Hanif; Coll, Francesc; Preston, Mark D.; Ocholla, Harold; Fairhurst, Rick M.; Clark, Taane G.
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4415759/
https://www.ncbi.nlm.nih.gov/pubmed/25928499
http://dx.doi.org/10.1371/journal.pgen.1005131
_version_ 1782369119506530304
author Samad, Hanif
Coll, Francesc
Preston, Mark D.
Ocholla, Harold
Fairhurst, Rick M.
Clark, Taane G.
author_facet Samad, Hanif
Coll, Francesc
Preston, Mark D.
Ocholla, Harold
Fairhurst, Rick M.
Clark, Taane G.
author_sort Samad, Hanif
collection PubMed
description Whole-genome sequencing technologies are being increasingly applied to Plasmodium falciparum clinical isolates to identify genetic determinants of malaria pathogenesis. However, genome-wide discovery methods, such as haplotype scans for signatures of natural selection, are hindered by missing genotypes in sequence data. Poor correlation between single nucleotide polymorphisms (SNPs) in the P. falciparum genome complicates efforts to apply established missing-genotype imputation methods that leverage off patterns of linkage disequilibrium (LD). The accuracy of state-of-the-art, LD-based imputation methods (IMPUTE, Beagle) was assessed by measuring allelic r² for 459 P. falciparum samples from malaria patients in 4 countries: Thailand, Cambodia, Gambia, and Malawi. In restricting our analysis to 86k high-quality SNPs across the populations, we found that the complete-case analysis was restricted to 21k SNPs (24.5%), despite no single SNP having more than 10% missing genotypes. The accuracy of Beagle in filling in missing genotypes was consistently high across all populations (allelic r², 0.87-0.96), but the performance of IMPUTE was mixed (allelic r², 0.34-0.99) depending on reference haplotypes and population. Positive selection analysis using Beagle-imputed haplotypes identified loci involved in resistance to chloroquine (crt) in Thailand, Cambodia, and Gambia, sulfadoxine-pyrimethamine (dhfr, dhps) in Cambodia, and artemisinin (kelch13) in Cambodia. Tajima’s D-based analysis identified genes under balancing selection that encode well-characterized vaccine candidates: apical merozoite antigen 1 (ama1) and merozoite surface protein 1 (msp1). In contrast, the complete-case analysis failed to identify any well-validated drug resistance or candidate vaccine loci, except kelch13. In a setting of low LD and modest levels of missing genotypes, using Beagle to impute P. falciparum genotypes is a viable strategy for conducting accurate large-scale population genetics and association analyses, and supporting global surveillance for drug resistance markers and candidate vaccine antigens.
format Online
Article
Text
id pubmed-4415759
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4415759 2015-05-07 Imputation-Based Population Genetics Analysis of Plasmodium falciparum Malaria Parasites Samad, Hanif Coll, Francesc Preston, Mark D. Ocholla, Harold Fairhurst, Rick M. Clark, Taane G. PLoS Genet Research Article Whole-genome sequencing technologies are being increasingly applied to Plasmodium falciparum clinical isolates to identify genetic determinants of malaria pathogenesis. However, genome-wide discovery methods, such as haplotype scans for signatures of natural selection, are hindered by missing genotypes in sequence data. Poor correlation between single nucleotide polymorphisms (SNPs) in the P. falciparum genome complicates efforts to apply established missing-genotype imputation methods that leverage off patterns of linkage disequilibrium (LD). The accuracy of state-of-the-art, LD-based imputation methods (IMPUTE, Beagle) was assessed by measuring allelic r² for 459 P. falciparum samples from malaria patients in 4 countries: Thailand, Cambodia, Gambia, and Malawi. In restricting our analysis to 86k high-quality SNPs across the populations, we found that the complete-case analysis was restricted to 21k SNPs (24.5%), despite no single SNP having more than 10% missing genotypes. The accuracy of Beagle in filling in missing genotypes was consistently high across all populations (allelic r², 0.87-0.96), but the performance of IMPUTE was mixed (allelic r², 0.34-0.99) depending on reference haplotypes and population. Positive selection analysis using Beagle-imputed haplotypes identified loci involved in resistance to chloroquine (crt) in Thailand, Cambodia, and Gambia, sulfadoxine-pyrimethamine (dhfr, dhps) in Cambodia, and artemisinin (kelch13) in Cambodia. Tajima’s D-based analysis identified genes under balancing selection that encode well-characterized vaccine candidates: apical merozoite antigen 1 (ama1) and merozoite surface protein 1 (msp1). In contrast, the complete-case analysis failed to identify any well-validated drug resistance or candidate vaccine loci, except kelch13. In a setting of low LD and modest levels of missing genotypes, using Beagle to impute P. falciparum genotypes is a viable strategy for conducting accurate large-scale population genetics and association analyses, and supporting global surveillance for drug resistance markers and candidate vaccine antigens. Public Library of Science 2015-04-30 /pmc/articles/PMC4415759/ /pubmed/25928499 http://dx.doi.org/10.1371/journal.pgen.1005131 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration, which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.
spellingShingle Research Article
Samad, Hanif
Coll, Francesc
Preston, Mark D.
Ocholla, Harold
Fairhurst, Rick M.
Clark, Taane G.
Imputation-Based Population Genetics Analysis of Plasmodium falciparum Malaria Parasites
title Imputation-Based Population Genetics Analysis of Plasmodium falciparum Malaria Parasites
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4415759/
https://www.ncbi.nlm.nih.gov/pubmed/25928499
http://dx.doi.org/10.1371/journal.pgen.1005131