Imputation-Based Population Genetics Analysis of Plasmodium falciparum Malaria Parasites
Whole-genome sequencing technologies are being increasingly applied to Plasmodium falciparum clinical isolates to identify genetic determinants of malaria pathogenesis. However, genome-wide discovery methods, such as haplotype scans for signatures of natural selection, are hindered by missing genoty...
Main authors: | Samad, Hanif; Coll, Francesc; Preston, Mark D.; Ocholla, Harold; Fairhurst, Rick M.; Clark, Taane G. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2015 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4415759/ https://www.ncbi.nlm.nih.gov/pubmed/25928499 http://dx.doi.org/10.1371/journal.pgen.1005131 |
_version_ | 1782369119506530304 |
---|---|
author | Samad, Hanif Coll, Francesc Preston, Mark D. Ocholla, Harold Fairhurst, Rick M. Clark, Taane G. |
author_sort | Samad, Hanif |
collection | PubMed |
description | Whole-genome sequencing technologies are being increasingly applied to Plasmodium falciparum clinical isolates to identify genetic determinants of malaria pathogenesis. However, genome-wide discovery methods, such as haplotype scans for signatures of natural selection, are hindered by missing genotypes in sequence data. Poor correlation between single nucleotide polymorphisms (SNPs) in the P. falciparum genome complicates efforts to apply established missing-genotype imputation methods that leverage off patterns of linkage disequilibrium (LD). The accuracy of state-of-the-art, LD-based imputation methods (IMPUTE, Beagle) was assessed by measuring allelic r² for 459 P. falciparum samples from malaria patients in 4 countries: Thailand, Cambodia, Gambia, and Malawi. In restricting our analysis to 86k high-quality SNPs across the populations, we found that the complete-case analysis was restricted to 21k SNPs (24.5%), despite no single SNP having more than 10% missing genotypes. The accuracy of Beagle in filling in missing genotypes was consistently high across all populations (allelic r², 0.87-0.96), but the performance of IMPUTE was mixed (allelic r², 0.34-0.99) depending on reference haplotypes and population. Positive selection analysis using Beagle-imputed haplotypes identified loci involved in resistance to chloroquine (crt) in Thailand, Cambodia, and Gambia, sulfadoxine-pyrimethamine (dhfr, dhps) in Cambodia, and artemisinin (kelch13) in Cambodia. Tajima’s D-based analysis identified genes under balancing selection that encode well-characterized vaccine candidates: apical merozoite antigen 1 (ama1) and merozoite surface protein 1 (msp1). In contrast, the complete-case analysis failed to identify any well-validated drug resistance or candidate vaccine loci, except kelch13. In a setting of low LD and modest levels of missing genotypes, using Beagle to impute P. falciparum genotypes is a viable strategy for conducting accurate large-scale population genetics and association analyses, and supporting global surveillance for drug resistance markers and candidate vaccine antigens. |
format | Online Article Text |
id | pubmed-4415759 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4415759 2015-05-07 Imputation-Based Population Genetics Analysis of Plasmodium falciparum Malaria Parasites Samad, Hanif Coll, Francesc Preston, Mark D. Ocholla, Harold Fairhurst, Rick M. Clark, Taane G. PLoS Genet Research Article Whole-genome sequencing technologies are being increasingly applied to Plasmodium falciparum clinical isolates to identify genetic determinants of malaria pathogenesis. However, genome-wide discovery methods, such as haplotype scans for signatures of natural selection, are hindered by missing genotypes in sequence data. Poor correlation between single nucleotide polymorphisms (SNPs) in the P. falciparum genome complicates efforts to apply established missing-genotype imputation methods that leverage off patterns of linkage disequilibrium (LD). The accuracy of state-of-the-art, LD-based imputation methods (IMPUTE, Beagle) was assessed by measuring allelic r² for 459 P. falciparum samples from malaria patients in 4 countries: Thailand, Cambodia, Gambia, and Malawi. In restricting our analysis to 86k high-quality SNPs across the populations, we found that the complete-case analysis was restricted to 21k SNPs (24.5%), despite no single SNP having more than 10% missing genotypes. The accuracy of Beagle in filling in missing genotypes was consistently high across all populations (allelic r², 0.87-0.96), but the performance of IMPUTE was mixed (allelic r², 0.34-0.99) depending on reference haplotypes and population. Positive selection analysis using Beagle-imputed haplotypes identified loci involved in resistance to chloroquine (crt) in Thailand, Cambodia, and Gambia, sulfadoxine-pyrimethamine (dhfr, dhps) in Cambodia, and artemisinin (kelch13) in Cambodia. Tajima’s D-based analysis identified genes under balancing selection that encode well-characterized vaccine candidates: apical merozoite antigen 1 (ama1) and merozoite surface protein 1 (msp1). In contrast, the complete-case analysis failed to identify any well-validated drug resistance or candidate vaccine loci, except kelch13. In a setting of low LD and modest levels of missing genotypes, using Beagle to impute P. falciparum genotypes is a viable strategy for conducting accurate large-scale population genetics and association analyses, and supporting global surveillance for drug resistance markers and candidate vaccine antigens. Public Library of Science 2015-04-30 /pmc/articles/PMC4415759/ /pubmed/25928499 http://dx.doi.org/10.1371/journal.pgen.1005131 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration, which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. |
spellingShingle | Research Article Samad, Hanif Coll, Francesc Preston, Mark D. Ocholla, Harold Fairhurst, Rick M. Clark, Taane G. Imputation-Based Population Genetics Analysis of Plasmodium falciparum Malaria Parasites |
title | Imputation-Based Population Genetics Analysis of Plasmodium falciparum Malaria Parasites |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4415759/ https://www.ncbi.nlm.nih.gov/pubmed/25928499 http://dx.doi.org/10.1371/journal.pgen.1005131 |