Approaches to estimating inbreeding coefficients in clinical isolates of Plasmodium falciparum from genomic sequence data

BACKGROUND: The advent of whole-genome sequencing has generated increased interest in modelling the structure of strain mixture within clinical infections of Plasmodium falciparum. The life cycle of the parasite implies that the mixture of multiple strains within an infected individual is related to...

Bibliographic Details
Main authors: O’Brien, John D., Amenga-Etego, Lucas, Li, Ruiqi
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5025560/
https://www.ncbi.nlm.nih.gov/pubmed/27634595
http://dx.doi.org/10.1186/s12936-016-1531-z
author O’Brien, John D.
Amenga-Etego, Lucas
Li, Ruiqi
collection PubMed
description BACKGROUND: The advent of whole-genome sequencing has generated increased interest in modelling the structure of strain mixture within clinical infections of Plasmodium falciparum. The life cycle of the parasite implies that the mixture of multiple strains within an infected individual is related to the out-crossing rate across populations, making methods for measuring this process in situ central to understanding the genetic epidemiology of the disease. RESULTS: This paper derives a set of new estimators for inferring inbreeding coefficients using whole-genome sequence read count data from P. falciparum clinical samples, which provides resources to assess within-sample mixture that connect to extensive literatures in population genetics and conservation ecology. Features of the P. falciparum genome mean that standard methods for inbreeding coefficients and related F-statistics cannot be used directly. After reviewing an initial effort to estimate the inbreeding coefficient within clinical isolates of P. falciparum, several generalizations using both frequentist and Bayesian approaches are provided. A simpler, more intuitive frequentist estimator is shown to have nearly identical properties to the initial estimator both in simulation and in real data sets. The Bayesian approach connects these estimates to the Balding–Nichols model, a mainstay within genetic epidemiology, and a possible framework for more complex modelling. A simulation study shows strong performance for all estimators with as few as ten variants. Application to samples from the PF3K data set indicates significant across-country variation in within-sample mixture. Finally, a comparison with results from a recent mixture model for within-sample strain mixture shows that inbreeding coefficients provide a strong proxy for these more complex models. CONCLUSIONS: This paper provides a set of methods for estimating inbreeding coefficients within P. falciparum samples from whole-genome sequence data, supported by simulation studies and empirical examples. It includes a substantially simpler estimator with similar statistical properties to the estimator in current use. These methods will also be applicable to other species with similar life cycles. Implementations of the methods described are available in an open-source R package pfmix. Estimates for the PF3K public data release are provided as part of this resource.
format Online
Article
Text
id pubmed-5025560
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5025560 2016-09-20 Approaches to estimating inbreeding coefficients in clinical isolates of Plasmodium falciparum from genomic sequence data O’Brien, John D. Amenga-Etego, Lucas Li, Ruiqi Malar J Research BioMed Central 2016-09-15 /pmc/articles/PMC5025560/ /pubmed/27634595 http://dx.doi.org/10.1186/s12936-016-1531-z Text en © The Author(s) 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Approaches to estimating inbreeding coefficients in clinical isolates of Plasmodium falciparum from genomic sequence data
topic Research