Mind your Ps: A probabilistic model to aid the interpretation of molecular epidemiology data

BACKGROUND: Assessing relatedness of pathogen sequences in clinical samples is a core goal in molecular epidemiology. Tools for Bayesian analysis of phylogeny, such as the BEAST software package, have been typically used in the analysis of sequence/time data in public health. However, they are compu...

Bibliographic Details
Main Authors: Penedos, Ana Raquel; Fernández-García, Aurora; Lazar, Mihaela; Ralh, Kajal; Williams, David; Brown, Kevin E.
Format: Online Article Text
Language: English
Published: Elsevier 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9006250/
https://www.ncbi.nlm.nih.gov/pubmed/35398788
http://dx.doi.org/10.1016/j.ebiom.2022.103989
author Penedos, Ana Raquel
Fernández-García, Aurora
Lazar, Mihaela
Ralh, Kajal
Williams, David
Brown, Kevin E.
collection PubMed
description BACKGROUND: Assessing the relatedness of pathogen sequences in clinical samples is a core goal of molecular epidemiology. Tools for Bayesian phylogenetic analysis, such as the BEAST software package, have typically been used to analyse sequence/time data in public health. However, they are computationally-, time-, and knowledge-intensive, demanding resources that many laboratories do not have available or cannot allocate frequently.
METHODS: To evaluate a faster and simpler alternative method to support the routine interpretation of sequence data for epidemiology, we obtained sequences for two regions of the measles virus genome, N-450 and MF-NCR, from patient samples of genotypes B3, D4 and D8 taken between 2011 and 2017 in the UK and Romania. We developed a mathematical model that incorporates time, possible shared ancestry and the Poisson distribution of the number of substitutions expected at a given time point, and used it to exclude epidemiological relatedness between pairs of sequences. The model was validated against the commonly used Bayesian phylogenetic method using an independent dataset collected in 2017–19.
FINDINGS: We demonstrate that our model, which uses time and sequence information to predict whether two samples may be related within a given time frame, minimises the risk of erroneously excluding relatedness. An easy-to-use implementation in the form of a guide and spreadsheet is provided for convenient application.
INTERPRETATION: The proposed model requires only a previously calculated substitution rate for the locus and pathogen of interest. It allows an informed but quick decision on the likelihood of relatedness between two samples within a time frame, without the need for phylogenetic reconstruction, thus facilitating rapid epidemiological interpretation of sequence data.
FUNDING: This work was funded by the United Kingdom Health Security Agency (UKHSA). The World Health Organization European Regional Office funded the training visits of Aurora Fernández-García and Mihaela Lazar to UKHSA.
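The kind of calculation the abstract describes can be illustrated with a minimal Python sketch. It is not the authors' guide or spreadsheet, and every name and number in it is an assumption made for illustration: the function names, the placeholder substitution rate of 1e-3 substitutions per site per year, and the treatment of shared ancestry as adding twice the ancestor's age to the time separating the two samples. The sketch computes the Poisson probability of seeing at least the observed number of differences between two sequences; a very small probability would argue against relatedness within the chosen time frame.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    # P(X >= k) = 1 - sum_{i=0}^{k-1} exp(-lam) * lam**i / i!
    term = math.exp(-lam)  # i = 0 term
    cdf = term
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def expected_substitutions(rate_per_site_year: float, n_sites: int,
                           years_apart: float,
                           ancestor_years: float = 0.0) -> float:
    """Expected number of substitutions separating two samples.

    Assumption (not taken from the record): if the common ancestor
    predates the earlier sample by `ancestor_years`, the evolutionary
    time separating the samples is years_apart + 2 * ancestor_years.
    """
    return rate_per_site_year * n_sites * (years_apart + 2.0 * ancestor_years)


def prob_at_least_observed(diffs: int, rate_per_site_year: float,
                           n_sites: int, years_apart: float,
                           ancestor_years: float = 0.0) -> float:
    """Probability of at least `diffs` substitutions under the Poisson model."""
    lam = expected_substitutions(rate_per_site_year, n_sites,
                                 years_apart, ancestor_years)
    return poisson_sf(diffs, lam)


if __name__ == "__main__":
    # Hypothetical inputs: a 450-nucleotide locus, samples six months
    # apart, three observed differences, placeholder rate of 1e-3.
    p = prob_at_least_observed(diffs=3, rate_per_site_year=1e-3,
                               n_sites=450, years_apart=0.5)
    print(f"P(at least 3 substitutions) = {p:.4f}")
```

Allowing for an older shared ancestor (a larger `ancestor_years`) raises the expected number of substitutions, so the same observed difference becomes less surprising and exclusion of relatedness becomes more conservative.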
format Online
Article
Text
id pubmed-9006250
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-9006250 2022-04-14 Mind your Ps: A probabilistic model to aid the interpretation of molecular epidemiology data. Penedos, Ana Raquel; Fernández-García, Aurora; Lazar, Mihaela; Ralh, Kajal; Williams, David; Brown, Kevin E. EBioMedicine. Articles.
Elsevier 2022-04-07 /pmc/articles/PMC9006250/ /pubmed/35398788 http://dx.doi.org/10.1016/j.ebiom.2022.103989 Text en
Crown Copyright © 2022 Published by Elsevier B.V. https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title Mind your Ps: A probabilistic model to aid the interpretation of molecular epidemiology data
topic Articles