Employing phylogenetic tree shape statistics to resolve the underlying host population structure

Bibliographic Details
Main authors: Kayondo, Hassan W.; Ssekagiri, Alfred; Nabakooza, Grace; Bbosa, Nicholas; Ssemwanga, Deogratius; Kaleebu, Pontiano; Mwalili, Samuel; Mango, John M.; Leigh Brown, Andrew J.; Saenz, Roberto A.; Galiwango, Ronald; Kitayimbwa, John M.
Format: Online article, text
Language: English
Published: BioMed Central 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8579572/
https://www.ncbi.nlm.nih.gov/pubmed/34758743
http://dx.doi.org/10.1186/s12859-021-04465-1
Collection: PubMed

Abstract

BACKGROUND: Host population structure is a key determinant of pathogen and infectious disease transmission patterns. Pathogen phylogenetic trees are useful tools for revealing the population structure underlying an epidemic. Determining whether a population is structured or not helps inform the choice of phylogenetic methods to be used in a given study. We employ tree statistics derived from phylogenetic trees, together with machine learning classification techniques, to reveal an underlying population structure.

RESULTS: In this paper, we simulate phylogenetic trees from both structured and non-structured host populations. We compute eight statistics for the simulated trees: the number of cherries; the Sackin, Colless and total cophenetic indices; ladder length; maximum depth; maximum width; and the width-to-depth ratio. Based on the estimated tree statistics, we classify each simulated tree as coming from either a non-structured or a structured population using decision tree (DT), K-nearest neighbor (KNN) and support vector machine (SVM) classifiers. We incorporate the basic reproductive number ($R_0$) in our tree simulation procedure. A sensitivity analysis investigates whether the classifiers are robust to different choices of model parameters and to tree size. Cross-validated values of the area under the receiver operating characteristic (ROC) curve (AUC) have means above 0.9 for most of the classification models.

CONCLUSIONS: Our classification procedure distinguishes well between trees from structured and non-structured populations using the classifiers, the two-sample Kolmogorov-Smirnov, Cucconi and Podgor-Gastwirth tests, and box plots. SVM models were more robust to changes in model parameters and tree size than the KNN and DT classifiers. When applied to real-world data, our classification procedure revealed the structured population with high accuracy ([Formula: see text]) using the SVM-polynomial classifier.
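The eight statistics listed in the abstract depend only on tree topology. The sketch below is a minimal pure-Python illustration, not the authors' implementation (the record does not describe their software). It assumes a rooted, strictly binary tree written as nested 2-tuples with string tip labels, measures depth in edges from the root, ignores branch lengths, counts all nodes (internal and tips) when computing width, and takes ladder length to be the longest chain of connected internal nodes that each have exactly one tip child; published software may normalize or define some indices slightly differently. The function names `tree_shape_statistics` and `ks_statistic` are hypothetical; `ks_statistic` returns only the two-sample Kolmogorov-Smirnov D statistic, not a p-value.

```python
from bisect import bisect_right
from collections import Counter


def is_leaf(node):
    # Assumption: a tip is any non-tuple value (e.g. a tip label string).
    return not isinstance(node, tuple)


def tree_shape_statistics(tree):
    """Eight topology-only shape statistics for a rooted binary tree of nested 2-tuples."""
    stats = {"cherries": 0, "sackin": 0, "colless": 0, "total_cophenetic": 0}
    width = Counter()  # number of nodes (internal and tips) at each depth
    longest_ladder = 0

    def visit(node, depth):
        # Returns (tips below node, length of the ladder chain starting at node).
        nonlocal longest_ladder
        width[depth] += 1
        if is_leaf(node):
            stats["sackin"] += depth  # Sackin: sum of tip depths
            return 1, 0
        left, right = node
        n_left, ladder_left = visit(left, depth + 1)
        n_right, ladder_right = visit(right, depth + 1)
        n = n_left + n_right
        stats["colless"] += abs(n_left - n_right)  # Colless: subtree size imbalance
        if depth > 0:
            # Every pair of tips below a non-root node v gets +1 in the depth of
            # its most recent common ancestor, so the index is the sum of C(n_v, 2).
            stats["total_cophenetic"] += n * (n - 1) // 2
        leaf_children = is_leaf(left) + is_leaf(right)
        if leaf_children == 2:
            stats["cherries"] += 1
            return n, 0
        if leaf_children == 1:
            # Extend the ladder through the single internal child.
            ladder = 1 + (ladder_right if is_leaf(left) else ladder_left)
            longest_ladder = max(longest_ladder, ladder)
            return n, ladder
        return n, 0

    visit(tree, 0)
    max_depth = max(width)  # deepest level containing any node
    max_width = max(width.values())
    stats.update(
        ladder_length=longest_ladder,
        max_depth=max_depth,
        max_width=max_width,
        width_to_depth=max_width / max_depth if max_depth else float("nan"),
    )
    return stats


def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov D: largest gap between the two empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    d = 0.0
    for v in set(xs) | set(ys):
        # Empirical CDF at v = fraction of observations <= v.
        gap = abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
        d = max(d, gap)
    return d


if __name__ == "__main__":
    caterpillar = ((("a", "b"), "c"), "d")  # maximally unbalanced, 4 tips
    balanced = (("a", "b"), ("c", "d"))  # fully balanced, 4 tips
    print(tree_shape_statistics(caterpillar))  # Sackin 9, Colless 3, ladder length 2
    print(tree_shape_statistics(balanced))  # Sackin 8, Colless 0, 2 cherries
    # Illustrative only: compare Sackin values from two toy groups of trees.
    print(ks_statistic([9, 9, 8], [8, 8, 8]))
```

The traversal is recursive for brevity; very deep, ladder-like trees with thousands of tips may exceed Python's default recursion limit, in which case an explicit stack would be the safer design. A feature table of these statistics per tree is what a DT, KNN or SVM classifier would then consume.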

Record ID: pubmed-8579572
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed

Journal: BMC Bioinformatics (Research Article), published by BioMed Central on 2021-11-10.

© The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.