
Geographical classification of malaria parasites through applying machine learning to whole genome sequence data

Malaria, caused by Plasmodium parasites, is a major global health challenge. Whole genome sequencing (WGS) of Plasmodium falciparum and Plasmodium vivax genomes is providing insights into parasite genetic diversity, transmission patterns, and can inform decision making for clinical and surveillance...

Bibliographic Details
Main authors: Deelder, Wouter; Manko, Emilia; Phelan, Jody E.; Campino, Susana; Palla, Luigi; Clark, Taane G.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9729610/
https://www.ncbi.nlm.nih.gov/pubmed/36476815
http://dx.doi.org/10.1038/s41598-022-25568-6
author Deelder, Wouter
Manko, Emilia
Phelan, Jody E.
Campino, Susana
Palla, Luigi
Clark, Taane G.
collection PubMed
description Malaria, caused by Plasmodium parasites, is a major global health challenge. Whole genome sequencing (WGS) of Plasmodium falciparum and Plasmodium vivax genomes is providing insights into parasite genetic diversity, transmission patterns, and can inform decision making for clinical and surveillance purposes. Advances in sequencing technologies are helping to generate timely and big genomic datasets, with the prospect of applying Artificial Intelligence analytical techniques (e.g., machine learning) to support programmatic malaria control and elimination. Here, we assess the potential of applying deep learning convolutional neural network approaches to predict the geographic origin of infections (continents, countries, GPS locations) using WGS data of P. falciparum (n = 5957; 27 countries) and P. vivax (n = 659; 13 countries) isolates. Using identified high-quality genome-wide single nucleotide polymorphisms (SNPs) (P. falciparum: 750 k, P. vivax: 588 k), an analysis of population structure and ancestry revealed clustering at the country-level. When predicting locations for both species, classification (compared to regression) methods had the lowest distance errors, and > 90% accuracy at a country level. Our work demonstrates the utility of machine learning approaches for geo-classification of malaria parasites. With timelier WGS data generation across more malaria-affected regions, the performance of machine learning approaches for geo-classification will improve, thereby supporting disease control activities.
format Online
Article
Text
id pubmed-9729610
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9729610 2022-12-09 Sci Rep Article
Nature Publishing Group UK 2022-12-07 /pmc/articles/PMC9729610/ /pubmed/36476815 http://dx.doi.org/10.1038/s41598-022-25568-6 Text en
© The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
title Geographical classification of malaria parasites through applying machine learning to whole genome sequence data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9729610/
https://www.ncbi.nlm.nih.gov/pubmed/36476815
http://dx.doi.org/10.1038/s41598-022-25568-6