Geographical classification of malaria parasites through applying machine learning to whole genome sequence data
Malaria, caused by Plasmodium parasites, is a major global health challenge. Whole genome sequencing (WGS) of Plasmodium falciparum and Plasmodium vivax genomes is providing insights into parasite genetic diversity, transmission patterns, and can inform decision making for clinical and surveillance...
Main Authors: | Deelder, Wouter; Manko, Emilia; Phelan, Jody E.; Campino, Susana; Palla, Luigi; Clark, Taane G. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9729610/ https://www.ncbi.nlm.nih.gov/pubmed/36476815 http://dx.doi.org/10.1038/s41598-022-25568-6 |
_version_ | 1784845508029710336 |
---|---|
author | Deelder, Wouter Manko, Emilia Phelan, Jody E. Campino, Susana Palla, Luigi Clark, Taane G. |
author_facet | Deelder, Wouter Manko, Emilia Phelan, Jody E. Campino, Susana Palla, Luigi Clark, Taane G. |
author_sort | Deelder, Wouter |
collection | PubMed |
description | Malaria, caused by Plasmodium parasites, is a major global health challenge. Whole genome sequencing (WGS) of Plasmodium falciparum and Plasmodium vivax genomes is providing insights into parasite genetic diversity, transmission patterns, and can inform decision making for clinical and surveillance purposes. Advances in sequencing technologies are helping to generate timely and big genomic datasets, with the prospect of applying Artificial Intelligence analytical techniques (e.g., machine learning) to support programmatic malaria control and elimination. Here, we assess the potential of applying deep learning convolutional neural network approaches to predict the geographic origin of infections (continents, countries, GPS locations) using WGS data of P. falciparum (n = 5957; 27 countries) and P. vivax (n = 659; 13 countries) isolates. Using identified high-quality genome-wide single nucleotide polymorphisms (SNPs) (P. falciparum: 750 k, P. vivax: 588 k), an analysis of population structure and ancestry revealed clustering at the country-level. When predicting locations for both species, classification (compared to regression) methods had the lowest distance errors, and > 90% accuracy at a country level. Our work demonstrates the utility of machine learning approaches for geo-classification of malaria parasites. With timelier WGS data generation across more malaria-affected regions, the performance of machine learning approaches for geo-classification will improve, thereby supporting disease control activities. |
format | Online Article Text |
id | pubmed-9729610 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9729610 2022-12-09 Geographical classification of malaria parasites through applying machine learning to whole genome sequence data Deelder, Wouter Manko, Emilia Phelan, Jody E. Campino, Susana Palla, Luigi Clark, Taane G. Sci Rep Article Malaria, caused by Plasmodium parasites, is a major global health challenge. Whole genome sequencing (WGS) of Plasmodium falciparum and Plasmodium vivax genomes is providing insights into parasite genetic diversity, transmission patterns, and can inform decision making for clinical and surveillance purposes. Advances in sequencing technologies are helping to generate timely and big genomic datasets, with the prospect of applying Artificial Intelligence analytical techniques (e.g., machine learning) to support programmatic malaria control and elimination. Here, we assess the potential of applying deep learning convolutional neural network approaches to predict the geographic origin of infections (continents, countries, GPS locations) using WGS data of P. falciparum (n = 5957; 27 countries) and P. vivax (n = 659; 13 countries) isolates. Using identified high-quality genome-wide single nucleotide polymorphisms (SNPs) (P. falciparum: 750 k, P. vivax: 588 k), an analysis of population structure and ancestry revealed clustering at the country-level. When predicting locations for both species, classification (compared to regression) methods had the lowest distance errors, and > 90% accuracy at a country level. Our work demonstrates the utility of machine learning approaches for geo-classification of malaria parasites. With timelier WGS data generation across more malaria-affected regions, the performance of machine learning approaches for geo-classification will improve, thereby supporting disease control activities. Nature Publishing Group UK 2022-12-07 /pmc/articles/PMC9729610/ /pubmed/36476815 http://dx.doi.org/10.1038/s41598-022-25568-6 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Deelder, Wouter Manko, Emilia Phelan, Jody E. Campino, Susana Palla, Luigi Clark, Taane G. Geographical classification of malaria parasites through applying machine learning to whole genome sequence data |
title | Geographical classification of malaria parasites through applying machine learning to whole genome sequence data |
title_full | Geographical classification of malaria parasites through applying machine learning to whole genome sequence data |
title_fullStr | Geographical classification of malaria parasites through applying machine learning to whole genome sequence data |
title_full_unstemmed | Geographical classification of malaria parasites through applying machine learning to whole genome sequence data |
title_short | Geographical classification of malaria parasites through applying machine learning to whole genome sequence data |
title_sort | geographical classification of malaria parasites through applying machine learning to whole genome sequence data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9729610/ https://www.ncbi.nlm.nih.gov/pubmed/36476815 http://dx.doi.org/10.1038/s41598-022-25568-6 |
work_keys_str_mv | AT deelderwouter geographicalclassificationofmalariaparasitesthroughapplyingmachinelearningtowholegenomesequencedata AT mankoemilia geographicalclassificationofmalariaparasitesthroughapplyingmachinelearningtowholegenomesequencedata AT phelanjodye geographicalclassificationofmalariaparasitesthroughapplyingmachinelearningtowholegenomesequencedata AT campinosusana geographicalclassificationofmalariaparasitesthroughapplyingmachinelearningtowholegenomesequencedata AT pallaluigi geographicalclassificationofmalariaparasitesthroughapplyingmachinelearningtowholegenomesequencedata AT clarktaaneg geographicalclassificationofmalariaparasitesthroughapplyingmachinelearningtowholegenomesequencedata |