
A Straightforward HPV16 Lineage Classification Based on Machine Learning

Human Papillomavirus (HPV) is the causal agent of 5% of cancers worldwide and the main cause of cervical cancer and it is also associated with a significant percentage of oropharyngeal and anogenital cancers. More than 60% of cervical cancers are caused by HPV16 genotype, which has been classified i...


Bibliographic Details
Main authors: Asensio-Puig, Laura; Alemany, Laia; Pavón, Miquel Angel
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Artificial Intelligence
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9260188/
https://www.ncbi.nlm.nih.gov/pubmed/35814487
http://dx.doi.org/10.3389/frai.2022.851841
author Asensio-Puig, Laura
Alemany, Laia
Pavón, Miquel Angel
collection PubMed
description Human Papillomavirus (HPV) is the causal agent of 5% of cancers worldwide and the main cause of cervical cancer; it is also associated with a significant percentage of oropharyngeal and anogenital cancers. More than 60% of cervical cancers are caused by the HPV16 genotype, which has been classified into lineages (A, B, C, and D). Lineages are related to the progression of cervical cancer. The current method for assessing lineages is to build a Maximum Likelihood Tree (MLT), which is slow, cannot assess poorly sequenced samples, and requires manual annotation. In this study, we developed a new model to assess HPV16 lineage using machine learning tools. A total of 645 HPV16 genomes were analyzed using a Genome-Wide Association Study (GWAS), which identified 56 lineage-specific Single Nucleotide Polymorphisms (SNPs). From these SNPs, training-test models were constructed using different algorithms: Random Forest (RF), Support Vector Machine (SVM), and K-nearest neighbor (KNN). A distinct set of HPV16 sequences (n = 1,028), whose lineage had previously been determined by MLT, was used for validation. The RF-based model allowed a precise assignment of HPV16 lineage, showing an accuracy of 99.5% in the samples of known lineage. Moreover, the RF model could assign a lineage to 273 samples that MLT could not determine. In terms of computation time, the RF-based model was almost 40 times faster than MLT. Having a fast and efficient method for assigning HPV16 lineages could facilitate the implementation of lineage classification as a triage or prognostic marker in the clinical setting.
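The classification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: their best model was a Random Forest, whereas this example uses K-nearest neighbor (one of the other algorithms the abstract names) with a Hamming distance, only because that fits in a few lines of standard-library Python. The function names, the 6-position SNP profiles, and the lineage labels in `TRAIN` are all hypothetical toy values, not data from the study.

```python
from collections import Counter


def hamming(a, b):
    """Count the SNP positions at which two equal-length profiles differ."""
    if len(a) != len(b):
        raise ValueError("SNP profiles must have the same length")
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k=3):
    """Return the majority lineage among the k training profiles nearest to query.

    train is a list of (snp_profile, lineage) pairs; query is one snp_profile.
    """
    neighbours = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(lineage for _, lineage in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(train, labelled, k=3):
    """Fraction of labelled validation profiles whose lineage is predicted correctly."""
    hits = sum(knn_predict(train, profile, k) == lineage
               for profile, lineage in labelled)
    return hits / len(labelled)


# Hypothetical toy training set: each string is the nucleotide observed at
# 6 made-up lineage-informative positions (the study reports 56 real SNPs).
TRAIN = [
    ("ACGTAC", "A"),
    ("ACGTAT", "A"),
    ("ACGTCC", "A"),
    ("TCGAAC", "B"),
    ("TCGAAT", "B"),
    ("TGCAGG", "D"),
    ("TGCAGA", "D"),
    ("TGCTGG", "D"),
]

if __name__ == "__main__":
    # Hypothetical validation profiles with a known lineage.
    validation = [("ACGTAA", "A"), ("TGCAGC", "D"), ("TCGAAC", "B")]
    print(knn_predict(TRAIN, "ACGTAA"))
    print(accuracy(TRAIN, validation))
```

Comparing profiles only at a short list of informative positions, rather than building a tree over whole genomes, is the kind of shortcut that could plausibly explain the speed advantage the abstract reports, although the sketch makes no claim about the actual pipeline.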
format Online
Article
Text
id pubmed-9260188
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9260188 2022-07-08 Front Artif Intell Artificial Intelligence Frontiers Media S.A. 2022-06-23 Text en Copyright © 2022 Asensio-Puig, Alemany and Pavón.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title A Straightforward HPV16 Lineage Classification Based on Machine Learning
topic Artificial Intelligence