An open-source k-mer based machine learning tool for fast and accurate subtyping of HIV-1 genomes

For many disease-causing virus species, global diversity is clustered into a taxonomy of subtypes with clinical significance. In particular, the classification of infections among the subtypes of human immunodeficiency virus type 1 (HIV-1) is a routine component of clinical management, and there are...

Bibliographic Details
Main authors: Solis-Reyes, Stephen, Avino, Mariano, Poon, Art, Kari, Lila
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6235296/
https://www.ncbi.nlm.nih.gov/pubmed/30427878
http://dx.doi.org/10.1371/journal.pone.0206409
_version_ 1783370851811328000
author Solis-Reyes, Stephen
Avino, Mariano
Poon, Art
Kari, Lila
collection PubMed
description For many disease-causing virus species, global diversity is clustered into a taxonomy of subtypes with clinical significance. In particular, the classification of infections among the subtypes of human immunodeficiency virus type 1 (HIV-1) is a routine component of clinical management, and there are now many classification algorithms available for this purpose. Although several of these algorithms are similar in accuracy and speed, the majority are proprietary and require laboratories to transmit HIV-1 sequence data over the network to remote servers. This potentially exposes sensitive patient data to unauthorized access, and makes it impossible to determine how classifications are made and to maintain the data provenance of clinical bioinformatic workflows. We propose an open-source supervised and alignment-free subtyping method (Kameris) that operates on k-mer frequencies in HIV-1 sequences. We performed a detailed study of the accuracy and performance of subtype classification in comparison to four state-of-the-art programs. Based on our testing data set of manually curated real-world HIV-1 sequences (n = 2,784), Kameris obtained an overall accuracy of 97%, which matches or exceeds all other tested software, with a processing rate of over 1,500 sequences per second. Furthermore, our fully standalone general-purpose software provides key advantages in terms of data security and privacy, transparency and reproducibility. Finally, we show that our method is readily adaptable to subtype classification of other viruses including dengue, influenza A, and hepatitis B and C virus.
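The description above says the method operates on k-mer frequencies. As a minimal sketch of what such a feature vector can look like (not the Kameris implementation; the function name `kmer_frequencies`, the default `k=3`, and the choice to skip windows containing ambiguous bases are all assumptions made for illustration), a sequence can be turned into a fixed-length vector of normalized k-mer counts:

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=3):
    """Return normalized k-mer frequencies in lexicographic ACGT order.

    Hypothetical helper for illustration only; windows containing
    characters other than A, C, G, T are skipped (an assumption).
    """
    seq = sequence.upper()
    valid = set("ACGT")
    # Count every overlapping window of length k made of unambiguous bases.
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    )
    total = sum(counts.values())
    # Fixed ordering of all 4**k possible k-mers gives a fixed-length vector.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts[m] / total if total else 0.0 for m in kmers]


if __name__ == "__main__":
    vec = kmer_frequencies("ACGTACGT", k=2)
    print(len(vec), sum(vec))
```

Vectors of this kind can be passed to any supervised classifier; which classifier and which value of k the authors use is not stated in this record.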
format Online
Article
Text
id pubmed-6235296
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-62352962018-12-01 An open-source k-mer based machine learning tool for fast and accurate subtyping of HIV-1 genomes Solis-Reyes, Stephen Avino, Mariano Poon, Art Kari, Lila PLoS One Research Article
Public Library of Science 2018-11-14 /pmc/articles/PMC6235296/ /pubmed/30427878 http://dx.doi.org/10.1371/journal.pone.0206409 Text en © 2018 Solis-Reyes et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title An open-source k-mer based machine learning tool for fast and accurate subtyping of HIV-1 genomes
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6235296/
https://www.ncbi.nlm.nih.gov/pubmed/30427878
http://dx.doi.org/10.1371/journal.pone.0206409