
Applications of Machine Learning for the Classification of Porcine Reproductive and Respiratory Syndrome Virus Sublineages Using Amino Acid Scores of ORF5 Gene

Porcine reproductive and respiratory syndrome is an infectious disease of pigs caused by PRRS virus (PRRSV). A modified live-attenuated vaccine has been widely used to control the spread of PRRSV, and the classification of field strains is key to successful control and prevention. Restriction fr...

Bibliographic Details
Main Authors: Kim, Jeonghoon; Lee, Kyuyoung; Rupasinghe, Ruwini; Rezaei, Shahbaz; Martínez-López, Beatriz; Liu, Xin
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8345883/
https://www.ncbi.nlm.nih.gov/pubmed/34368274
http://dx.doi.org/10.3389/fvets.2021.683134
author Kim, Jeonghoon
Lee, Kyuyoung
Rupasinghe, Ruwini
Rezaei, Shahbaz
Martínez-López, Beatriz
Liu, Xin
author_sort Kim, Jeonghoon
collection PubMed
description Porcine reproductive and respiratory syndrome (PRRS) is an infectious disease of pigs caused by PRRS virus (PRRSV). A modified live-attenuated vaccine has been widely used to control the spread of PRRSV, and the classification of field strains is key to successful control and prevention. Restriction fragment length polymorphism targeting the open reading frame 5 (ORF5) gene is widely used to classify PRRSV strains but has shown unstable accuracy. Phylogenetic analysis is a powerful tool for PRRSV classification with consistent accuracy, but it demands substantial computational power as the number of sequences increases. Our study aimed to apply four machine learning (ML) algorithms (random forest, k-nearest neighbor, support vector machine, and multilayer perceptron) to classify field PRRSV strains into four clades using amino acid scores based on the ORF5 gene sequence. We used amino acid sequences of the ORF5 gene from 1931 field PRRSV strains collected in the US from 2012 to 2020. Phylogenetic analysis was used to label each field PRRSV strain as one of four clades: Lineage 5 or one of three clades in Lineage 1. We measured the accuracy and time consumption of classification using the four ML approaches for different sizes of gene sequences. We found that all four ML algorithms classified a large number of field strains in a very short time (<2.5 s) with very high accuracy (area under the receiver operating characteristic curve >0.99). Furthermore, the random forest approach detected a total of 4 key amino acid positions for the classification of field PRRSV strains into four clades. Our findings offer insight for developing a rapid and accurate classification model using genetic information, which would also enable handling large genome datasets in real time or semi-real time for data-driven decision-making and more timely surveillance.
format Online
Article
Text
id pubmed-8345883
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8345883 2021-08-07 Front Vet Sci Veterinary Science Frontiers Media S.A. 2021-07-23 /pmc/articles/PMC8345883/ /pubmed/34368274 http://dx.doi.org/10.3389/fvets.2021.683134 Text en Copyright © 2021 Kim, Lee, Rupasinghe, Rezaei, Martínez-López and Liu. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Applications of Machine Learning for the Classification of Porcine Reproductive and Respiratory Syndrome Virus Sublineages Using Amino Acid Scores of ORF5 Gene
topic Veterinary Science
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8345883/
https://www.ncbi.nlm.nih.gov/pubmed/34368274
http://dx.doi.org/10.3389/fvets.2021.683134
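
The classification idea summarized in the description field can be illustrated with a minimal sketch of k-nearest neighbor, one of the four algorithms the abstract names. Everything specific here is an assumption made for illustration and is not taken from the article: the scoring scheme (alphabet index per amino acid), the mismatch-count distance, the clade labels `clade_A` to `clade_D`, and the short made-up sequences, which are not real ORF5 data.

```python
from collections import Counter

# Hypothetical scoring scheme: each amino acid letter maps to its index in this
# alphabet; "-" marks an alignment gap. The article's actual scores may differ.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY-"
SCORE = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode(sequence):
    """Turn an aligned amino acid string into a list of numeric scores."""
    return [SCORE[aa] for aa in sequence]


def mismatch_count(a, b):
    """Number of aligned positions at which two encoded sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def knn_predict(train, query, k=3):
    """Majority vote over the k labeled sequences closest to the query."""
    nearest = sorted(train, key=lambda item: mismatch_count(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, made-up 8-residue fragments (not real ORF5 data), two per placeholder clade.
# In the study, labels came from phylogenetic analysis; here they are invented.
TRAINING = [
    (encode("MLGKCLTA"), "clade_A"),
    (encode("MLGKCLTV"), "clade_A"),
    (encode("MFERCSTA"), "clade_B"),
    (encode("MFERCSTV"), "clade_B"),
    (encode("MLGRWLNA"), "clade_C"),
    (encode("MLGRWLNV"), "clade_C"),
    (encode("QLDKCLHP"), "clade_D"),
    (encode("QLDKCLHS"), "clade_D"),
]

if __name__ == "__main__":
    # Classify two unseen toy fragments that each differ from one clade by a single residue.
    for fragment in ("MLGKCLTS", "MFERCSTS"):
        print(fragment, knn_predict(TRAINING, encode(fragment)))
```

The sketch only shows the shape of the workflow (encode aligned sequences, then classify against phylogenetically labeled examples); it makes no claim about the accuracy or timing figures reported in the abstract.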