
MP3: A Software Tool for the Prediction of Pathogenic Proteins in Genomic and Metagenomic Data

Bibliographic Details
Main authors: Gupta, Ankit; Kapil, Rohan; Dhakan, Darshan B.; Sharma, Vineet K.
Format: Online, Article, Text
Language: English
Published: Public Library of Science 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3988012/
https://www.ncbi.nlm.nih.gov/pubmed/24736651
http://dx.doi.org/10.1371/journal.pone.0093907
collection PubMed
description The identification of virulent proteins in any de novo sequenced genome is useful in estimating its pathogenic ability and understanding the mechanism of pathogenesis. Similarly, the identification of such proteins could be valuable in comparing the metagenomes of healthy and diseased individuals and estimating the proportion of pathogenic species. However, the common challenge in both of the above tasks is the identification of virulent proteins, since a significant proportion of genomic and metagenomic proteins are novel and not yet annotated. Currently available tools for identifying virulent proteins provide limited accuracy and cannot be used on large datasets. Therefore, we have developed MP3, a standalone tool and web server for the prediction of pathogenic proteins in both genomic and metagenomic datasets. MP3 uses an integrated Support Vector Machine (SVM) and Hidden Markov Model (HMM) approach to carry out fast, sensitive and accurate prediction of pathogenic proteins. It displayed Sensitivity, Specificity, Matthews correlation coefficient (MCC) and accuracy values of 92%, 100%, 0.92 and 96%, respectively, on a blind dataset constructed from complete proteins. On the two metagenomic blind datasets (Blind A: 51–100 amino acids and Blind B: 30–50 amino acids), it displayed Sensitivity, Specificity, MCC and accuracy values of 82.39%, 97.86%, 0.80 and 89.32% for Blind A and 71.60%, 94.48%, 0.67 and 81.86% for Blind B, respectively. In addition, the performance of MP3 was validated on selected bacterial genomic and real metagenomic datasets. To our knowledge, MP3 is the only program that specializes in fast and accurate identification of partial pathogenic proteins predicted from short (100–150 bp) metagenomic reads and also performs exceptionally well on complete protein sequences. MP3 is publicly available at http://metagenomics.iiserb.ac.in/mp3/index.php.
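The abstract reports four evaluation figures: Sensitivity, Specificity, MCC and accuracy. As a hedged illustration of how these follow from a binary confusion matrix, the Python sketch below applies the standard textbook definitions. The function name `classification_metrics` and the example counts are hypothetical; they are not taken from MP3's code or from its datasets.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard binary-classification metrics (hypothetical helper, not MP3 code).

    tp / fn: pathogenic proteins predicted as pathogenic / non-pathogenic.
    tn / fp: non-pathogenic proteins predicted as non-pathogenic / pathogenic.
    """
    sensitivity = tp / (tp + fn)  # true-positive rate (recall)
    specificity = tn / (tn + fp)  # true-negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Matthews correlation coefficient; a common convention returns 0
    # when any row or column total of the confusion matrix is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Hypothetical counts: 100 pathogenic and 100 non-pathogenic proteins.
    metrics = classification_metrics(tp=92, fp=0, tn=100, fn=8)
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

With this hypothetical balanced set, 92 true positives and no false positives give sensitivity 0.92, specificity 1.0, accuracy 0.96 and MCC ≈ 0.92, matching the figures reported for the complete-protein blind set. This only shows that the reported values are arithmetically consistent with one another; it says nothing about the actual composition of that dataset.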
id pubmed-3988012
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Published in PLoS One (Research Article) by the Public Library of Science, 2014-04-15.
© 2014 Gupta et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
topic Research Article