
Accurate estimation of isoelectric point of protein and peptide based on amino acid sequences

Motivation: In any macromolecular polyprotic system—for example protein, DNA or RNA—the isoelectric point—commonly referred to as the pI—can be defined as the point of singularity in a titration curve, corresponding to the solution pH value at which the net overall surface charge—and thus the electr...


Bibliographic Details
Main Authors:	Audain, Enrique; Ramos, Yassel; Hermjakob, Henning; Flower, Darren R.; Perez-Riverol, Yasset
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2016
Subjects:	Original Papers
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5939969/ https://www.ncbi.nlm.nih.gov/pubmed/26568629 http://dx.doi.org/10.1093/bioinformatics/btv674

_version_	1783321029015240704
author	Audain, Enrique; Ramos, Yassel; Hermjakob, Henning; Flower, Darren R.; Perez-Riverol, Yasset
collection	PubMed
description	Motivation: In any macromolecular polyprotic system (for example protein, DNA or RNA), the isoelectric point, commonly referred to as the pI, can be defined as the point of singularity in a titration curve, corresponding to the solution pH value at which the net overall surface charge, and thus the electrophoretic mobility, of the ampholyte sums to zero. Many modern analytical biochemistry and proteomics methods depend on the isoelectric point as a principal feature for protein and peptide characterization. Protein separation by isoelectric point is a critical part of 2-D gel electrophoresis, a key precursor of proteomics, where discrete spots can be digested in-gel and the proteins subsequently identified by analytical mass spectrometry. Peptide fractionation according to pI is also widely used in current proteomics sample preparation procedures prior to LC-MS/MS analysis. Accurate theoretical prediction of pI would therefore expedite such analyses. Although pI calculation is widely used, it remains largely untested, motivating our efforts to benchmark pI prediction methods. Results: Using data from the database PIP-DB and one publicly available dataset as our reference gold standard, we have benchmarked pI calculation methods. We find that methods vary in their accuracy and are highly sensitive to the choice of basis set. The machine-learning algorithms, especially the SVM-based algorithm, showed superior performance when studying peptide mixtures. In general, learning-based pI prediction methods (such as Cofactor, SVM and Branca) require a large training dataset, and their resulting performance depends strongly on the quality of that data. In contrast with iterative methods, machine-learning algorithms have the advantage of being able to add new features to improve the accuracy of prediction. Contact: yperez@ebi.ac.uk Availability and Implementation: The software and data are freely available at https://github.com/ypriverol/pIR. Supplementary information: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-5939969
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-5939969 2018-08-07 Accurate estimation of isoelectric point of protein and peptide based on amino acid sequences. Audain, Enrique; Ramos, Yassel; Hermjakob, Henning; Flower, Darren R.; Perez-Riverol, Yasset. Bioinformatics. Original Papers. Oxford University Press 2016-03-15 2015-11-14 /pmc/articles/PMC5939969/ /pubmed/26568629 http://dx.doi.org/10.1093/bioinformatics/btv674 Text en © The Author 2015. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Accurate estimation of isoelectric point of protein and peptide based on amino acid sequences
topic	Original Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5939969/ https://www.ncbi.nlm.nih.gov/pubmed/26568629 http://dx.doi.org/10.1093/bioinformatics/btv674
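
The abstract defines the pI as the pH at which the net charge of the molecule sums to zero, and it contrasts iterative methods with machine-learning ones. The sketch below illustrates one generic iterative approach: a Henderson-Hasselbalch charge model solved by bisection. It is not the authors' implementation and is unrelated to the pIR package; the pKa values are illustrative assumptions, and the names net_charge and estimate_pi are hypothetical. Because the abstract reports that results are highly sensitive to the choice of pKa set, treat any output as indicative only.

```python
# Illustrative pKa values (an assumption, NOT taken from the paper).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}            # basic side chains
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}   # acidic side chains
N_TERM_PKA = 8.6   # free amino terminus
C_TERM_PKA = 3.6   # free carboxyl terminus


def net_charge(sequence, ph):
    """Net charge of an unmodified peptide at a given pH (Henderson-Hasselbalch)."""
    # Termini: the N-terminus contributes positive charge, the C-terminus negative.
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    for residue in sequence.upper():
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge


def estimate_pi(sequence, low=0.0, high=14.0, tolerance=1e-4):
    """Bisect on pH until the bracket around zero net charge is narrow enough."""
    # The net charge decreases monotonically with pH, so bisection converges.
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid    # still positively charged: the pI lies at higher pH
        else:
            high = mid   # negative or zero: the pI lies at lower pH
    return (low + high) / 2.0


if __name__ == "__main__":
    for peptide in ("A", "KKKK", "DDDD"):
        print(peptide, round(estimate_pi(peptide), 2))
```

Swapping in a different pKa table changes only the constants at the top, which makes it easy to see how strongly the estimate depends on that choice.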
