
Quantiprot - a Python package for quantitative analysis of protein sequences

BACKGROUND: The field of protein sequence analysis is dominated by tools rooted in substitution matrices and alignments. A complementary approach is provided by methods of quantitative characterization. A major advantage of the approach is that quantitative properties define a multidimensional solu...


Bibliographic Details
Main authors: Konopka, Bogumił M., Marciniak, Marta, Dyrka, Witold
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5512976/
https://www.ncbi.nlm.nih.gov/pubmed/28716000
http://dx.doi.org/10.1186/s12859-017-1751-4
author Konopka, Bogumił M.
Marciniak, Marta
Dyrka, Witold
collection PubMed
description BACKGROUND: The field of protein sequence analysis is dominated by tools rooted in substitution matrices and alignments. A complementary approach is provided by methods of quantitative characterization. A major advantage of the approach is that quantitative properties define a multidimensional solution space, where sequences can be related to each other and differences can be meaningfully interpreted. RESULTS: Quantiprot is a software package in Python, which provides a simple and consistent interface to multiple methods for quantitative characterization of protein sequences. The package can be used to calculate dozens of characteristics directly from sequences or using physico-chemical properties of amino acids. Besides basic measures, Quantiprot performs quantitative analysis of recurrence and determinism in the sequence, calculates the distribution of n-grams and computes the Zipf’s law coefficient. CONCLUSIONS: We propose three main fields of application of the Quantiprot package. First, quantitative characteristics can be used in alignment-free similarity searches, and in clustering of large and/or divergent sequence sets. Second, a feature space defined by quantitative properties can be used in comparative studies of protein families and organisms. Third, the feature space can be used for evaluating generative models, where a large number of sequences generated by the model can be compared to actually observed sequences.
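To illustrate two of the measures named in the description (the n-gram distribution and a Zipf's law coefficient), here is a minimal sketch using only the Python standard library. It does not use Quantiprot's own API: the function names `ngram_counts` and `zipf_slope`, the toy sequence, and the choice of an ordinary least-squares fit on the log-log rank-frequency curve are all assumptions made for this example, and the package may estimate the coefficient differently.

```python
import math
from collections import Counter


def ngram_counts(sequence, n=2):
    """Count overlapping n-grams (substrings of length n) in a sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def zipf_slope(counts):
    """Least-squares slope of log(frequency) against log(rank).

    Frequencies are ranked from most to least common; a slope near -1
    corresponds to the classic Zipf distribution.
    """
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct n-grams")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy sequence, not taken from the article.
seq = "MKVLAAGIVGLLLAAGMKVLA"
bigrams = ngram_counts(seq, n=2)
slope = zipf_slope(bigrams)
```

The slope, together with other per-sequence numbers, could serve as one coordinate of the feature space the description refers to; real protein sets would need far longer sequences than this toy input for the fit to be meaningful.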
format Online
Article
Text
id pubmed-5512976
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5512976 2017-07-19 Quantiprot - a Python package for quantitative analysis of protein sequences Konopka, Bogumił M. Marciniak, Marta Dyrka, Witold BMC Bioinformatics Software
BioMed Central 2017-07-17 /pmc/articles/PMC5512976/ /pubmed/28716000 http://dx.doi.org/10.1186/s12859-017-1751-4 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Quantiprot - a Python package for quantitative analysis of protein sequences
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5512976/
https://www.ncbi.nlm.nih.gov/pubmed/28716000
http://dx.doi.org/10.1186/s12859-017-1751-4