
Interpol: An R package for preprocessing of protein sequences

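BACKGROUND: Most machine learning techniques currently applied in the literature require input data of fixed dimensionality. Real input data, such as DNA and protein sequences, frequently violate this requirement because they often differ in length due to insertions and deletions. Moreover, performance in classification and regression is often improved by numerical encoding of amino acids, compared with the commonly used sparse encoding. RESULTS: The software "Interpol" encodes amino acid sequences as numerical descriptor vectors using a database of currently 532 descriptors (mainly from AAindex), and normalizes sequences to uniform length with one of five linear or non-linear interpolation algorithms. Interpol is distributed as an open-source, platform-independent R package. It is typically used to preprocess amino acid sequences for classification or regression. CONCLUSIONS: The functionality of Interpol widens the spectrum of machine learning methods that can be applied to biological sequences, and in many cases it will improve their performance in classification and regression.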
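
The two preprocessing steps summarized in the abstract (descriptor encoding, then length normalization by interpolation) can be illustrated with a minimal sketch. It is written in Python and is not Interpol's R interface: the function names `encode`, `interpolate_linear` and `preprocess` are hypothetical, the Kyte-Doolittle hydropathy scale (AAindex entry KYTJ820101) stands in for one of the 532 descriptors, and only plain linear interpolation over evenly spaced points is shown, whereas Interpol offers five linear or non-linear methods.

```python
# Hypothetical sketch (not Interpol's API): encode protein sequences with one
# numerical descriptor, then resample each encoded sequence to a fixed length
# by linear interpolation.

# Kyte-Doolittle hydropathy scale (AAindex KYTJ820101), used here only as an
# example descriptor.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence, scale=KYTE_DOOLITTLE):
    """Map each residue to its descriptor value (KeyError on unknown letters)."""
    return [scale[aa] for aa in sequence.upper()]


def interpolate_linear(values, target_len):
    """Resample values at target_len evenly spaced positions along the sequence."""
    if target_len < 2 or len(values) < 2:
        raise ValueError("need at least two input values and two output points")
    last = len(values) - 1
    out = []
    for i in range(target_len):
        x = i * last / (target_len - 1)  # position on the original index axis
        lo = min(int(x), last - 1)       # index of the left neighbour
        frac = x - lo                    # fraction of the way to the right neighbour
        out.append(values[lo] + frac * (values[lo + 1] - values[lo]))
    return out


def preprocess(sequences, target_len):
    """Turn sequences of varying length into descriptor vectors of equal length."""
    return [interpolate_linear(encode(s), target_len) for s in sequences]
```

For example, `preprocess(["ACD", "ACDEFGHIK"], 5)` returns two vectors of five values each; the first is approximately [1.8, 2.15, 2.5, -0.5, -3.5]. Because every output has the same length, the vectors can be stacked into a fixed-width feature matrix for a classifier or regressor, which is the fixed-dimensionality requirement described in the background.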


Bibliographic Details
Main authors:	Heider, Dominik; Hoffmann, Daniel
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Short Report
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3138420/ https://www.ncbi.nlm.nih.gov/pubmed/21682849 http://dx.doi.org/10.1186/1756-0381-4-16

author	Heider, Dominik; Hoffmann, Daniel
collection	PubMed
description	BACKGROUND: Most machine learning techniques currently applied in the literature require input data of fixed dimensionality. Real input data, such as DNA and protein sequences, frequently violate this requirement because they often differ in length due to insertions and deletions. Moreover, performance in classification and regression is often improved by numerical encoding of amino acids, compared with the commonly used sparse encoding. RESULTS: The software "Interpol" encodes amino acid sequences as numerical descriptor vectors using a database of currently 532 descriptors (mainly from AAindex), and normalizes sequences to uniform length with one of five linear or non-linear interpolation algorithms. Interpol is distributed as an open-source, platform-independent R package. It is typically used to preprocess amino acid sequences for classification or regression. CONCLUSIONS: The functionality of Interpol widens the spectrum of machine learning methods that can be applied to biological sequences, and in many cases it will improve their performance in classification and regression.
format	Online Article Text
id	pubmed-3138420
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3138420 2011-07-19; BioData Min; Short Report; BioMed Central 2011-06-17; /pmc/articles/PMC3138420/ /pubmed/21682849 http://dx.doi.org/10.1186/1756-0381-4-16; Text en; Copyright ©2011 Heider and Hoffmann; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Interpol: An R package for preprocessing of protein sequences
topic	Short Report


Similar items