SLALOM, a flexible method for the identification and statistical analysis of overlapping continuous sequence elements in sequence- and time-series data

BACKGROUND: Protein or nucleic acid sequences contain a multitude of associated annotations representing continuous sequence elements (CSEs). Comparing these CSEs is needed, whenever we want to match identical annotations or integrate distinctive ones. Currently, there is no ready-to-use software av...

Bibliographic Details
Main Authors:	Prytuliak, Roman; Pfeiffer, Friedhelm; Habermann, Bianca Hermine
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2018
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5787307/ https://www.ncbi.nlm.nih.gov/pubmed/29373955 http://dx.doi.org/10.1186/s12859-018-2020-x

author	Prytuliak, Roman; Pfeiffer, Friedhelm; Habermann, Bianca Hermine
author_sort	Prytuliak, Roman
collection	PubMed
description	BACKGROUND: Protein or nucleic acid sequences contain a multitude of associated annotations representing continuous sequence elements (CSEs). Comparing these CSEs is needed, whenever we want to match identical annotations or integrate distinctive ones. Currently, there is no ready-to-use software available that provides comprehensive statistical readout for comparing two annotations of the same type with each other, which can be adapted to the application logic of the scientific question. RESULTS: We have developed a method, SLALOM (for StatisticaL Analysis of Locus Overlap Method), to perform comparative analysis of sequence annotations in a highly flexible way. SLALOM implements six major operation modes and a number of additional options that can answer a variety of statistical questions about a pair of input annotations of a given sequence collection. We demonstrate the results of SLALOM on three different examples from biology and economics and compare our method to already existing software. We discuss the importance of carefully choosing the application logic to address specific scientific questions. CONCLUSION: SLALOM is a highly versatile, command-line based method for comparing annotations in a collection of sequences, with a statistical read-out for performance evaluation and benchmarking of predictors and gene annotation pipelines. Abstraction from sequence content even allows SLALOM to compare other kinds of positional data including, for example, data coming from time series. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2020-x) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-5787307
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5787307 2018-02-08 SLALOM, a flexible method for the identification and statistical analysis of overlapping continuous sequence elements in sequence- and time-series data Prytuliak, Roman Pfeiffer, Friedhelm Habermann, Bianca Hermine BMC Bioinformatics Methodology Article
BioMed Central 2018-01-26 /pmc/articles/PMC5787307/ /pubmed/29373955 http://dx.doi.org/10.1186/s12859-018-2020-x Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	SLALOM, a flexible method for the identification and statistical analysis of overlapping continuous sequence elements in sequence- and time-series data
topic	Methodology Article

Similar Items