Non-parametric and semi-parametric support estimation using SEquential RESampling random walks on biomolecular sequences

Bibliographic Details
Main authors: Wang, Wei; Smith, Jack; Hejase, Hussein A.; Liu, Kevin J.
Format: Online article (text)
Language: English
Published: BioMed Central 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7164268/
https://www.ncbi.nlm.nih.gov/pubmed/32322294
http://dx.doi.org/10.1186/s13015-020-00167-0
collection PubMed
description Non-parametric and semi-parametric resampling procedures are widely used to perform support estimation in computational biology and bioinformatics. Among the most widely used methods in this class is the standard bootstrap method, which consists of random sampling with replacement. While not requiring assumptions about any particular parametric model for resampling purposes, the bootstrap and related techniques assume that sites are independent and identically distributed (i.i.d.). The i.i.d. assumption can be an over-simplification for many problems in computational biology and bioinformatics. In particular, sequential dependence within biomolecular sequences is often an essential biological feature due to biochemical function, evolutionary processes such as recombination, and other factors. To relax the simplifying i.i.d. assumption, we propose a new non-parametric/semi-parametric sequential resampling technique that generalizes “Heads-or-Tails” mirrored inputs, a simple but clever technique due to Landan and Graur. The generalized procedure takes the form of random walks along either aligned or unaligned biomolecular sequences. We refer to our new method as the SERES (or “SEquential RESampling”) method. To demonstrate the performance of the new technique, we apply SERES to estimate support for the multiple sequence alignment problem. Using simulated and empirical data, we show that SERES-based support estimation yields comparable or typically better performance compared to state-of-the-art methods.
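The abstract contrasts the standard bootstrap, which draws alignment sites independently with replacement, with SERES, which resamples by random walks along the sequences. The Python sketch below illustrates that contrast on the columns of a toy alignment. `bootstrap_columns` follows the textbook definition of the bootstrap. `random_walk_columns`, its `reverse_prob` parameter, and the rule of bouncing off the sequence ends are hypothetical choices made for illustration only. They are not the authors' SERES procedure, whose details (for example, how walks are carried out on unaligned sequences) are given in the article itself.

```python
import random
from typing import List, Optional, Sequence


def columns_of(alignment: Sequence[str]) -> List[str]:
    """Split an MSA (equal-length rows) into a list of column strings."""
    return ["".join(col) for col in zip(*alignment)]


def rows_of(columns: Sequence[str]) -> List[str]:
    """Reassemble alignment rows from a list of column strings."""
    return ["".join(chars) for chars in zip(*columns)]


def bootstrap_columns(columns: Sequence[str], rng: random.Random) -> List[str]:
    """Standard bootstrap: draw len(columns) sites uniformly with replacement.

    Every draw is independent of the others, which is the i.i.d. assumption.
    """
    n = len(columns)
    return [columns[rng.randrange(n)] for _ in range(n)]


def random_walk_columns(
    columns: Sequence[str],
    rng: random.Random,
    reverse_prob: float = 0.1,
    start: Optional[int] = None,
    direction: Optional[int] = None,
) -> List[str]:
    """HYPOTHETICAL sequential resampling in the spirit of SERES.

    The walk begins at `start` (random if None) moving in `direction`
    (+1 forward, -1 backward; random if None). Before each step it reverses
    with probability `reverse_prob`, and it also reverses instead of stepping
    past either end. Every step moves to a neighboring site, so the replicate
    is made of contiguous stretches of the input, read forward or backward.
    """
    n = len(columns)
    if n == 0:
        return []
    pos = rng.randrange(n) if start is None else start
    step = rng.choice((-1, 1)) if direction is None else direction
    if not 0 <= pos < n or step not in (-1, 1):
        raise ValueError("start must be a valid index and direction +1 or -1")
    out = [columns[pos]]
    while len(out) < n:
        if rng.random() < reverse_prob:
            step = -step
        if not 0 <= pos + step < n:  # bounce off the sequence ends
            step = -step
        pos += step
        out.append(columns[pos])
    return out


if __name__ == "__main__":
    msa = ["ACGT-A", "ACGTTA", "A-GTTA"]  # toy alignment, not data from the paper
    cols = columns_of(msa)
    rng = random.Random(42)
    print(rows_of(bootstrap_columns(cols, rng)))
    print(rows_of(random_walk_columns(cols, rng)))
    # No reversals, starting at the last column moving backward: mirrored rows.
    print(rows_of(random_walk_columns(cols, rng, reverse_prob=0.0,
                                      start=len(cols) - 1, direction=-1)))
    # -> ['A-TGCA', 'ATTGCA', 'ATTG-A']
```

In this sketch, a walk with `reverse_prob=0.0` that starts at the last column and moves backward reads every aligned row in reverse. This is the same mirroring that Landan and Graur's "Tails" input applies to the raw sequences. Allowing random reversals lets one walk produce many distinct replicates, and each replicate keeps neighboring sites together instead of scattering them as the bootstrap does.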
id pubmed-7164268
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Algorithms Mol Biol, Research article (pubmed-7164268, 2020-04-22)
BioMed Central, 2020-04-16. © The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
topic Research