Non-parametric and semi-parametric support estimation using SEquential RESampling random walks on biomolecular sequences
Non-parametric and semi-parametric resampling procedures are widely used to perform support estimation in computational biology and bioinformatics. Among the most widely used methods in this class is the standard bootstrap method, which consists of random sampling with replacement. While not requiri...
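The contrast drawn in the description, between the standard bootstrap (random sampling of sites with replacement) and resampling by random walks along a sequence, can be illustrated with a small Python sketch. This is a simplified illustration under stated assumptions, not the published SERES procedure: the function names (`bootstrap_indices`, `reflecting_walk_indices`, `resample`) are hypothetical, and the walk shown here reverses direction only when it reaches either end of the alignment, which is just one simple way to keep neighboring sites together.

```python
import random


def bootstrap_indices(n, rng):
    """Standard bootstrap: draw n column indices with replacement.

    Each draw is independent, so neighboring sites are not kept together.
    """
    return [rng.randrange(n) for _ in range(n)]


def reflecting_walk_indices(n, length, rng):
    """Illustrative sequential resampling (an assumption, not the SERES spec).

    Start at a random column with a random direction, move one column per
    step, and reverse direction whenever the next step would leave the
    sequence. Consecutive sampled columns are therefore always adjacent.
    """
    pos = rng.randrange(n)
    step = rng.choice((-1, 1))
    indices = []
    for _ in range(length):
        indices.append(pos)
        if n > 1:
            if not 0 <= pos + step < n:
                step = -step  # reflect at the sequence boundary
            pos += step
    return indices


def resample(alignment, indices):
    """Build a replicate by copying the chosen columns from every row."""
    return ["".join(row[i] for i in indices) for row in alignment]


if __name__ == "__main__":
    rng = random.Random(0)
    toy_alignment = ["ACGT-A", "AC-TTA"]  # toy data, rows of equal length
    n = len(toy_alignment[0])
    print(resample(toy_alignment, bootstrap_indices(n, rng)))
    print(resample(toy_alignment, reflecting_walk_indices(n, 2 * n, rng)))
```

In the bootstrap replicate, column order is scrambled, whereas the walk-based replicate preserves local runs of adjacent columns (possibly reversed), which is the kind of sequential dependence the description says the i.i.d. assumption ignores.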
Main authors: | Wang, Wei; Smith, Jack; Hejase, Hussein A.; Liu, Kevin J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2020 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7164268/ https://www.ncbi.nlm.nih.gov/pubmed/32322294 http://dx.doi.org/10.1186/s13015-020-00167-0 |
_version_ | 1783523261606264832 |
---|---|
author | Wang, Wei Smith, Jack Hejase, Hussein A. Liu, Kevin J. |
collection | PubMed |
description | Non-parametric and semi-parametric resampling procedures are widely used to perform support estimation in computational biology and bioinformatics. Among the most widely used methods in this class is the standard bootstrap method, which consists of random sampling with replacement. While not requiring assumptions about any particular parametric model for resampling purposes, the bootstrap and related techniques assume that sites are independent and identically distributed (i.i.d.). The i.i.d. assumption can be an over-simplification for many problems in computational biology and bioinformatics. In particular, sequential dependence within biomolecular sequences is often an essential biological feature due to biochemical function, evolutionary processes such as recombination, and other factors. To relax the simplifying i.i.d. assumption, we propose a new non-parametric/semi-parametric sequential resampling technique that generalizes “Heads-or-Tails” mirrored inputs, a simple but clever technique due to Landan and Graur. The generalized procedure takes the form of random walks along either aligned or unaligned biomolecular sequences. We refer to our new method as the SERES (or “SEquential RESampling”) method. To demonstrate the performance of the new technique, we apply SERES to estimate support for the multiple sequence alignment problem. Using simulated and empirical data, we show that SERES-based support estimation yields comparable or typically better performance compared to state-of-the-art methods. |
format | Online Article Text |
id | pubmed-7164268 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-7164268 2020-04-22 Algorithms Mol Biol Research BioMed Central 2020-04-16 /pmc/articles/PMC7164268/ /pubmed/32322294 http://dx.doi.org/10.1186/s13015-020-00167-0 Text en © The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Non-parametric and semi-parametric support estimation using SEquential RESampling random walks on biomolecular sequences |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7164268/ https://www.ncbi.nlm.nih.gov/pubmed/32322294 http://dx.doi.org/10.1186/s13015-020-00167-0 |