Differentiable Learning of Sequence-Specific Minimizer Schemes with DeepMinimizer
Main authors: | Hoang, Minh; Zheng, Hongyu; Kingsford, Carl |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Mary Ann Liebert, Inc., publishers, 2022 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9807081/ https://www.ncbi.nlm.nih.gov/pubmed/36095142 http://dx.doi.org/10.1089/cmb.2022.0275 |
author | Hoang, Minh; Zheng, Hongyu; Kingsford, Carl |
collection | PubMed |
description | Minimizers are widely used to sample representative k-mers from biological sequences in many applications, such as read mapping and taxonomy prediction. In most scenarios, having the minimizer scheme select as few k-mer positions as possible (i.e., having a low density) is desirable to reduce computation and memory cost. Despite the growing interest in minimizers, learning an effective scheme with optimal density is still an open question, as it requires solving an apparently challenging discrete optimization problem on the permutation space of k-mer orderings. Most existing schemes are designed to work well in expectation over random sequences, which have limited applicability to many practical tools. On the other hand, several methods have been proposed to construct minimizer schemes for a specific target sequence. These methods, however, only approximate the original objective with likewise discrete surrogate tasks that are not able to significantly improve the density performance. This article introduces the first continuous relaxation of the density minimizing objective, DeepMinimizer, which employs a novel Deep Learning twin architecture to simultaneously ensure both validity and performance of the minimizer scheme. Our surrogate objective is fully differentiable and, therefore, amenable to efficient gradient-based optimization using GPU computing. Finally, we demonstrate that DeepMinimizer discovers minimizer schemes that significantly outperform state-of-the-art constructions on human genomic sequences. |
id | pubmed-9807081 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | J Comput Biol |
dates | 2022-12-01, 2022-12-13, 2023-01-10 |
copyright | © Minh Hoang, et al., 2022. Published by Mary Ann Liebert, Inc. |
license | This Open Access article is distributed under the terms of the Creative Commons License [CC-BY] (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
topic | Research Articles |
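The abstract describes density informally as how few k-mer positions a minimizer scheme selects. The Python sketch below makes that concrete under the standard (w, k) minimizer definition, which is assumed here rather than taken from the record: in every window of w consecutive k-mers, the k-mer with the lowest rank under a chosen ordering is selected, ties go to the leftmost position, and density is the fraction of k-mer positions selected. The helper names (`minimizer_positions`, `density`, `random_order`) and the two example orderings (random and lexicographic) are hypothetical illustrations, not DeepMinimizer's code; DeepMinimizer instead learns a sequence-specific scheme with a deep-learning twin architecture.

```python
# Hypothetical illustration of (w, k) minimizer density; not DeepMinimizer's implementation.
import random
from typing import Any, Callable, Set


def minimizer_positions(seq: str, k: int, w: int,
                        rank: Callable[[str], Any]) -> Set[int]:
    """Start positions of the k-mers picked by a (w, k) minimizer scheme.

    In every window of w consecutive k-mers, the k-mer with the smallest
    rank is picked; ties (identical k-mers) go to the leftmost position.
    """
    n_kmers = len(seq) - k + 1
    if n_kmers < w:
        raise ValueError("sequence is shorter than one window of w k-mers")
    # Rank every k-mer once, then slide a window of w k-mers along the sequence.
    ranks = [rank(seq[i:i + k]) for i in range(n_kmers)]
    picked: Set[int] = set()
    for start in range(n_kmers - w + 1):
        # Discrete argmin over the window: (rank, position) breaks ties leftmost.
        picked.add(min(range(start, start + w), key=lambda i: (ranks[i], i)))
    return picked


def density(seq: str, k: int, w: int, rank: Callable[[str], Any]) -> float:
    """Fraction of k-mer positions selected by the scheme (lower is better)."""
    return len(minimizer_positions(seq, k, w, rank)) / (len(seq) - k + 1)


def random_order(seed: int = 0) -> Callable[[str], float]:
    """A random k-mer ordering: each distinct k-mer gets a fixed random rank."""
    rng = random.Random(seed)
    table: dict = {}

    def rank(kmer: str) -> float:
        if kmer not in table:
            table[kmer] = rng.random()
        return table[kmer]

    return rank


if __name__ == "__main__":
    rng = random.Random(1)
    seq = "".join(rng.choice("ACGT") for _ in range(50_000))
    k, w = 11, 10
    print("random-order density: ", density(seq, k, w, random_order()))
    print("lexicographic density:", density(seq, k, w, lambda s: s))
    print("2/(w+1) reference:    ", 2 / (w + 1))
```

For random orderings on random sequences, density is commonly quoted as roughly 2/(w+1), which the script prints as a reference point. The `min` over each window is a discrete argmin induced by an ordering of k-mers; that discreteness is why the abstract frames density minimization as an optimization over the permutation space of k-mer orderings, and why a differentiable surrogate is needed before gradient-based training can apply.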