Differentiable Learning of Sequence-Specific Minimizer Schemes with DeepMinimizer

Minimizers are widely used to sample representative k-mers from biological sequences in many applications, such as read mapping and taxonomy prediction. In most scenarios, having the minimizer scheme select as few k-mer positions as possible (i.e., having a low density) is desirable to reduce comput...

Bibliographic Details
Main Authors: Hoang, Minh; Zheng, Hongyu; Kingsford, Carl
Format: Online Article Text
Language: English
Published: Mary Ann Liebert, Inc., publishers 2022
Subjects: Research Articles
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9807081/
https://www.ncbi.nlm.nih.gov/pubmed/36095142
http://dx.doi.org/10.1089/cmb.2022.0275
_version_ 1784862640361701376
author Hoang, Minh
Zheng, Hongyu
Kingsford, Carl
author_facet Hoang, Minh
Zheng, Hongyu
Kingsford, Carl
author_sort Hoang, Minh
collection PubMed
description Minimizers are widely used to sample representative k-mers from biological sequences in many applications, such as read mapping and taxonomy prediction. In most scenarios, having the minimizer scheme select as few k-mer positions as possible (i.e., having a low density) is desirable to reduce computation and memory cost. Despite the growing interest in minimizers, learning an effective scheme with optimal density is still an open question, as it requires solving an apparently challenging discrete optimization problem on the permutation space of k-mer orderings. Most existing schemes are designed to work well in expectation over random sequences, which have limited applicability to many practical tools. On the other hand, several methods have been proposed to construct minimizer schemes for a specific target sequence. These methods, however, only approximate the original objective with likewise discrete surrogate tasks that are not able to significantly improve the density performance. This article introduces the first continuous relaxation of the density minimizing objective, DeepMinimizer, which employs a novel Deep Learning twin architecture to simultaneously ensure both validity and performance of the minimizer scheme. Our surrogate objective is fully differentiable and, therefore, amenable to efficient gradient-based optimization using GPU computing. Finally, we demonstrate that DeepMinimizer discovers minimizer schemes that significantly outperform state-of-the-art constructions on human genomic sequences.
format Online
Article
Text
id pubmed-9807081
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Mary Ann Liebert, Inc., publishers
record_format MEDLINE/PubMed
spelling pubmed-9807081 2023-01-10 Differentiable Learning of Sequence-Specific Minimizer Schemes with DeepMinimizer Hoang, Minh Zheng, Hongyu Kingsford, Carl J Comput Biol Research Articles Mary Ann Liebert, Inc., publishers 2022-12-01 2022-12-13 /pmc/articles/PMC9807081/ /pubmed/36095142 http://dx.doi.org/10.1089/cmb.2022.0275 Text en © Minh Hoang, et al., 2022. Published by Mary Ann Liebert, Inc.
This Open Access article is distributed under the terms of the Creative Commons License [CC-BY] (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Articles
Hoang, Minh
Zheng, Hongyu
Kingsford, Carl
Differentiable Learning of Sequence-Specific Minimizer Schemes with DeepMinimizer
title Differentiable Learning of Sequence-Specific Minimizer Schemes with DeepMinimizer
title_sort differentiable learning of sequence-specific minimizer schemes with deepminimizer
topic Research Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9807081/
https://www.ncbi.nlm.nih.gov/pubmed/36095142
http://dx.doi.org/10.1089/cmb.2022.0275
work_keys_str_mv AT hoangminh differentiablelearningofsequencespecificminimizerschemeswithdeepminimizer
AT zhenghongyu differentiablelearningofsequencespecificminimizerschemeswithdeepminimizer
AT kingsfordcarl differentiablelearningofsequencespecificminimizerschemeswithdeepminimizer