Cargando…

A safety framework for flow decomposition problems via integer linear programming

MOTIVATION: Many important problems in Bioinformatics (e.g. assembly or multiassembly) admit multiple solutions, while the final objective is to report only one. A common approach to deal with this uncertainty is finding “safe” partial solutions (e.g. contigs) which are common to all solutions. Prev...

Descripción completa

Detalles Bibliográficos
Autores principales:	Dias, Fernando H C, Cáceres, Manuel, Williams, Lucia, Mumey, Brendan, Tomescu, Alexandru I
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	Oxford University Press 2023
Materias:	Original Paper
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10628435/ https://www.ncbi.nlm.nih.gov/pubmed/37862229 http://dx.doi.org/10.1093/bioinformatics/btad640

_version_	1785131757855571968
author	Dias, Fernando H C Cáceres, Manuel Williams, Lucia Mumey, Brendan Tomescu, Alexandru I
author_facet	Dias, Fernando H C Cáceres, Manuel Williams, Lucia Mumey, Brendan Tomescu, Alexandru I
author_sort	Dias, Fernando H C
collection	PubMed
description	MOTIVATION: Many important problems in Bioinformatics (e.g. assembly or multiassembly) admit multiple solutions, while the final objective is to report only one. A common approach to deal with this uncertainty is finding “safe” partial solutions (e.g. contigs) which are common to all solutions. Previous research on safety has focused on polynomially time solvable problems, whereas many successful and natural models are NP-hard to solve, leaving a lack of “safety tools” for such problems. We propose the first method for computing all safe solutions for an NP-hard problem, “minimum flow decomposition” (MFD). We obtain our results by developing a “safety test” for paths based on a general integer linear programming (ILP) formulation. Moreover, we provide implementations with practical optimizations aimed to reduce the total ILP time, the most efficient of these being based on a recursive group-testing procedure. RESULTS: Experimental results on transcriptome datasets show that all safe paths for MFDs correctly recover up to 90% of the full RNA transcripts, which is at least 25% more than previously known safe paths. Moreover, despite the NP-hardness of the problem, we can report all safe paths for 99.8% of the over 27 000 non-trivial graphs of this dataset in only 1.5 h. Our results suggest that, on perfect data, there is less ambiguity than thought in the notoriously hard RNA assembly problem. AVAILABILITY AND IMPLEMENTATION: https://github.com/algbio/mfd-safety.
format	Online Article Text
id	pubmed-10628435
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-106284352023-11-08 A safety framework for flow decomposition problems via integer linear programming Dias, Fernando H C Cáceres, Manuel Williams, Lucia Mumey, Brendan Tomescu, Alexandru I Bioinformatics Original Paper MOTIVATION: Many important problems in Bioinformatics (e.g. assembly or multiassembly) admit multiple solutions, while the final objective is to report only one. A common approach to deal with this uncertainty is finding “safe” partial solutions (e.g. contigs) which are common to all solutions. Previous research on safety has focused on polynomially time solvable problems, whereas many successful and natural models are NP-hard to solve, leaving a lack of “safety tools” for such problems. We propose the first method for computing all safe solutions for an NP-hard problem, “minimum flow decomposition” (MFD). We obtain our results by developing a “safety test” for paths based on a general integer linear programming (ILP) formulation. Moreover, we provide implementations with practical optimizations aimed to reduce the total ILP time, the most efficient of these being based on a recursive group-testing procedure. RESULTS: Experimental results on transcriptome datasets show that all safe paths for MFDs correctly recover up to 90% of the full RNA transcripts, which is at least 25% more than previously known safe paths. Moreover, despite the NP-hardness of the problem, we can report all safe paths for 99.8% of the over 27 000 non-trivial graphs of this dataset in only 1.5 h. Our results suggest that, on perfect data, there is less ambiguity than thought in the notoriously hard RNA assembly problem. AVAILABILITY AND IMPLEMENTATION: https://github.com/algbio/mfd-safety. Oxford University Press 2023-10-20 /pmc/articles/PMC10628435/ /pubmed/37862229 http://dx.doi.org/10.1093/bioinformatics/btad640 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Original Paper Dias, Fernando H C Cáceres, Manuel Williams, Lucia Mumey, Brendan Tomescu, Alexandru I A safety framework for flow decomposition problems via integer linear programming
title	A safety framework for flow decomposition problems via integer linear programming
title_full	A safety framework for flow decomposition problems via integer linear programming
title_fullStr	A safety framework for flow decomposition problems via integer linear programming
title_full_unstemmed	A safety framework for flow decomposition problems via integer linear programming
title_short	A safety framework for flow decomposition problems via integer linear programming
title_sort	safety framework for flow decomposition problems via integer linear programming
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10628435/ https://www.ncbi.nlm.nih.gov/pubmed/37862229 http://dx.doi.org/10.1093/bioinformatics/btad640
work_keys_str_mv	AT diasfernandohc asafetyframeworkforflowdecompositionproblemsviaintegerlinearprogramming AT caceresmanuel asafetyframeworkforflowdecompositionproblemsviaintegerlinearprogramming AT williamslucia asafetyframeworkforflowdecompositionproblemsviaintegerlinearprogramming AT mumeybrendan asafetyframeworkforflowdecompositionproblemsviaintegerlinearprogramming AT tomescualexandrui asafetyframeworkforflowdecompositionproblemsviaintegerlinearprogramming AT diasfernandohc safetyframeworkforflowdecompositionproblemsviaintegerlinearprogramming AT caceresmanuel safetyframeworkforflowdecompositionproblemsviaintegerlinearprogramming AT williamslucia safetyframeworkforflowdecompositionproblemsviaintegerlinearprogramming AT mumeybrendan safetyframeworkforflowdecompositionproblemsviaintegerlinearprogramming AT tomescualexandrui safetyframeworkforflowdecompositionproblemsviaintegerlinearprogramming

A safety framework for flow decomposition problems via integer linear programming

Ejemplares similares