Cargando…

Improving RNA Assembly via Safety and Completeness in Flow Decompositions

Decomposing a network flow into weighted paths is a problem with numerous applications, ranging from networking, transportation planning, to bioinformatics. In some applications we look for a decomposition that is optimal with respect to some property, such as the number of paths used, robustness to...

Descripción completa

Detalles Bibliográficos
Autores principales: Khan, Shahbaz, Kortelainen, Milla, Cáceres, Manuel, Williams, Lucia, Tomescu, Alexandru I.
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Mary Ann Liebert, Inc., publishers 2022
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9807076/
https://www.ncbi.nlm.nih.gov/pubmed/36288562
http://dx.doi.org/10.1089/cmb.2022.0261
_version_ 1784862639127527424
author Khan, Shahbaz
Kortelainen, Milla
Cáceres, Manuel
Williams, Lucia
Tomescu, Alexandru I.
author_facet Khan, Shahbaz
Kortelainen, Milla
Cáceres, Manuel
Williams, Lucia
Tomescu, Alexandru I.
author_sort Khan, Shahbaz
collection PubMed
description Decomposing a network flow into weighted paths is a problem with numerous applications, ranging from networking, transportation planning, to bioinformatics. In some applications we look for a decomposition that is optimal with respect to some property, such as the number of paths used, robustness to edge deletion, or length of the longest path. However, in many bioinformatic applications, we seek a specific decomposition where the paths correspond to some underlying data that generated the flow. In these cases, no optimization criteria guarantee the identification of the correct decomposition. Therefore, we propose to instead report the safe paths, which are subpaths of at least one path in every flow decomposition. In this work, we give the first local characterization of safe paths for flow decompositions in directed acyclic graphs, leading to a practical algorithm for finding the complete set of safe paths. In addition, we evaluate our algorithm on RNA transcript data sets against a trivial safe algorithm (extended unitigs), the recently proposed safe paths for path covers (TCBB 2021) and the popular heuristic greedy-width. On the one hand, we found that besides maintaining perfect precision, our safe and complete algorithm reports a significantly higher coverage ([Formula: see text] more) compared with the other safe algorithms. On the other hand, the greedy-width algorithm although reporting a better coverage, it also reports a significantly lower precision on complex graphs (for genes expressing a large number of transcripts). Overall, our safe and complete algorithm outperforms (by [Formula: see text]) greedy-width on a unified metric (F-score) considering both coverage and precision when the evaluated data set has a significant number of complex graphs. Moreover, it also has a superior time ([Formula: see text]) and space performance ([Formula: see text]), resulting in a better and more practical approach for bioinformatic applications of flow decomposition.
format Online
Article
Text
id pubmed-9807076
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Mary Ann Liebert, Inc., publishers
record_format MEDLINE/PubMed
spelling pubmed-98070762023-01-10 Improving RNA Assembly via Safety and Completeness in Flow Decompositions Khan, Shahbaz Kortelainen, Milla Cáceres, Manuel Williams, Lucia Tomescu, Alexandru I. J Comput Biol Research Articles Decomposing a network flow into weighted paths is a problem with numerous applications, ranging from networking, transportation planning, to bioinformatics. In some applications we look for a decomposition that is optimal with respect to some property, such as the number of paths used, robustness to edge deletion, or length of the longest path. However, in many bioinformatic applications, we seek a specific decomposition where the paths correspond to some underlying data that generated the flow. In these cases, no optimization criteria guarantee the identification of the correct decomposition. Therefore, we propose to instead report the safe paths, which are subpaths of at least one path in every flow decomposition. In this work, we give the first local characterization of safe paths for flow decompositions in directed acyclic graphs, leading to a practical algorithm for finding the complete set of safe paths. In addition, we evaluate our algorithm on RNA transcript data sets against a trivial safe algorithm (extended unitigs), the recently proposed safe paths for path covers (TCBB 2021) and the popular heuristic greedy-width. On the one hand, we found that besides maintaining perfect precision, our safe and complete algorithm reports a significantly higher coverage ([Formula: see text] more) compared with the other safe algorithms. On the other hand, the greedy-width algorithm although reporting a better coverage, it also reports a significantly lower precision on complex graphs (for genes expressing a large number of transcripts). Overall, our safe and complete algorithm outperforms (by [Formula: see text]) greedy-width on a unified metric (F-score) considering both coverage and precision when the evaluated data set has a significant number of complex graphs. Moreover, it also has a superior time ([Formula: see text]) and space performance ([Formula: see text]), resulting in a better and more practical approach for bioinformatic applications of flow decomposition. Mary Ann Liebert, Inc., publishers 2022-12-01 2022-12-13 /pmc/articles/PMC9807076/ /pubmed/36288562 http://dx.doi.org/10.1089/cmb.2022.0261 Text en © Shahbaz Khan, et al., 2022. Published by Mary Ann Liebert, Inc. https://creativecommons.org/licenses/by/4.0/This Open Access article is distributed under the terms of the Creative Commons License [CC-BY] (http://creativecommons.org/licenses/by/4.0 (https://creativecommons.org/licenses/by/4.0/) ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
spellingShingle Research Articles
Khan, Shahbaz
Kortelainen, Milla
Cáceres, Manuel
Williams, Lucia
Tomescu, Alexandru I.
Improving RNA Assembly via Safety and Completeness in Flow Decompositions
title Improving RNA Assembly via Safety and Completeness in Flow Decompositions
title_full Improving RNA Assembly via Safety and Completeness in Flow Decompositions
title_fullStr Improving RNA Assembly via Safety and Completeness in Flow Decompositions
title_full_unstemmed Improving RNA Assembly via Safety and Completeness in Flow Decompositions
title_short Improving RNA Assembly via Safety and Completeness in Flow Decompositions
title_sort improving rna assembly via safety and completeness in flow decompositions
topic Research Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9807076/
https://www.ncbi.nlm.nih.gov/pubmed/36288562
http://dx.doi.org/10.1089/cmb.2022.0261
work_keys_str_mv AT khanshahbaz improvingrnaassemblyviasafetyandcompletenessinflowdecompositions
AT kortelainenmilla improvingrnaassemblyviasafetyandcompletenessinflowdecompositions
AT caceresmanuel improvingrnaassemblyviasafetyandcompletenessinflowdecompositions
AT williamslucia improvingrnaassemblyviasafetyandcompletenessinflowdecompositions
AT tomescualexandrui improvingrnaassemblyviasafetyandcompletenessinflowdecompositions