Cargando…

A Novel Efficient Graph Model for the Multiple Longest Common Subsequences (MLCS) Problem

Searching for the Multiple Longest Common Subsequences (MLCS) of multiple sequences is a classical NP-hard problem, which has been used in many applications. One of the most effective exact approaches for the MLCS problem is based on dominant point graph, which is a kind of directed acyclic graph (D...

Descripción completa

Detalles Bibliográficos
Autores principales: Peng, Zhan, Wang, Yuping
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Frontiers Media S.A. 2017
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5552671/
https://www.ncbi.nlm.nih.gov/pubmed/28848600
http://dx.doi.org/10.3389/fgene.2017.00104
_version_ 1783256490648272896
author Peng, Zhan
Wang, Yuping
author_facet Peng, Zhan
Wang, Yuping
author_sort Peng, Zhan
collection PubMed
description Searching for the Multiple Longest Common Subsequences (MLCS) of multiple sequences is a classical NP-hard problem, which has been used in many applications. One of the most effective exact approaches for the MLCS problem is based on dominant point graph, which is a kind of directed acyclic graph (DAG). However, the time and space efficiency of the leading dominant point graph based approaches is still unsatisfactory: constructing the dominated point graph used by these approaches requires a huge amount of time and space, which hinders the applications of these approaches to large-scale and long sequences. To address this issue, in this paper, we propose a new time and space efficient graph model called the Leveled-DAG for the MLCS problem. The Leveled-DAG can timely eliminate all the nodes in the graph that cannot contribute to the construction of MLCS during constructing. At any moment, only the current level and some previously generated nodes in the graph need to be kept in memory, which can greatly reduce the memory consumption. Also, the final graph contains only one node in which all of the wanted MLCS are saved, thus, no additional operations for searching the MLCS are needed. The experiments are conducted on real biological sequences with different numbers and lengths respectively, and the proposed algorithm is compared with three state-of-the-art algorithms. The experimental results show that the time and space needed for the Leveled-DAG approach are smaller than those for the compared algorithms especially on large-scale and long sequences.
format Online
Article
Text
id pubmed-5552671
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-55526712017-08-28 A Novel Efficient Graph Model for the Multiple Longest Common Subsequences (MLCS) Problem Peng, Zhan Wang, Yuping Front Genet Genetics Searching for the Multiple Longest Common Subsequences (MLCS) of multiple sequences is a classical NP-hard problem, which has been used in many applications. One of the most effective exact approaches for the MLCS problem is based on dominant point graph, which is a kind of directed acyclic graph (DAG). However, the time and space efficiency of the leading dominant point graph based approaches is still unsatisfactory: constructing the dominated point graph used by these approaches requires a huge amount of time and space, which hinders the applications of these approaches to large-scale and long sequences. To address this issue, in this paper, we propose a new time and space efficient graph model called the Leveled-DAG for the MLCS problem. The Leveled-DAG can timely eliminate all the nodes in the graph that cannot contribute to the construction of MLCS during constructing. At any moment, only the current level and some previously generated nodes in the graph need to be kept in memory, which can greatly reduce the memory consumption. Also, the final graph contains only one node in which all of the wanted MLCS are saved, thus, no additional operations for searching the MLCS are needed. The experiments are conducted on real biological sequences with different numbers and lengths respectively, and the proposed algorithm is compared with three state-of-the-art algorithms. The experimental results show that the time and space needed for the Leveled-DAG approach are smaller than those for the compared algorithms especially on large-scale and long sequences. Frontiers Media S.A. 2017-08-09 /pmc/articles/PMC5552671/ /pubmed/28848600 http://dx.doi.org/10.3389/fgene.2017.00104 Text en Copyright © 2017 Peng and Wang. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Genetics
Peng, Zhan
Wang, Yuping
A Novel Efficient Graph Model for the Multiple Longest Common Subsequences (MLCS) Problem
title A Novel Efficient Graph Model for the Multiple Longest Common Subsequences (MLCS) Problem
title_full A Novel Efficient Graph Model for the Multiple Longest Common Subsequences (MLCS) Problem
title_fullStr A Novel Efficient Graph Model for the Multiple Longest Common Subsequences (MLCS) Problem
title_full_unstemmed A Novel Efficient Graph Model for the Multiple Longest Common Subsequences (MLCS) Problem
title_short A Novel Efficient Graph Model for the Multiple Longest Common Subsequences (MLCS) Problem
title_sort novel efficient graph model for the multiple longest common subsequences (mlcs) problem
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5552671/
https://www.ncbi.nlm.nih.gov/pubmed/28848600
http://dx.doi.org/10.3389/fgene.2017.00104
work_keys_str_mv AT pengzhan anovelefficientgraphmodelforthemultiplelongestcommonsubsequencesmlcsproblem
AT wangyuping anovelefficientgraphmodelforthemultiplelongestcommonsubsequencesmlcsproblem
AT pengzhan novelefficientgraphmodelforthemultiplelongestcommonsubsequencesmlcsproblem
AT wangyuping novelefficientgraphmodelforthemultiplelongestcommonsubsequencesmlcsproblem