Cargando…
Chromosome structures: reduction of certain problems with unequal gene content and gene paralogs to integer linear programming
BACKGROUND: Chromosome structure is a very limited model of the genome including the information about its chromosomes such as their linear or circular organization, the order of genes on them, and the DNA strand encoding a gene. Gene lengths, nucleotide composition, and intergenic regions are ignor...
Autores principales: | , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2017
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5719933/ https://www.ncbi.nlm.nih.gov/pubmed/29212445 http://dx.doi.org/10.1186/s12859-017-1944-x |
_version_ | 1783284582684033024 |
---|---|
author | Lyubetsky, Vassily Gershgorin, Roman Gorbunov, Konstantin |
author_facet | Lyubetsky, Vassily Gershgorin, Roman Gorbunov, Konstantin |
author_sort | Lyubetsky, Vassily |
collection | PubMed |
description | BACKGROUND: Chromosome structure is a very limited model of the genome including the information about its chromosomes such as their linear or circular organization, the order of genes on them, and the DNA strand encoding a gene. Gene lengths, nucleotide composition, and intergenic regions are ignored. Although highly incomplete, such structure can be used in many cases, e.g., to reconstruct phylogeny and evolutionary events, to identify gene synteny, regulatory elements and promoters (considering highly conserved elements), etc. Three problems are considered; all assume unequal gene content and the presence of gene paralogs. The distance problem is to determine the minimum number of operations required to transform one chromosome structure into another and the corresponding transformation itself including the identification of paralogs in two structures. We use the DCJ model which is one of the most studied combinatorial rearrangement models. Double-, sesqui-, and single-operations as well as deletion and insertion of a chromosome region are considered in the model; the single ones comprise cut and join. In the reconstruction problem, a phylogenetic tree with chromosome structures in the leaves is given. It is necessary to assign the structures to inner nodes of the tree to minimize the sum of distances between terminal structures of each edge and to identify the mutual paralogs in a fairly large set of structures. A linear algorithm is known for the distance problem without paralogs, while the presence of paralogs makes it NP-hard. If paralogs are allowed but the insertion and deletion operations are missing (and special constraints are imposed), the reduction of the distance problem to integer linear programming is known. Apparently, the reconstruction problem is NP-hard even in the absence of paralogs. The problem of contigs is to find the optimal arrangements for each given set of contigs, which also includes the mutual identification of paralogs. RESULTS: We proved that these problems can be reduced to integer linear programming formulations, which allows an algorithm to redefine the problems to implement a very special case of the integer linear programming tool. The results were tested on synthetic and biological samples. CONCLUSIONS: Three well-known problems were reduced to a very special case of integer linear programming, which is a new method of their solutions. Integer linear programming is clearly among the main computational methods and, as generally accepted, is fast on average; in particular, computation systems specifically targeted at it are available. The challenges are to reduce the size of the corresponding integer linear programming formulations and to incorporate a more detailed biological concept in our model of the reconstruction. |
format | Online Article Text |
id | pubmed-5719933 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-57199332017-12-11 Chromosome structures: reduction of certain problems with unequal gene content and gene paralogs to integer linear programming Lyubetsky, Vassily Gershgorin, Roman Gorbunov, Konstantin BMC Bioinformatics Research Article BACKGROUND: Chromosome structure is a very limited model of the genome including the information about its chromosomes such as their linear or circular organization, the order of genes on them, and the DNA strand encoding a gene. Gene lengths, nucleotide composition, and intergenic regions are ignored. Although highly incomplete, such structure can be used in many cases, e.g., to reconstruct phylogeny and evolutionary events, to identify gene synteny, regulatory elements and promoters (considering highly conserved elements), etc. Three problems are considered; all assume unequal gene content and the presence of gene paralogs. The distance problem is to determine the minimum number of operations required to transform one chromosome structure into another and the corresponding transformation itself including the identification of paralogs in two structures. We use the DCJ model which is one of the most studied combinatorial rearrangement models. Double-, sesqui-, and single-operations as well as deletion and insertion of a chromosome region are considered in the model; the single ones comprise cut and join. In the reconstruction problem, a phylogenetic tree with chromosome structures in the leaves is given. It is necessary to assign the structures to inner nodes of the tree to minimize the sum of distances between terminal structures of each edge and to identify the mutual paralogs in a fairly large set of structures. A linear algorithm is known for the distance problem without paralogs, while the presence of paralogs makes it NP-hard. If paralogs are allowed but the insertion and deletion operations are missing (and special constraints are imposed), the reduction of the distance problem to integer linear programming is known. Apparently, the reconstruction problem is NP-hard even in the absence of paralogs. The problem of contigs is to find the optimal arrangements for each given set of contigs, which also includes the mutual identification of paralogs. RESULTS: We proved that these problems can be reduced to integer linear programming formulations, which allows an algorithm to redefine the problems to implement a very special case of the integer linear programming tool. The results were tested on synthetic and biological samples. CONCLUSIONS: Three well-known problems were reduced to a very special case of integer linear programming, which is a new method of their solutions. Integer linear programming is clearly among the main computational methods and, as generally accepted, is fast on average; in particular, computation systems specifically targeted at it are available. The challenges are to reduce the size of the corresponding integer linear programming formulations and to incorporate a more detailed biological concept in our model of the reconstruction. BioMed Central 2017-12-06 /pmc/articles/PMC5719933/ /pubmed/29212445 http://dx.doi.org/10.1186/s12859-017-1944-x Text en © The Author(s). 2017 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Lyubetsky, Vassily Gershgorin, Roman Gorbunov, Konstantin Chromosome structures: reduction of certain problems with unequal gene content and gene paralogs to integer linear programming |
title | Chromosome structures: reduction of certain problems with unequal gene content and gene paralogs to integer linear programming |
title_full | Chromosome structures: reduction of certain problems with unequal gene content and gene paralogs to integer linear programming |
title_fullStr | Chromosome structures: reduction of certain problems with unequal gene content and gene paralogs to integer linear programming |
title_full_unstemmed | Chromosome structures: reduction of certain problems with unequal gene content and gene paralogs to integer linear programming |
title_short | Chromosome structures: reduction of certain problems with unequal gene content and gene paralogs to integer linear programming |
title_sort | chromosome structures: reduction of certain problems with unequal gene content and gene paralogs to integer linear programming |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5719933/ https://www.ncbi.nlm.nih.gov/pubmed/29212445 http://dx.doi.org/10.1186/s12859-017-1944-x |
work_keys_str_mv | AT lyubetskyvassily chromosomestructuresreductionofcertainproblemswithunequalgenecontentandgeneparalogstointegerlinearprogramming AT gershgorinroman chromosomestructuresreductionofcertainproblemswithunequalgenecontentandgeneparalogstointegerlinearprogramming AT gorbunovkonstantin chromosomestructuresreductionofcertainproblemswithunequalgenecontentandgeneparalogstointegerlinearprogramming |