Screening synteny blocks in pairwise genome comparisons through integer programming

BACKGROUND: It is difficult to accurately interpret chromosomal correspondences such as true orthology and paralogy due to significant divergence of genomes from a common ancestor. Analyses are particularly problematic among lineages that have repeatedly experienced whole genome duplication (WGD) ev...

Bibliographic Details
Main Authors:	Tang, Haibao, Lyons, Eric, Pedersen, Brent, Schnable, James C, Paterson, Andrew H, Freeling, Michael
Format:	Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3088904/ https://www.ncbi.nlm.nih.gov/pubmed/21501495 http://dx.doi.org/10.1186/1471-2105-12-102

author	Tang, Haibao Lyons, Eric Pedersen, Brent Schnable, James C Paterson, Andrew H Freeling, Michael
author_sort	Tang, Haibao
collection	PubMed
description	BACKGROUND: It is difficult to accurately interpret chromosomal correspondences such as true orthology and paralogy due to significant divergence of genomes from a common ancestor. Analyses are particularly problematic among lineages that have repeatedly experienced whole genome duplication (WGD) events. To compare multiple "subgenomes" derived from genome duplications, we need to relax the traditional requirement of "one-to-one" syntenic matchings of genomic regions in order to reflect "one-to-many" or, more generally, "many-to-many" matchings. However, this relaxation may result in the identification of synteny blocks that are derived from ancient shared WGDs and are not of interest. For many downstream analyses, we need to eliminate weak, low-scoring alignments from pairwise genome comparisons. Our goal is to objectively select a subset of synteny blocks whose total score is maximized while respecting the duplication history of the genomes in comparison. We call this "quota-based" screening of synteny blocks, because it fills a quota of syntenic relationships within one genome or between two genomes having WGD events. RESULTS: We have formulated synteny block screening as an optimization problem known as "Binary Integer Programming" (BIP), which is solved using existing linear programming solvers. The computer program QUOTA-ALIGN performs this task by defining a clear objective function that maximizes the total score of a compatible set of synteny blocks under given constraints on overlaps and depths (corresponding to the duplication history in the respective genomes). Such a procedure is useful for any pairwise synteny alignment, but is most useful in lineages affected by multiple WGDs, such as plant or fish lineages. For example, there should be a 1:2 ploidy relationship between genomes A and B if genome B had an independent WGD subsequent to the divergence of the two genomes. We show through simulations and real examples using plant genomes in the rosid superorder that quota-based screening can eliminate ambiguous synteny blocks and focus on specific genomic evolutionary events, such as the divergence of lineages (in cross-species comparisons) and the most recent WGD (in self comparisons). CONCLUSIONS: The QUOTA-ALIGN algorithm screens a set of synteny blocks to retain only those compatible with a user-specified ploidy relationship between two genomes. These blocks, in turn, may be used for additional downstream analyses such as identifying true orthologous regions in interspecific comparisons. There are two major contributions of QUOTA-ALIGN: 1) reducing the block screening task to a BIP problem, which is novel; 2) providing an efficient software pipeline starting from all-against-all BLAST and ending with the screened synteny blocks and dot plot visualizations. Python code and full documentation are publicly available at http://github.com/tanghaibao/quota-alignment. The QUOTA-ALIGN program is also integrated as a major component in SynMap (http://genomevolution.com/CoGe/SynMap.pl), offering easier access to thousands of genomes for non-programmers.
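The quota-based screening described in the abstract can be illustrated with a toy example. The sketch below is not the QUOTA-ALIGN implementation: where the abstract says the BIP is handed to a linear programming solver, this sketch simply enumerates every 0/1 assignment, which is only feasible for a handful of blocks. The block coordinates, scores, and the names `max_depth`, `screen_blocks`, `quota_a`, and `quota_b` are all hypothetical. It also assumes that a quota means the maximum number of retained blocks allowed to cover any single position on that genome, so a 1:2 relationship (genome B carrying the extra WGD) is expressed as `quota_a=2`, `quota_b=1`.

```python
from itertools import product


def max_depth(intervals):
    """Return the largest number of closed intervals covering one position."""
    events = []
    for start, end in intervals:
        events.append((start, 0))  # 0 = interval opens
        events.append((end, 1))    # 1 = interval closes
    depth = deepest = 0
    # Opens sort before closes at the same coordinate, so intervals that
    # merely touch are counted as overlapping.
    for _, kind in sorted(events):
        depth += 1 if kind == 0 else -1
        deepest = max(deepest, depth)
    return deepest


def screen_blocks(blocks, quota_a, quota_b):
    """Pick the highest-scoring subset of blocks within both depth quotas.

    Each x in `pick` plays the role of a binary decision variable:
    1 keeps the block, 0 discards it. Exhaustive enumeration stands in
    for a real integer programming solver (an assumption of this sketch).
    """
    best_score, best_names = 0, []
    for pick in product((0, 1), repeat=len(blocks)):
        chosen = [b for b, x in zip(blocks, pick) if x]
        # Depth constraint along genome A.
        if max_depth([b["a"] for b in chosen]) > quota_a:
            continue
        # Depth constraint along genome B.
        if max_depth([b["b"] for b in chosen]) > quota_b:
            continue
        score = sum(b["score"] for b in chosen)
        if score > best_score:
            best_score, best_names = score, [b["name"] for b in chosen]
    return best_names, best_score


# Hypothetical blocks: span on genome A, span on genome B, alignment score.
BLOCKS = [
    {"name": "b1", "a": (0, 100), "b": (0, 100), "score": 90},
    {"name": "b2", "a": (0, 100), "b": (200, 300), "score": 70},
    {"name": "b3", "a": (0, 100), "b": (400, 500), "score": 20},
    {"name": "b4", "a": (150, 250), "b": (50, 120), "score": 30},
]

if __name__ == "__main__":
    print(screen_blocks(BLOCKS, quota_a=2, quota_b=1))
    print(screen_blocks(BLOCKS, quota_a=1, quota_b=1))
```

With these made-up blocks, the 2-and-1 quota keeps b1 and b2 and drops the weak third copy b3, while a 1-and-1 quota prefers b2 plus b4 over b1 alone because their combined score is higher. For realistic numbers of blocks the enumeration would have to be replaced by a proper integer programming solver.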
format	Text
id	pubmed-3088904
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3088904 2011-05-07 Screening synteny blocks in pairwise genome comparisons through integer programming Tang, Haibao Lyons, Eric Pedersen, Brent Schnable, James C Paterson, Andrew H Freeling, Michael BMC Bioinformatics Methodology Article BioMed Central 2011-04-18 /pmc/articles/PMC3088904/ /pubmed/21501495 http://dx.doi.org/10.1186/1471-2105-12-102 Text en Copyright ©2011 Tang et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Screening synteny blocks in pairwise genome comparisons through integer programming
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3088904/ https://www.ncbi.nlm.nih.gov/pubmed/21501495 http://dx.doi.org/10.1186/1471-2105-12-102
