Fractionation statistics
BACKGROUND: Paralog reduction, the loss of duplicate genes after whole genome duplication (WGD), is a pervasive process. Whether this loss proceeds gene by gene or through deletion of multi-gene DNA segments is controversial, as is the question of fractionation bias, namely whether one homeologous ch...
Main authors: | Wang, Baoyong; Zheng, Chunfang; Sankoff, David |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2011 |
Subjects: | Proceedings |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3283312/ https://www.ncbi.nlm.nih.gov/pubmed/22152148 http://dx.doi.org/10.1186/1471-2105-12-S9-S5 |
_version_ | 1782224182741827584 |
---|---|
author | Wang, Baoyong Zheng, Chunfang Sankoff, David |
author_facet | Wang, Baoyong Zheng, Chunfang Sankoff, David |
author_sort | Wang, Baoyong |
collection | PubMed |
description | BACKGROUND: Paralog reduction, the loss of duplicate genes after whole genome duplication (WGD), is a pervasive process. Whether this loss proceeds gene by gene or through deletion of multi-gene DNA segments is controversial, as is the question of fractionation bias, namely whether one homeologous chromosome is more vulnerable to gene deletion than the other. RESULTS: As a null hypothesis, we first assume deletion events, on one homeolog only, excise a geometrically distributed number of genes with unknown mean µ, and these events combine to produce deleted runs of length l, distributed approximately as a negative binomial with unknown parameter r, itself a random variable with distribution π(·). A more realistic model requires deletion events on both homeologs distributed as a truncated geometric. We simulate the distribution of run lengths l in both models, as well as the underlying π(r), as a function of µ, and show how sampling l allows us to estimate µ. We apply this to data on a total of 15 genomes descended from 6 distinct WGD events and show how to correct the bias towards shorter runs caused by genome rearrangements. Because of the difficulty in deriving π(·) analytically, we develop a deterministic recurrence to calculate each π(r) as a function of µ and the proportion of unreduced paralog pairs. CONCLUSIONS: The parameter µ can be estimated based on run lengths of single-copy regions. Estimates of µ in real data do not exclude the possibility that duplicate gene deletion is largely gene by gene, although it may sometimes involve longer segments. |
format | Online Article Text |
id | pubmed-3283312 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3283312 2012-02-22 Fractionation statistics Wang, Baoyong Zheng, Chunfang Sankoff, David BMC Bioinformatics Proceedings BioMed Central 2011-10-05 /pmc/articles/PMC3283312/ /pubmed/22152148 http://dx.doi.org/10.1186/1471-2105-12-S9-S5 Text en Copyright ©2011 Wang et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Proceedings Wang, Baoyong Zheng, Chunfang Sankoff, David Fractionation statistics |
title | Fractionation statistics |
title_sort | fractionation statistics |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3283312/ https://www.ncbi.nlm.nih.gov/pubmed/22152148 http://dx.doi.org/10.1186/1471-2105-12-S9-S5 |
work_keys_str_mv | AT wangbaoyong fractionationstatistics AT zhengchunfang fractionationstatistics AT sankoffdavid fractionationstatistics |
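
The null model summarized in the description (deletion events on one homeolog only, each excising a geometrically distributed number of genes with mean µ, which combine into deleted runs of length l) can be illustrated with a small simulation. The following is a minimal sketch and not the authors' code. The function name `simulate_deleted_runs`, its parameters, the uniform choice of start position, and the rule that overlapping deletions simply merge are all assumptions made for illustration; the article's own simulation may treat these details differently.

```python
import random
from itertools import groupby


def simulate_deleted_runs(n_genes, mu, target_retained, rng):
    """Simulate deletions on one homeolog and return the deleted run lengths.

    Hypothetical illustration only (not the published procedure):
      - n_genes: number of duplicate gene pairs along the chromosome
      - mu: mean number of genes removed per deletion event (mu >= 1)
      - target_retained: stop once the proportion of unreduced pairs
        has dropped to this value or below
      - rng: a random.Random instance, so results are reproducible
    """
    if mu < 1:
        raise ValueError("mu must be at least 1")
    if not 0.0 <= target_retained <= 1.0:
        raise ValueError("target_retained must lie in [0, 1]")

    p = 1.0 / mu  # geometric on {1, 2, ...} with mean mu
    deleted = [False] * n_genes
    n_deleted = 0
    max_deleted = n_genes - int(target_retained * n_genes)

    while n_deleted < max_deleted:
        start = rng.randrange(n_genes)  # assumed: uniform start position
        length = 1
        while rng.random() >= p:  # draw the geometric event length
            length += 1
        # Assumed: an event overlapping earlier deletions just merges with them.
        for i in range(start, min(start + length, n_genes)):
            if not deleted[i]:
                deleted[i] = True
                n_deleted += 1

    # Lengths of maximal runs of consecutive deleted genes.
    return [sum(1 for _ in group) for is_del, group in groupby(deleted) if is_del]


if __name__ == "__main__":
    for mu in (1.0, 2.0, 5.0):
        runs = simulate_deleted_runs(50_000, mu, 0.7, random.Random(42))
        print(f"mu={mu}: {len(runs)} runs, mean length {sum(runs) / len(runs):.2f}")
```

Under these assumptions, the observed mean run length grows with µ, which is the intuition behind using sampled run lengths to estimate µ. Note that the runs are longer than individual events because neighbouring events merge, so the sample mean of l is not itself an estimate of µ.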