
Fractionation statistics

BACKGROUND: Paralog reduction, the loss of duplicate genes after whole genome duplication (WGD), is a pervasive process. Whether this loss proceeds gene by gene or through deletion of multi-gene DNA segments is controversial, as is the question of fractionation bias, namely whether one homeologous ch...


Bibliographic Details
Main authors: Wang, Baoyong; Zheng, Chunfang; Sankoff, David
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3283312/
https://www.ncbi.nlm.nih.gov/pubmed/22152148
http://dx.doi.org/10.1186/1471-2105-12-S9-S5
author Wang, Baoyong
Zheng, Chunfang
Sankoff, David
collection PubMed
description BACKGROUND: Paralog reduction, the loss of duplicate genes after whole genome duplication (WGD), is a pervasive process. Whether this loss proceeds gene by gene or through deletion of multi-gene DNA segments is controversial, as is the question of fractionation bias, namely whether one homeologous chromosome is more vulnerable to gene deletion than the other. RESULTS: As a null hypothesis, we first assume deletion events, on one homeolog only, excise a geometrically distributed number of genes with unknown mean µ, and these events combine to produce deleted runs of length l, distributed approximately as a negative binomial with unknown parameter r, itself a random variable with distribution π(·). A more realistic model requires deletion events on both homeologs distributed as a truncated geometric. We simulate the distribution of run lengths l in both models, as well as the underlying π(r), as a function of µ, and show how sampling l allows us to estimate µ. We apply this to data on a total of 15 genomes descended from 6 distinct WGD events and show how to correct the bias towards shorter runs caused by genome rearrangements. Because of the difficulty in deriving π(·) analytically, we develop a deterministic recurrence to calculate each π(r) as a function of µ and the proportion of unreduced paralog pairs. CONCLUSIONS: The parameter µ can be estimated based on run lengths of single-copy regions. Estimates of µ in real data do not exclude the possibility that duplicate gene deletion is largely gene by gene, although it may sometimes involve longer segments.
format Online
Article
Text
id pubmed-3283312
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3283312 2012-02-22 Fractionation statistics Wang, Baoyong Zheng, Chunfang Sankoff, David BMC Bioinformatics Proceedings BioMed Central 2011-10-05 /pmc/articles/PMC3283312/ /pubmed/22152148 http://dx.doi.org/10.1186/1471-2105-12-S9-S5 Text en Copyright ©2011 Wang et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Fractionation statistics
topic Proceedings
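The null model summarized under RESULTS in the description (deletion events on one homeolog only, each excising a geometrically distributed number of genes with mean µ, merging into deleted runs of length l) can be illustrated with a small simulation. This is a minimal sketch and not the authors' code. The function names, the uniform choice of start position, the truncation at the chromosome end, and the rule that an event overlapping already-deleted genes simply merges with them are all assumptions made for illustration; the record does not specify these details.

```python
import random


def geometric(mean, rng):
    """Sample k >= 1 with P(k) = p * (1 - p)**(k - 1), where p = 1 / mean."""
    if mean < 1:
        raise ValueError("mean of a geometric on {1, 2, ...} must be >= 1")
    p = 1.0 / mean
    k = 1
    while rng.random() >= p:
        k += 1
    return k


def simulate_deleted_runs(n_genes, mu, retained_fraction, seed=0):
    """Return the lengths of deleted runs on a single homeolog.

    Assumptions (hypothetical, for illustration only):
    - event start positions are uniform over the chromosome;
    - an event that reaches the chromosome end is cut off there;
    - an event that overlaps already-deleted genes merges with them
      instead of skipping ahead to undeleted genes.
    """
    rng = random.Random(seed)
    deleted = [False] * n_genes
    n_deleted = 0
    # Stop once at most `retained_fraction` of the genes remain.
    target = n_genes - int(retained_fraction * n_genes)

    while n_deleted < target:
        start = rng.randrange(n_genes)
        length = geometric(mu, rng)
        for pos in range(start, min(start + length, n_genes)):
            if not deleted[pos]:
                deleted[pos] = True
                n_deleted += 1

    # Collect maximal runs of consecutive deleted genes.
    runs = []
    current = 0
    for flag in deleted:
        if flag:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


if __name__ == "__main__":
    for mu in (1.0, 2.0, 4.0):
        runs = simulate_deleted_runs(10_000, mu, retained_fraction=0.5, seed=42)
        mean_run = sum(runs) / len(runs)
        print(f"mu={mu}: {len(runs)} runs, mean run length {mean_run:.2f}")
```

Setting `mu=1.0` corresponds to strictly gene-by-gene deletion. Even then, runs longer than one gene appear because neighbouring deletions merge, which is why the observed run length l is not a direct readout of µ and an estimation step of the kind the description mentions is needed.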