Size distribution of function-based human gene sets and the split–merge model

The sizes of paralogues—gene families produced by ancestral duplication—are known to follow a power-law distribution. We examine the size distribution of gene sets or gene families where genes are grouped by a similar function or share a common property. The size distribution of Human Gene Nomenclat...

Bibliographic Details
Main authors: Li, Wentian; Fontanelli, Oscar; Miramontes, Pedro
Format: Online Article Text
Language: English
Published: The Royal Society 2016
Subjects: Biology (Whole Organism)
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5108952/
https://www.ncbi.nlm.nih.gov/pubmed/27853602
http://dx.doi.org/10.1098/rsos.160275
author Li, Wentian
Fontanelli, Oscar
Miramontes, Pedro
collection PubMed
description The sizes of paralogues (gene families produced by ancestral duplication) are known to follow a power-law distribution. We examine the size distribution of gene sets, or gene families, in which genes are grouped by similar function or a shared property. The size distribution of Human Gene Nomenclature Committee (HGNC) gene sets deviates from the power law and is fitted much better by a beta rank function. We propose a simple mechanism that breaks a power-law size distribution through a combination of splitting and merging operations. The largest gene sets are split into two to account for subfunctional categories, and a small proportion of the other gene sets are merged into larger sets as new common themes are recognized. Such operations are not uncommon for a curator of gene sets. A simulation shows that iterating these operations changes the size distribution of Ensembl paralogues and could lead to a distribution fitted by a beta rank function. We further illustrate the application of the beta rank function with the example of the distribution of transcription factors and drug target genes among HGNC gene families.
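The split and merge operations described in the abstract can be sketched in a few lines of Python. This is only an illustrative toy, not the authors' simulation: the function names (split_merge_step, simulate), the uniform random cut point, the pairwise merging rule, the merge_fraction value and the Zipf-like starting sizes are all assumptions made here, and the starting sizes stand in for real Ensembl paralogue data.

```python
import random


def split_merge_step(sizes, merge_fraction, rng):
    """One iteration: split the largest set in two, then merge a small
    fraction of the remaining sets in random pairs (hypothetical rules)."""
    sizes = sorted(sizes, reverse=True)
    largest = sizes.pop(0)

    # Split: cut the largest set at a uniformly random point (assumption).
    if largest >= 2:
        cut = rng.randint(1, largest - 1)
        parts = [cut, largest - cut]
    else:
        parts = [largest]

    # Merge: pick an even number of the other sets and join them pairwise.
    n_merge = int(merge_fraction * len(sizes))
    n_merge -= n_merge % 2
    idx = rng.sample(range(len(sizes)), n_merge)
    chosen = set(idx)
    merged = [sizes[idx[i]] + sizes[idx[i + 1]] for i in range(0, n_merge, 2)]
    kept = [s for i, s in enumerate(sizes) if i not in chosen]

    # Return the new sizes in rank order (largest first).
    return sorted(kept + merged + parts, reverse=True)


def simulate(sizes, n_iter, merge_fraction, seed=0):
    """Apply split_merge_step n_iter times with a seeded generator."""
    rng = random.Random(seed)
    for _ in range(n_iter):
        sizes = split_merge_step(sizes, merge_fraction, rng)
    return sizes


# Toy power-law-like (Zipf) starting sizes; NOT real paralogue data.
initial_sizes = [max(1, round(1000 / r)) for r in range(1, 501)]
final_sizes = simulate(initial_sizes, n_iter=50, merge_fraction=0.02, seed=1)
```

Each step conserves the total number of genes, so only the way genes are partitioned into sets changes. Plotting final_sizes against rank on log-log axes next to initial_sizes is one way to see whether the top of the curve bends away from the starting straight line.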
format Online
Article
Text
id pubmed-5108952
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher The Royal Society
record_format MEDLINE/PubMed
spelling pubmed-51089522016-11-16 Size distribution of function-based human gene sets and the split–merge model Li, Wentian Fontanelli, Oscar Miramontes, Pedro R Soc Open Sci Biology (Whole Organism) The sizes of paralogues—gene families produced by ancestral duplication—are known to follow a power-law distribution. We examine the size distribution of gene sets or gene families where genes are grouped by a similar function or share a common property. The size distribution of Human Gene Nomenclature Committee (HGNC) gene sets deviate from the power-law, and can be fitted much better by a beta rank function. We propose a simple mechanism to break a power-law size distribution by a combination of splitting and merging operations. The largest gene sets are split into two to account for the subfunctional categories, and a small proportion of other gene sets are merged into larger sets as new common themes might be realized. These operations are not uncommon for a curator of gene sets. A simulation shows that iteration of these operations changes the size distribution of Ensembl paralogues and could lead to a distribution fitted by a rank beta function. We further illustrate application of beta rank function by the example of distribution of transcription factors and drug target genes among HGNC gene families. The Royal Society 2016-08-03 /pmc/articles/PMC5108952/ /pubmed/27853602 http://dx.doi.org/10.1098/rsos.160275 Text en © 2016 The Authors. http://creativecommons.org/licenses/by/4.0/ Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.
title Size distribution of function-based human gene sets and the split–merge model
topic Biology (Whole Organism)
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5108952/
https://www.ncbi.nlm.nih.gov/pubmed/27853602
http://dx.doi.org/10.1098/rsos.160275