Information compression exploits patterns of genome composition to discriminate populations and highlight regions of evolutionary interest
BACKGROUND: Genomic information allows population relatedness to be inferred and selected genes to be identified. Single nucleotide polymorphism microarray (SNP-chip) data, a proxy for genome composition, contains patterns in allele order and proportion. These patterns can be quantified by compressi...
Main Authors: | Hudson, Nicholas J; Porto-Neto, Laercio R; Kijas, James; McWilliam, Sean; Taft, Ryan J; Reverter, Antonio |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015654/ https://www.ncbi.nlm.nih.gov/pubmed/24606587 http://dx.doi.org/10.1186/1471-2105-15-66 |
_version_ | 1782315371359895552 |
---|---|
author | Hudson, Nicholas J Porto-Neto, Laercio R Kijas, James McWilliam, Sean Taft, Ryan J Reverter, Antonio |
author_sort | Hudson, Nicholas J |
collection | PubMed |
description | BACKGROUND: Genomic information allows population relatedness to be inferred and selected genes to be identified. Single nucleotide polymorphism microarray (SNP-chip) data, a proxy for genome composition, contains patterns in allele order and proportion. These patterns can be quantified by compression efficiency (CE). In principle, the composition of an entire genome can be represented by a CE number quantifying allele representation and order. RESULTS: We applied a compression algorithm (DEFLATE) to genome-wide high-density SNP data from 4,155 human, 1,800 cattle, 1,222 sheep, 81 dog and 49 mouse samples. All human ethnic groups can be clustered by CE and the clusters recover phylogeography based on traditional fixation index (F(ST)) analyses. CE analysis of other mammals results in segregation by breed or species, and is sensitive to admixture and past effective population size. This clustering is a consequence of individual patterns such as runs of homozygosity. Intriguingly, a related approach can also be used to identify genomic loci that show population-specific CE segregation. A high-resolution CE ‘sliding window’ scan across the human genome, organised at the population level, revealed genes known to be under evolutionary pressure. These include SLC24A5 (European and Gujarati Indian skin pigmentation), HERC2 (European eye color), LCT (European and Maasai milk digestion) and EDAR (Asian hair thickness). We also identified a set of previously unidentified loci with high population-specific CE scores including the chromatin remodeler SCMH1 in Africans and EDA2R in Asians. Closer inspection reveals that these prioritised genomic regions do not correspond to simple runs of homozygosity but rather to compositionally complex regions that are shared by many individuals of a given population. Unlike F(ST), CE analyses do not require ab initio population comparisons and are amenable to the hemizygous X chromosome. CONCLUSIONS: We conclude with a discussion of the implications of CE for a complex systems science view of genome evolution. CE allows one to clearly visualise the evolution of individual genomes and populations through a formal, mathematically rigorous information space. Overall, CE makes a set of biological predictions, some of which are unique and await functional validation. |
format | Online Article Text |
id | pubmed-4015654 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4015654 2014-05-23 Information compression exploits patterns of genome composition to discriminate populations and highlight regions of evolutionary interest Hudson, Nicholas J Porto-Neto, Laercio R Kijas, James McWilliam, Sean Taft, Ryan J Reverter, Antonio BMC Bioinformatics Research Article BioMed Central 2014-03-07 /pmc/articles/PMC4015654/ /pubmed/24606587 http://dx.doi.org/10.1186/1471-2105-15-66 Text en Copyright © 2014 Hudson et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Information compression exploits patterns of genome composition to discriminate populations and highlight regions of evolutionary interest |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015654/ https://www.ncbi.nlm.nih.gov/pubmed/24606587 http://dx.doi.org/10.1186/1471-2105-15-66 |
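
The description in this record says that CE is obtained by applying DEFLATE to SNP data, both genome-wide and in a sliding window, but the record does not give the formula or the genotype encoding. The following is a minimal sketch under stated assumptions, not the authors' pipeline: CE is taken here to be the fraction of bytes saved by compression, genotypes are assumed to be coded as one character per SNP (`0`, `1`, `2`), and the names `compression_efficiency` and `sliding_window_ce` are hypothetical.

```python
import random
import zlib


def compression_efficiency(genotypes: str) -> float:
    """Assumed CE definition: (raw bytes - compressed bytes) / raw bytes."""
    raw = genotypes.encode("ascii")
    if not raw:
        raise ValueError("genotype string is empty")
    # Negative wbits asks zlib for a raw DEFLATE stream (no header, no checksum).
    deflater = zlib.compressobj(9, zlib.DEFLATED, -15)
    compressed = deflater.compress(raw) + deflater.flush()
    return (len(raw) - len(compressed)) / len(raw)


def sliding_window_ce(genotypes: str, window: int, step: int) -> list[tuple[int, float]]:
    """Return (start index, CE) for each window of `window` SNPs, moving `step` SNPs at a time."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        (start, compression_efficiency(genotypes[start:start + window]))
        for start in range(0, len(genotypes) - window + 1, step)
    ]


if __name__ == "__main__":
    # Synthetic data only: a random genotype string, and the same string with
    # a 1,000-SNP homozygous run spliced into the middle.
    rng = random.Random(0)
    mixed = "".join(rng.choice("012") for _ in range(5000))
    with_run = mixed[:2000] + "0" * 1000 + mixed[3000:]

    print("CE, random genotypes:    ", round(compression_efficiency(mixed), 3))
    print("CE, with homozygous run: ", round(compression_efficiency(with_run), 3))

    # The window with the highest CE should overlap the inserted run.
    best_start, best_ce = max(sliding_window_ce(with_run, 500, 250), key=lambda w: w[1])
    print("Most compressible window starts at SNP", best_start)
```

On synthetic input like this, a long homozygous run raises CE because DEFLATE encodes repeated symbols cheaply. The description notes that the loci prioritised in the real scan were compositionally complex regions shared within a population rather than simple runs, so a per-individual window scan of this kind illustrates only the mechanics.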