ARG-based genome-wide analysis of cacao cultivars

BACKGROUND: An ancestral recombination graph (ARG) is a topological structure that captures the relationships among extant genomic sequences in terms of genetic events, including recombinations. IRiS is a system that estimates the ARG on sequences of individuals, at genomic scales, capturing the r...

Bibliographic Details
Main authors: Utro, Filippo; Cornejo, Omar Eduardo; Livingstone, Donald; Motamayor, Juan Carlos; Parida, Laxmi
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3526434/
https://www.ncbi.nlm.nih.gov/pubmed/23281769
http://dx.doi.org/10.1186/1471-2105-13-S19-S17
_version_ 1782253559492902912
author Utro, Filippo
Cornejo, Omar Eduardo
Livingstone, Donald
Motamayor, Juan Carlos
Parida, Laxmi
author_facet Utro, Filippo
Cornejo, Omar Eduardo
Livingstone, Donald
Motamayor, Juan Carlos
Parida, Laxmi
author_sort Utro, Filippo
collection PubMed
description BACKGROUND: An ancestral recombination graph (ARG) is a topological structure that captures the relationships among extant genomic sequences in terms of genetic events, including recombinations. IRiS is a system that estimates the ARG on sequences of individuals, at genomic scales, capturing the relationships between these individuals of the species. Recently, this system was used to estimate the ARG of the recombining X chromosome of a collection of human populations using relatively dense, bi-allelic SNP data. RESULTS: While the ARG is a natural model for capturing the inter-relationships within a single chromosome across the individuals of a species, it is not immediately apparent how the model can utilize whole-genome (across chromosomes) diploid data. Also, the sheer complexity of an ARG structure presents a challenge to graph visualization techniques. In this paper we examine ARG reconstruction for (1) genome-wide (multiple-chromosome), (2) multi-allelic, and (3) extremely sparse data. To aid in visualizing the reconstructed ARG, we additionally construct a much simplified topology, a classification tree, suggested by the ARG. As the test case, we study the problem of extracting the relationships between populations of Theobroma cacao. The chocolate tree is an outcrossing species in the wild, owing to its self-incompatibility mechanisms. Thus, a principled approach to understanding the inter-relationships between the different populations must take the shuffling of genomic segments into account. The polymorphisms in the test data are short tandem repeats (STR) and are multi-allelic (sometimes with as many as 30 distinct possible values at a locus). Each is at a genomic location that is bilaterally transmitted, hence the ARG is a natural model for this data. Another characteristic of this plant data set is that while it is genome-wide, spanning 10 linkage groups or chromosomes, it is very sparse, i.e., only 96 loci from a genome of approximately 400 megabases. The results are visualized both as MDS plots and as classification trees. To evaluate the accuracy of the ARG approach, we compare the results with those available in the literature. CONCLUSIONS: We have extended the ARG model to incorporate genome-wide (ensemble of multiple chromosomes) data in a natural way. We present a simple scheme to implement this in practice. Finally, this is the first time that a plant population data set has been studied by estimating its underlying ARG. We demonstrate an overall precision of 0.92 and an overall recall of 0.93 for the ARG-based classification, with respect to the gold standard. While we have corroborated the classification of the samples with that in the literature, this opens the door to other potential studies that can be made on the ARG.
format Online
Article
Text
id pubmed-3526434
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3526434 2013-01-10 ARG-based genome-wide analysis of cacao cultivars Utro, Filippo Cornejo, Omar Eduardo Livingstone, Donald Motamayor, Juan Carlos Parida, Laxmi BMC Bioinformatics Proceedings BioMed Central 2012-12-19 /pmc/articles/PMC3526434/ /pubmed/23281769 http://dx.doi.org/10.1186/1471-2105-13-S19-S17 Text en Copyright ©2012 Utro et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Proceedings
Utro, Filippo
Cornejo, Omar Eduardo
Livingstone, Donald
Motamayor, Juan Carlos
Parida, Laxmi
ARG-based genome-wide analysis of cacao cultivars
title ARG-based genome-wide analysis of cacao cultivars
title_full ARG-based genome-wide analysis of cacao cultivars
title_fullStr ARG-based genome-wide analysis of cacao cultivars
title_full_unstemmed ARG-based genome-wide analysis of cacao cultivars
title_short ARG-based genome-wide analysis of cacao cultivars
title_sort arg-based genome-wide analysis of cacao cultivars
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3526434/
https://www.ncbi.nlm.nih.gov/pubmed/23281769
http://dx.doi.org/10.1186/1471-2105-13-S19-S17
work_keys_str_mv AT utrofilippo argbasedgenomewideanalysisofcacaocultivars
AT cornejoomareduardo argbasedgenomewideanalysisofcacaocultivars
AT livingstonedonald argbasedgenomewideanalysisofcacaocultivars
AT motamayorjuancarlos argbasedgenomewideanalysisofcacaocultivars
AT paridalaxmi argbasedgenomewideanalysisofcacaocultivars