Copy number variation signature to predict human ancestry

BACKGROUND: Copy number variations (CNVs) are genomic structural variants that are found in healthy populations and have been observed to be associated with disease susceptibility. Existing methods for CNV detection are often performed on a sample-by-sample basis, which is not ideal for large datase...

Bibliographic Details
Main authors: Pronold, Melissa; Vali, Marzieh; Pique-Regi, Roger; Asgharzadeh, Shahab
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3598683/
https://www.ncbi.nlm.nih.gov/pubmed/23270563
http://dx.doi.org/10.1186/1471-2105-13-336
author Pronold, Melissa
Vali, Marzieh
Pique-Regi, Roger
Asgharzadeh, Shahab
collection PubMed
description BACKGROUND: Copy number variations (CNVs) are genomic structural variants that are found in healthy populations and have been observed to be associated with disease susceptibility. Existing CNV detection methods often operate on a sample-by-sample basis, which is not ideal for large datasets where common CNVs must be estimated by comparing the frequency of CNVs in the individual samples. Here we describe a simple and novel approach to locate genome-wide CNVs common to a specific population, using human ancestry as the phenotype. RESULTS: We utilized our previously published Genome Alteration Detection Analysis (GADA) algorithm to identify common ancestry CNVs (caCNVs) and built a caCNV model to predict population structure. We identified a 73-caCNV signature using a training set of 225 healthy individuals of European, Asian, and African ancestry. The signature was validated on an independent test set of 300 individuals with similar ancestral backgrounds. The error rate in predicting ancestry in this test set was 2% using the 73-caCNV signature. Among the caCNVs identified, several were previously confirmed experimentally to vary by ancestry. Our signature also contains a caCNV region with a single microRNA (MIR270), which represents the first reported variation of microRNA by ancestry. CONCLUSIONS: We developed a new methodology to identify common CNVs and demonstrated its performance by building a caCNV signature to predict human ancestry with high accuracy. The utility of our approach could be extended to large case–control studies to identify CNV signatures for other phenotypes such as disease susceptibility and drug response.
format Online
Article
Text
id pubmed-3598683
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3598683 2013-03-20 Copy number variation signature to predict human ancestry Pronold, Melissa Vali, Marzieh Pique-Regi, Roger Asgharzadeh, Shahab BMC Bioinformatics Methodology Article
BioMed Central 2012-12-27 /pmc/articles/PMC3598683/ /pubmed/23270563 http://dx.doi.org/10.1186/1471-2105-13-336 Text en Copyright ©2012 Pronold et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Copy number variation signature to predict human ancestry
topic Methodology Article