Inferring Haplotypes of Copy Number Variations From High-Throughput Data With Uncertainty
Accurate information on haplotypes and diplotypes (haplotype pairs) is required for population-genetic analyses; however, microarrays do not provide data on a haplotype or diplotype at a copy number variation (CNV) locus; they only provide data on the total number of copies over a diplotype or an un...
Main Authors: | Kato, Mamoru; Yoon, Seungtai; Hosono, Naoya; Leotta, Anthony; Sebat, Jonathan; Tsunoda, Tatsuhiko; Zhang, Michael Q. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Genetics Society of America, 2011 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3276117/ https://www.ncbi.nlm.nih.gov/pubmed/22384316 http://dx.doi.org/10.1534/g3.111.000174 |
_version_ | 1782223330632269824 |
---|---|
author | Kato, Mamoru Yoon, Seungtai Hosono, Naoya Leotta, Anthony Sebat, Jonathan Tsunoda, Tatsuhiko Zhang, Michael Q. |
author_facet | Kato, Mamoru Yoon, Seungtai Hosono, Naoya Leotta, Anthony Sebat, Jonathan Tsunoda, Tatsuhiko Zhang, Michael Q. |
author_sort | Kato, Mamoru |
collection | PubMed |
description | Accurate information on haplotypes and diplotypes (haplotype pairs) is required for population-genetic analyses; however, microarrays do not provide data on a haplotype or diplotype at a copy number variation (CNV) locus; they only provide data on the total number of copies over a diplotype or an unphased sequence genotype (e.g., AAB, unlike AB of single nucleotide polymorphism). Moreover, such copy numbers or genotypes are often incorrectly determined when microarray signal intensities derived from different copy numbers or genotypes are not clearly separated due to noise. Here we report an algorithm to infer CNV haplotypes and individuals’ diplotypes at multiple loci from noisy microarray data, utilizing the probability that a signal intensity may be derived from different underlying copy numbers or genotypes. Performing simulation studies based on known diplotypes and an error model obtained from real microarray data, we demonstrate that this probabilistic approach succeeds in accurate inference (error rate: 1–2%) from noisy data, whereas previous deterministic approaches failed (error rate: 12–18%). Applying this algorithm to real microarray data, we estimated haplotype frequencies and diplotypes in 1486 CNV regions for 100 individuals. Our algorithm will facilitate accurate population-genetic analyses and powerful disease association studies of CNVs. |
format | Online Article Text |
id | pubmed-3276117 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | Genetics Society of America |
record_format | MEDLINE/PubMed |
spelling | pubmed-32761172012-03-01 Inferring Haplotypes of Copy Number Variations From High-Throughput Data With Uncertainty Kato, Mamoru Yoon, Seungtai Hosono, Naoya Leotta, Anthony Sebat, Jonathan Tsunoda, Tatsuhiko Zhang, Michael Q. G3 (Bethesda) Investigation Accurate information on haplotypes and diplotypes (haplotype pairs) is required for population-genetic analyses; however, microarrays do not provide data on a haplotype or diplotype at a copy number variation (CNV) locus; they only provide data on the total number of copies over a diplotype or an unphased sequence genotype (e.g., AAB, unlike AB of single nucleotide polymorphism). Moreover, such copy numbers or genotypes are often incorrectly determined when microarray signal intensities derived from different copy numbers or genotypes are not clearly separated due to noise. Here we report an algorithm to infer CNV haplotypes and individuals’ diplotypes at multiple loci from noisy microarray data, utilizing the probability that a signal intensity may be derived from different underlying copy numbers or genotypes. Performing simulation studies based on known diplotypes and an error model obtained from real microarray data, we demonstrate that this probabilistic approach succeeds in accurate inference (error rate: 1–2%) from noisy data, whereas previous deterministic approaches failed (error rate: 12–18%). Applying this algorithm to real microarray data, we estimated haplotype frequencies and diplotypes in 1486 CNV regions for 100 individuals. Our algorithm will facilitate accurate population-genetic analyses and powerful disease association studies of CNVs. Genetics Society of America 2011-06-01 /pmc/articles/PMC3276117/ /pubmed/22384316 http://dx.doi.org/10.1534/g3.111.000174 Text en Copyright © 2011 Kato et al. 
http://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Unported License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Investigation Kato, Mamoru Yoon, Seungtai Hosono, Naoya Leotta, Anthony Sebat, Jonathan Tsunoda, Tatsuhiko Zhang, Michael Q. Inferring Haplotypes of Copy Number Variations From High-Throughput Data With Uncertainty |
title | Inferring Haplotypes of Copy Number Variations From High-Throughput Data With Uncertainty |
title_full | Inferring Haplotypes of Copy Number Variations From High-Throughput Data With Uncertainty |
title_fullStr | Inferring Haplotypes of Copy Number Variations From High-Throughput Data With Uncertainty |
title_full_unstemmed | Inferring Haplotypes of Copy Number Variations From High-Throughput Data With Uncertainty |
title_short | Inferring Haplotypes of Copy Number Variations From High-Throughput Data With Uncertainty |
title_sort | inferring haplotypes of copy number variations from high-throughput data with uncertainty |
topic | Investigation |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3276117/ https://www.ncbi.nlm.nih.gov/pubmed/22384316 http://dx.doi.org/10.1534/g3.111.000174 |
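
The description above contrasts a probabilistic approach, which uses the probability that a signal intensity arises from each possible copy number, with deterministic calling. The following is a minimal sketch of that general idea, not the authors' algorithm: a single-locus expectation-maximization (EM) estimate of haplotype copy-number frequencies, assuming Hardy-Weinberg proportions and assuming that a per-individual likelihood vector over total copy numbers is already available. The function names `em_cnv_haplotype_freqs` and `most_likely_diplotype`, the parameter names, and the single-locus restriction are all illustrative assumptions; the record itself describes inference at multiple loci.

```python
from itertools import product


def em_cnv_haplotype_freqs(likelihoods, max_hap_copies, n_iter=200, tol=1e-10):
    """Estimate haplotype copy-number frequencies at one locus by EM.

    likelihoods[i][t] is the (assumed given) probability of individual i's
    signal when the total copy number over the diplotype is t, for
    t = 0 .. 2 * max_hap_copies. Hardy-Weinberg proportions are assumed.
    """
    n_states = max_hap_copies + 1
    freqs = [1.0 / n_states] * n_states  # uniform starting point
    pairs = list(product(range(n_states), repeat=2))  # ordered haplotype pairs

    for _ in range(n_iter):
        counts = [0.0] * n_states
        for lik in likelihoods:
            # E-step: posterior weight of each ordered pair (a, b)
            weights = [freqs[a] * freqs[b] * lik[a + b] for a, b in pairs]
            total = sum(weights)
            if total == 0.0:
                raise ValueError("an individual has zero likelihood under current frequencies")
            for (a, b), w in zip(pairs, weights):
                post = w / total
                counts[a] += post
                counts[b] += post
        # M-step: expected haplotype counts divided by 2N haplotypes
        new_freqs = [c / (2 * len(likelihoods)) for c in counts]
        change = max(abs(x - y) for x, y in zip(new_freqs, freqs))
        freqs = new_freqs
        if change < tol:
            break
    return freqs


def most_likely_diplotype(lik, freqs):
    """Return the unordered pair (a, b), a <= b, with the highest posterior."""
    n = len(freqs)

    def score(pair):
        a, b = pair
        multiplicity = 1 if a == b else 2  # (a, b) and (b, a) are the same diplotype
        return multiplicity * freqs[a] * freqs[b] * lik[a + b]

    return max(((a, b) for a in range(n) for b in range(a, n)), key=score)
```

As a usage example, with haplotypes carrying 0 or 1 copies, one individual whose signal is certain to come from 0 total copies (`[1.0, 0.0, 0.0]`) and one certain to come from 2 total copies (`[0.0, 0.0, 1.0]`) give estimated frequencies of 0.5 for each haplotype state. Less certain signals would simply enter as likelihood vectors with mass spread over several totals, which is where a probabilistic treatment differs from assigning a single hard copy-number call.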