Inferring Haplotypes of Copy Number Variations From High-Throughput Data With Uncertainty

Accurate information on haplotypes and diplotypes (haplotype pairs) is required for population-genetic analyses; however, microarrays do not provide data on a haplotype or diplotype at a copy number variation (CNV) locus; they only provide data on the total number of copies over a diplotype or an un...

Bibliographic Details
Main authors: Kato, Mamoru; Yoon, Seungtai; Hosono, Naoya; Leotta, Anthony; Sebat, Jonathan; Tsunoda, Tatsuhiko; Zhang, Michael Q.
Format: Online Article Text
Language: English
Published: Genetics Society of America 2011
Subjects: Investigation
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3276117/
https://www.ncbi.nlm.nih.gov/pubmed/22384316
http://dx.doi.org/10.1534/g3.111.000174
author Kato, Mamoru
Yoon, Seungtai
Hosono, Naoya
Leotta, Anthony
Sebat, Jonathan
Tsunoda, Tatsuhiko
Zhang, Michael Q.
collection PubMed
description Accurate information on haplotypes and diplotypes (haplotype pairs) is required for population-genetic analyses; however, microarrays do not provide data on a haplotype or diplotype at a copy number variation (CNV) locus; they only provide data on the total number of copies over a diplotype or an unphased sequence genotype (e.g., AAB, unlike AB of single nucleotide polymorphism). Moreover, such copy numbers or genotypes are often incorrectly determined when microarray signal intensities derived from different copy numbers or genotypes are not clearly separated due to noise. Here we report an algorithm to infer CNV haplotypes and individuals’ diplotypes at multiple loci from noisy microarray data, utilizing the probability that a signal intensity may be derived from different underlying copy numbers or genotypes. Performing simulation studies based on known diplotypes and an error model obtained from real microarray data, we demonstrate that this probabilistic approach succeeds in accurate inference (error rate: 1–2%) from noisy data, whereas previous deterministic approaches failed (error rate: 12–18%). Applying this algorithm to real microarray data, we estimated haplotype frequencies and diplotypes in 1486 CNV regions for 100 individuals. Our algorithm will facilitate accurate population-genetic analyses and powerful disease association studies of CNVs.
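The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it names: using per-individual probabilities over copy numbers instead of hard calls. It is not the authors' method. It assumes a single CNV locus, Hardy-Weinberg proportions, and a hypothetical input `likelihoods[i][c]` giving the probability of individual i's signal intensity given total copy number c; the function names and parameters are illustrative only.

```python
from itertools import product


def em_haplotype_freqs(likelihoods, max_hap_copies=2, n_iter=200):
    """Estimate haplotype copy-number frequencies at one locus by EM.

    likelihoods[i][c] = P(signal of individual i | total copy number c),
    for c = 0 .. 2 * max_hap_copies (an assumed input format).
    """
    k = max_hap_copies
    freqs = [1.0 / (k + 1)] * (k + 1)  # start from uniform frequencies
    # Ordered haplotype pairs (a, b); the diplotype total is a + b.
    pairs = list(product(range(k + 1), repeat=2))
    for _ in range(n_iter):
        counts = [0.0] * (k + 1)
        for lik in likelihoods:
            # E-step: posterior weight of each ordered pair for this individual.
            weights = [freqs[a] * freqs[b] * lik[a + b] for a, b in pairs]
            total = sum(weights)
            if total == 0.0:
                continue  # signal incompatible with current estimates
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts become the new frequencies.
        norm = sum(counts)
        freqs = [c / norm for c in counts]
    return freqs


def most_likely_diplotype(lik, freqs):
    """Return the unordered haplotype pair with the highest posterior weight."""
    k = len(freqs) - 1
    pairs = [(a, b) for a in range(k + 1) for b in range(a, k + 1)]

    def weight(pair):
        a, b = pair
        multiplicity = 1 if a == b else 2  # (a, b) and (b, a) are one diplotype
        return multiplicity * freqs[a] * freqs[b] * lik[a + b]

    return max(pairs, key=weight)
```

A deterministic approach would correspond to likelihood rows that are all zeros except for a single 1 at the called copy number; a noisy signal can instead spread its weight, for example `[0.1, 0.8, 0.1]`, and the E-step carries that uncertainty into the frequency estimates.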
format Online
Article
Text
id pubmed-3276117
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Genetics Society of America
record_format MEDLINE/PubMed
spelling pubmed-3276117 2012-03-01 Inferring Haplotypes of Copy Number Variations From High-Throughput Data With Uncertainty Kato, Mamoru Yoon, Seungtai Hosono, Naoya Leotta, Anthony Sebat, Jonathan Tsunoda, Tatsuhiko Zhang, Michael Q. G3 (Bethesda) Investigation Genetics Society of America 2011-06-01 /pmc/articles/PMC3276117/ /pubmed/22384316 http://dx.doi.org/10.1534/g3.111.000174 Text en Copyright © 2011 Kato et al.
http://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Unported License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Inferring Haplotypes of Copy Number Variations From High-Throughput Data With Uncertainty
topic Investigation
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3276117/
https://www.ncbi.nlm.nih.gov/pubmed/22384316
http://dx.doi.org/10.1534/g3.111.000174