
Estimating copy numbers of alleles from population-scale high-throughput sequencing data

BACKGROUND: With the recent development of microarray and high-throughput sequencing (HTS) technologies, a number of studies have revealed catalogs of copy number variants (CNVs) and their association with phenotypes and complex traits. In parallel, a number of approaches to predict CNV regions and...


Bibliographic Details
Main authors: Mimori, Takahiro; Nariai, Naoki; Kojima, Kaname; Sato, Yukuto; Kawai, Yosuke; Yamaguchi-Kabata, Yumi; Nagasaki, Masao
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4331703/
https://www.ncbi.nlm.nih.gov/pubmed/25707811
http://dx.doi.org/10.1186/1471-2105-16-S1-S4
author Mimori, Takahiro
Nariai, Naoki
Kojima, Kaname
Sato, Yukuto
Kawai, Yosuke
Yamaguchi-Kabata, Yumi
Nagasaki, Masao
collection PubMed
description BACKGROUND: With the recent development of microarray and high-throughput sequencing (HTS) technologies, a number of studies have revealed catalogs of copy number variants (CNVs) and their association with phenotypes and complex traits. In parallel, a number of approaches to predict CNV regions and genotypes have been proposed for both microarray and HTS data. However, only a few approaches focus on haplotyping of CNV loci. RESULTS: We propose a novel approach to infer copy unit alleles and their numbers in each sample simultaneously from population-scale HTS data by variational Bayesian inference on a generative probabilistic model inspired by latent Dirichlet allocation, which is a well-studied model for document classification problems. In simulation studies, we evaluated concordance between inferred and true copy unit alleles for lower-, middle-, and higher-copy number datasets, in which precision and recall were ≥ 0.9 for data with mean coverage ≥ 10× per copy unit. We also applied the approach to HTS data of 1123 samples at the highly variable salivary amylase gene locus and a pseudogene locus, and confirmed consistency of the estimated alleles within samples belonging to a trio of CEPH/Utah pedigree 1463 with 11 offspring. CONCLUSIONS: Our proposed approach enables detailed analysis of copy number variations, such as association studies between copy unit alleles and phenotypes or biological features including human diseases.
format Online
Article
Text
id pubmed-4331703
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4331703 2015-03-19 BMC Bioinformatics Proceedings BioMed Central 2015-01-21 /pmc/articles/PMC4331703/ /pubmed/25707811 http://dx.doi.org/10.1186/1471-2105-16-S1-S4 Text en Copyright © 2015 Mimori et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Estimating copy numbers of alleles from population-scale high-throughput sequencing data
topic Proceedings