Estimating copy numbers of alleles from population-scale high-throughput sequencing data
BACKGROUND: With the recent development of microarray and high-throughput sequencing (HTS) technologies, a number of studies have revealed catalogs of copy number variants (CNVs) and their association with phenotypes and complex traits. In parallel, a number of approaches to predict CNV regions and...
Main authors: | Mimori, Takahiro; Nariai, Naoki; Kojima, Kaname; Sato, Yukuto; Kawai, Yosuke; Yamaguchi-Kabata, Yumi; Nagasaki, Masao |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | Proceedings |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4331703/ https://www.ncbi.nlm.nih.gov/pubmed/25707811 http://dx.doi.org/10.1186/1471-2105-16-S1-S4 |
_version_ | 1782357762484731904 |
---|---|
author | Mimori, Takahiro Nariai, Naoki Kojima, Kaname Sato, Yukuto Kawai, Yosuke Yamaguchi-Kabata, Yumi Nagasaki, Masao |
author_facet | Mimori, Takahiro Nariai, Naoki Kojima, Kaname Sato, Yukuto Kawai, Yosuke Yamaguchi-Kabata, Yumi Nagasaki, Masao |
author_sort | Mimori, Takahiro |
collection | PubMed |
description | BACKGROUND: With the recent development of microarray and high-throughput sequencing (HTS) technologies, a number of studies have revealed catalogs of copy number variants (CNVs) and their association with phenotypes and complex traits. In parallel, a number of approaches to predict CNV regions and genotypes have been proposed for both microarray and HTS data. However, only a few approaches focus on haplotyping of CNV loci. RESULTS: We propose a novel approach to infer copy unit alleles and their numbers in each sample simultaneously from population-scale HTS data by variational Bayesian inference on a generative probabilistic model inspired by latent Dirichlet allocation, which is a well-studied model for document classification problems. In simulation studies, we evaluated concordance between inferred and true copy unit alleles for lower-, middle-, and higher-copy number datasets, in which precision and recall were ≥ 0.9 for data with mean coverage ≥ 10× per copy unit. We also applied the approach to HTS data of 1123 samples at the highly variable salivary amylase gene locus and a pseudogene locus, and confirmed consistency of the estimated alleles within samples belonging to a trio of CEPH/Utah pedigree 1463 with 11 offspring. CONCLUSIONS: Our proposed approach enables detailed analysis of copy number variations, such as association studies between copy unit alleles and phenotypes or biological features including human diseases. |
format | Online Article Text |
id | pubmed-4331703 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-43317032015-03-19 Estimating copy numbers of alleles from population-scale high-throughput sequencing data Mimori, Takahiro Nariai, Naoki Kojima, Kaname Sato, Yukuto Kawai, Yosuke Yamaguchi-Kabata, Yumi Nagasaki, Masao BMC Bioinformatics Proceedings BACKGROUND: With the recent development of microarray and high-throughput sequencing (HTS) technologies, a number of studies have revealed catalogs of copy number variants (CNVs) and their association with phenotypes and complex traits. In parallel, a number of approaches to predict CNV regions and genotypes are proposed for both microarray and HTS data. However, only a few approaches focus on haplotyping of CNV loci. RESULTS: We propose a novel approach to infer copy unit alleles and their numbers in each sample simultaneously from population-scale HTS data by variational Bayesian inference on a generative probabilistic model inspired by latent Dirichlet allocation, which is a well studied model for document classification problems. In simulation studies, we evaluated concordance between inferred and true copy unit alleles for lower-, middle-, and higher-copy number dataset, in which precision and recall were ≥ 0.9 for data with mean coverage ≥ 10× per copy unit. We also applied the approach to HTS data of 1123 samples at highly variable salivary amylase gene locus and a pseudogene locus, and confirmed consistency of the estimated alleles within samples belonging to a trio of CEPH/Utah pedigree 1463 with 11 offspring. CONCLUSIONS: Our proposed approach enables detailed analysis of copy number variations, such as association study between copy unit alleles and phenotypes or biological features including human diseases. BioMed Central 2015-01-21 /pmc/articles/PMC4331703/ /pubmed/25707811 http://dx.doi.org/10.1186/1471-2105-16-S1-S4 Text en Copyright © 2015 Mimori et al.; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Proceedings Mimori, Takahiro Nariai, Naoki Kojima, Kaname Sato, Yukuto Kawai, Yosuke Yamaguchi-Kabata, Yumi Nagasaki, Masao Estimating copy numbers of alleles from population-scale high-throughput sequencing data |
title | Estimating copy numbers of alleles from population-scale high-throughput sequencing data |
title_full | Estimating copy numbers of alleles from population-scale high-throughput sequencing data |
title_fullStr | Estimating copy numbers of alleles from population-scale high-throughput sequencing data |
title_full_unstemmed | Estimating copy numbers of alleles from population-scale high-throughput sequencing data |
title_short | Estimating copy numbers of alleles from population-scale high-throughput sequencing data |
title_sort | estimating copy numbers of alleles from population-scale high-throughput sequencing data |
topic | Proceedings |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4331703/ https://www.ncbi.nlm.nih.gov/pubmed/25707811 http://dx.doi.org/10.1186/1471-2105-16-S1-S4 |
work_keys_str_mv | AT mimoritakahiro estimatingcopynumbersofallelesfrompopulationscalehighthroughputsequencingdata AT nariainaoki estimatingcopynumbersofallelesfrompopulationscalehighthroughputsequencingdata AT kojimakaname estimatingcopynumbersofallelesfrompopulationscalehighthroughputsequencingdata AT satoyukuto estimatingcopynumbersofallelesfrompopulationscalehighthroughputsequencingdata AT kawaiyosuke estimatingcopynumbersofallelesfrompopulationscalehighthroughputsequencingdata AT yamaguchikabatayumi estimatingcopynumbersofallelesfrompopulationscalehighthroughputsequencingdata AT nagasakimasao estimatingcopynumbersofallelesfrompopulationscalehighthroughputsequencingdata |