
Inferring Variation in Copy Number Using High Throughput Sequencing Data in R

Inference of copy number variation presents a technical challenge because variant callers typically require the copy number of a genome or genomic region to be known a priori. Here we present a method to infer copy number that uses variant call format (VCF) data as input and is implemented in the R...

Bibliographic Details
Main Authors:	Knaus, Brian J., Grünwald, Niklaus J.
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2018
Subjects:	Genetics
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5909048/ https://www.ncbi.nlm.nih.gov/pubmed/29706990 http://dx.doi.org/10.3389/fgene.2018.00123

_version_	1783315822759903232
author	Knaus, Brian J. Grünwald, Niklaus J.
author_facet	Knaus, Brian J. Grünwald, Niklaus J.
author_sort	Knaus, Brian J.
collection	PubMed
description	Inference of copy number variation presents a technical challenge because variant callers typically require the copy number of a genome or genomic region to be known a priori. Here we present a method to infer copy number that uses variant call format (VCF) data as input and is implemented in the R package vcfR. This method is based on the relative frequency of each allele (in both genic and non-genic regions) sequenced at heterozygous positions throughout a genome. These heterozygous positions are summarized by using arbitrarily sized windows of heterozygous positions, binning the allele frequencies, and selecting the bin with the greatest abundance of positions. This provides a non-parametric summary of the frequency that alleles were sequenced at. The method is applicable to organisms that have reference genomes that consist of full chromosomes or sub-chromosomal contigs. In contrast to other software designed to detect copy number variation, our method does not rely on an assumption of base ploidy, but instead infers it. We validated these approaches with the model system of Saccharomyces cerevisiae and applied it to the oomycete Phytophthora infestans, both known to vary in copy number. This functionality has been incorporated into the current release of the R package vcfR to provide modular and flexible methods to investigate copy number variation in genomic projects.
format	Online Article Text
id	pubmed-5909048
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-59090482018-04-27 Inferring Variation in Copy Number Using High Throughput Sequencing Data in R Knaus, Brian J. Grünwald, Niklaus J. Front Genet Genetics Inference of copy number variation presents a technical challenge because variant callers typically require the copy number of a genome or genomic region to be known a priori. Here we present a method to infer copy number that uses variant call format (VCF) data as input and is implemented in the R package vcfR. This method is based on the relative frequency of each allele (in both genic and non-genic regions) sequenced at heterozygous positions throughout a genome. These heterozygous positions are summarized by using arbitrarily sized windows of heterozygous positions, binning the allele frequencies, and selecting the bin with the greatest abundance of positions. This provides a non-parametric summary of the frequency that alleles were sequenced at. The method is applicable to organisms that have reference genomes that consist of full chromosomes or sub-chromosomal contigs. In contrast to other software designed to detect copy number variation, our method does not rely on an assumption of base ploidy, but instead infers it. We validated these approaches with the model system of Saccharomyces cerevisiae and applied it to the oomycete Phytophthora infestans, both known to vary in copy number. This functionality has been incorporated into the current release of the R package vcfR to provide modular and flexible methods to investigate copy number variation in genomic projects. Frontiers Media S.A. 2018-04-13 /pmc/articles/PMC5909048/ /pubmed/29706990 http://dx.doi.org/10.3389/fgene.2018.00123 Text en Copyright © 2018 Knaus and Grünwald. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). 
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle	Genetics Knaus, Brian J. Grünwald, Niklaus J. Inferring Variation in Copy Number Using High Throughput Sequencing Data in R
title	Inferring Variation in Copy Number Using High Throughput Sequencing Data in R
title_full	Inferring Variation in Copy Number Using High Throughput Sequencing Data in R
title_fullStr	Inferring Variation in Copy Number Using High Throughput Sequencing Data in R
title_full_unstemmed	Inferring Variation in Copy Number Using High Throughput Sequencing Data in R
title_short	Inferring Variation in Copy Number Using High Throughput Sequencing Data in R
title_sort	inferring variation in copy number using high throughput sequencing data in r
topic	Genetics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5909048/ https://www.ncbi.nlm.nih.gov/pubmed/29706990 http://dx.doi.org/10.3389/fgene.2018.00123
work_keys_str_mv	AT knausbrianj inferringvariationincopynumberusinghighthroughputsequencingdatainr AT grunwaldniklausj inferringvariationincopynumberusinghighthroughputsequencingdatainr
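
The summary step outlined in the description field (windows of heterozygous positions, binned allele frequencies, most populated bin per window) can be illustrated with a minimal Python sketch. This is not the vcfR implementation or its API. The function name `window_peaks`, its parameters, the tie-breaking rule, and the sample frequencies are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def window_peaks(freqs, window_size=5, n_bins=20):
    """Summarize allele frequencies at heterozygous positions.

    Hypothetical helper, not part of vcfR. `freqs` is a list of allele
    frequencies in [0, 1], one per heterozygous position, in genomic order.
    Positions are grouped into consecutive, non-overlapping windows of
    `window_size` positions; a trailing partial window is ignored.
    Returns the midpoint of the most populated frequency bin per window.
    """
    peaks = []
    for start in range(0, len(freqs) - window_size + 1, window_size):
        window = freqs[start:start + window_size]
        # Map each frequency to a bin index; a frequency of exactly 1.0
        # is clamped into the last bin.
        counts = Counter(min(int(f * n_bins), n_bins - 1) for f in window)
        # Pick the bin holding the most positions (assumption: ties go
        # to the lowest bin index).
        best_bin = max(counts, key=lambda b: (counts[b], -b))
        peaks.append((best_bin + 0.5) / n_bins)
    return peaks


# Made-up frequencies of the most abundant allele at ten heterozygous sites.
example = [0.51, 0.52, 0.53, 0.58, 0.61,   # window 1: clusters near 1/2
           0.66, 0.67, 0.68, 0.72, 0.56]   # window 2: clusters near 2/3
peaks = window_peaks(example, window_size=5, n_bins=20)
print(peaks)
```

In this sketch a peak near 1/2 would be consistent with two copies and a peak near 2/3 with three copies. Because the summary is the bin mode rather than a fitted distribution, it is non-parametric in the sense the description gives.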
