Inferring Variation in Copy Number Using High Throughput Sequencing Data in R
Inference of copy number variation presents a technical challenge because variant callers typically require the copy number of a genome or genomic region to be known a priori. Here we present a method to infer copy number that uses variant call format (VCF) data as input and is implemented in the R...
Main authors: | Knaus, Brian J., Grünwald, Niklaus J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2018 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5909048/ https://www.ncbi.nlm.nih.gov/pubmed/29706990 http://dx.doi.org/10.3389/fgene.2018.00123 |
_version_ | 1783315822759903232 |
---|---|
author | Knaus, Brian J. Grünwald, Niklaus J. |
author_facet | Knaus, Brian J. Grünwald, Niklaus J. |
author_sort | Knaus, Brian J. |
collection | PubMed |
description | Inference of copy number variation presents a technical challenge because variant callers typically require the copy number of a genome or genomic region to be known a priori. Here we present a method to infer copy number that uses variant call format (VCF) data as input and is implemented in the R package vcfR. This method is based on the relative frequency of each allele (in both genic and non-genic regions) sequenced at heterozygous positions throughout a genome. These heterozygous positions are summarized by using arbitrarily sized windows of heterozygous positions, binning the allele frequencies, and selecting the bin with the greatest abundance of positions. This provides a non-parametric summary of the frequency that alleles were sequenced at. The method is applicable to organisms that have reference genomes that consist of full chromosomes or sub-chromosomal contigs. In contrast to other software designed to detect copy number variation, our method does not rely on an assumption of base ploidy, but instead infers it. We validated these approaches with the model system of Saccharomyces cerevisiae and applied it to the oomycete Phytophthora infestans, both known to vary in copy number. This functionality has been incorporated into the current release of the R package vcfR to provide modular and flexible methods to investigate copy number variation in genomic projects. |
format | Online Article Text |
id | pubmed-5909048 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-59090482018-04-27 Inferring Variation in Copy Number Using High Throughput Sequencing Data in R Knaus, Brian J. Grünwald, Niklaus J. Front Genet Genetics Inference of copy number variation presents a technical challenge because variant callers typically require the copy number of a genome or genomic region to be known a priori. Here we present a method to infer copy number that uses variant call format (VCF) data as input and is implemented in the R package vcfR. This method is based on the relative frequency of each allele (in both genic and non-genic regions) sequenced at heterozygous positions throughout a genome. These heterozygous positions are summarized by using arbitrarily sized windows of heterozygous positions, binning the allele frequencies, and selecting the bin with the greatest abundance of positions. This provides a non-parametric summary of the frequency that alleles were sequenced at. The method is applicable to organisms that have reference genomes that consist of full chromosomes or sub-chromosomal contigs. In contrast to other software designed to detect copy number variation, our method does not rely on an assumption of base ploidy, but instead infers it. We validated these approaches with the model system of Saccharomyces cerevisiae and applied it to the oomycete Phytophthora infestans, both known to vary in copy number. This functionality has been incorporated into the current release of the R package vcfR to provide modular and flexible methods to investigate copy number variation in genomic projects. Frontiers Media S.A. 2018-04-13 /pmc/articles/PMC5909048/ /pubmed/29706990 http://dx.doi.org/10.3389/fgene.2018.00123 Text en Copyright © 2018 Knaus and Grünwald. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). 
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Genetics Knaus, Brian J. Grünwald, Niklaus J. Inferring Variation in Copy Number Using High Throughput Sequencing Data in R |
title | Inferring Variation in Copy Number Using High Throughput Sequencing Data in R |
title_full | Inferring Variation in Copy Number Using High Throughput Sequencing Data in R |
title_fullStr | Inferring Variation in Copy Number Using High Throughput Sequencing Data in R |
title_full_unstemmed | Inferring Variation in Copy Number Using High Throughput Sequencing Data in R |
title_short | Inferring Variation in Copy Number Using High Throughput Sequencing Data in R |
title_sort | inferring variation in copy number using high throughput sequencing data in r |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5909048/ https://www.ncbi.nlm.nih.gov/pubmed/29706990 http://dx.doi.org/10.3389/fgene.2018.00123 |
work_keys_str_mv | AT knausbrianj inferringvariationincopynumberusinghighthroughputsequencingdatainr AT grunwaldniklausj inferringvariationincopynumberusinghighthroughputsequencingdatainr |
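
The description field above summarizes the method as three steps: group heterozygous positions into windows, bin the allele frequencies in each window, and keep the bin holding the most positions. The following is a minimal Python sketch of that summary only, not the vcfR implementation. The function name `window_peaks` and the parameters `win_size` and `n_bins` are hypothetical, and the input is assumed to be a list of per-position allele frequencies (each between 0 and 1) already extracted from a VCF file and ordered along a chromosome or contig.

```python
from collections import Counter


def window_peaks(freqs, win_size=4, n_bins=5):
    """Return the midpoint of the most populated frequency bin per window.

    freqs    -- allele frequencies (0..1) at heterozygous positions, in
                genomic order (assumed input, not a vcfR data structure)
    win_size -- number of heterozygous positions per window (hypothetical)
    n_bins   -- number of equal-width bins spanning 0..1 (hypothetical)
    """
    peaks = []
    for start in range(0, len(freqs), win_size):
        # The final window may hold fewer than win_size positions.
        window = freqs[start:start + win_size]
        # Map each frequency to a bin index; a value of 1.0 is clamped
        # into the last bin instead of creating an extra one.
        counts = Counter(min(int(f * n_bins), n_bins - 1) for f in window)
        # Pick the bin with the most positions; on a tie, the bin that
        # was encountered first in the window wins.
        top_bin, _ = counts.most_common(1)[0]
        peaks.append((top_bin + 0.5) / n_bins)
    return peaks
```

Calling `window_peaks(freqs)` on such a list yields one bin midpoint per window. An odd `n_bins` is used in this sketch so that a frequency of 0.5 falls in the middle of a bin and not on a bin edge; that choice is an assumption of the example, not something stated in the record.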