GBStools: A Statistical Method for Estimating Allelic Dropout in Reduced Representation Sequencing Data
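Main authors: | Cooke, Thomas F.; Yee, Muh-Ching; Muzzio, Marina; Sockell, Alexandra; Bell, Ryan; Cornejo, Omar E.; Kelley, Joanna L.; Bailliet, Graciela; Bravi, Claudio M.; Bustamante, Carlos D.; Kenny, Eimear E. |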
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4734769/ https://www.ncbi.nlm.nih.gov/pubmed/26828719 http://dx.doi.org/10.1371/journal.pgen.1005631 |
---|---|
author | Cooke, Thomas F. Yee, Muh-Ching Muzzio, Marina Sockell, Alexandra Bell, Ryan Cornejo, Omar E. Kelley, Joanna L. Bailliet, Graciela Bravi, Claudio M. Bustamante, Carlos D. Kenny, Eimear E. |
collection | PubMed |
description | Reduced representation sequencing methods such as genotyping-by-sequencing (GBS) enable low-cost measurement of genetic variation without the need for a reference genome assembly. These methods are widely used in genetic mapping and population genetics studies, especially with non-model organisms. Variant calling error rates, however, are higher in GBS than in standard sequencing, in particular due to restriction site polymorphisms, and few computational tools exist that specifically model and correct these errors. We developed a statistical method to remove errors caused by restriction site polymorphisms, implemented in the software package GBStools. We evaluated it in several simulated data sets, varying in number of samples, mean coverage and population mutation rate, and in two empirical human data sets (N = 8 and N = 63 samples). In our simulations, GBStools improved genotype accuracy more than commonly used filters such as Hardy-Weinberg equilibrium p-values. GBStools is most effective at removing genotype errors in data sets over 100 samples when coverage is 40X or higher, and the improvement is most pronounced in species with high genomic diversity. We also demonstrate the utility of GBS and GBStools for human population genetic inference in Argentine populations and reveal widely varying individual ancestry proportions and an excess of singletons, consistent with recent population growth. |
format | Online Article Text |
id | pubmed-4734769 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
journal | PLoS Genet, Research Article, published 2016-02-01 |
rights | © 2016 Cooke et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
topic | Research Article |
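The abstract compares GBStools with commonly used filters such as Hardy-Weinberg equilibrium (HWE) p-values. As a rough, hypothetical illustration of why allelic dropout at restriction site polymorphisms can be flagged by such a filter, the Python sketch below simulates a toy dropout model and applies a Pearson chi-square HWE test. The model, the function names `simulate_gbs_calls` and `hwe_chisq_pvalue`, and all parameter values are assumptions made for illustration; this is not the GBStools statistical model, only the kind of baseline filter it is compared against.

```python
import math
import random


def hwe_chisq_pvalue(n_aa, n_ab, n_bb):
    """Pearson chi-square test (1 df) for Hardy-Weinberg equilibrium.

    Takes observed genotype counts and returns (statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2))


def simulate_gbs_calls(n_samples, snp_freq, dropout_freq, seed=1):
    """Hypothetical toy model (not GBStools).

    Each haplotype carries allele 'A' with probability snp_freq and, independently,
    loses its restriction site with probability dropout_freq, so its reads are
    never sequenced. Returns called-genotype counts plus missing samples.
    """
    rng = random.Random(seed)
    counts = {"AA": 0, "AB": 0, "BB": 0, "missing": 0}
    for _ in range(n_samples):
        seen = []
        for _ in range(2):  # two haplotypes per diploid sample
            allele = "A" if rng.random() < snp_freq else "B"
            if rng.random() >= dropout_freq:  # restriction site intact
                seen.append(allele)
        if not seen:
            counts["missing"] += 1  # both haplotypes dropped out
            continue
        # A lone surviving haplotype is miscalled as homozygous for its allele
        genotype = "".join(sorted(seen * 2 if len(seen) == 1 else seen))
        counts[genotype] += 1
    return counts


for d in (0.0, 0.2):
    c = simulate_gbs_calls(2000, 0.5, d)
    stat, pval = hwe_chisq_pvalue(c["AA"], c["AB"], c["BB"])
    print(f"dropout={d}: {c}, chi2={stat:.1f}, p={pval:.2g}")
```

In this toy setting, dropout turns some true heterozygotes into apparent homozygotes, producing a heterozygote deficit that the HWE test detects. A filter like this only flags sites; it does not estimate or correct the underlying dropout, which is the gap the abstract says GBStools addresses.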