GBStools: A Statistical Method for Estimating Allelic Dropout in Reduced Representation Sequencing Data

Reduced representation sequencing methods such as genotyping-by-sequencing (GBS) enable low-cost measurement of genetic variation without the need for a reference genome assembly. These methods are widely used in genetic mapping and population genetics studies, especially with non-model organisms. V...

Bibliographic Details
Main authors: Cooke, Thomas F., Yee, Muh-Ching, Muzzio, Marina, Sockell, Alexandra, Bell, Ryan, Cornejo, Omar E., Kelley, Joanna L., Bailliet, Graciela, Bravi, Claudio M., Bustamante, Carlos D., Kenny, Eimear E.
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4734769/
https://www.ncbi.nlm.nih.gov/pubmed/26828719
http://dx.doi.org/10.1371/journal.pgen.1005631
_version_ 1782412967324680192
author Cooke, Thomas F.
Yee, Muh-Ching
Muzzio, Marina
Sockell, Alexandra
Bell, Ryan
Cornejo, Omar E.
Kelley, Joanna L.
Bailliet, Graciela
Bravi, Claudio M.
Bustamante, Carlos D.
Kenny, Eimear E.
collection PubMed
description Reduced representation sequencing methods such as genotyping-by-sequencing (GBS) enable low-cost measurement of genetic variation without the need for a reference genome assembly. These methods are widely used in genetic mapping and population genetics studies, especially with non-model organisms. Variant calling error rates, however, are higher in GBS than in standard sequencing, in particular due to restriction site polymorphisms, and few computational tools exist that specifically model and correct these errors. We developed a statistical method to remove errors caused by restriction site polymorphisms, implemented in the software package GBStools. We evaluated it in several simulated data sets, varying in number of samples, mean coverage and population mutation rate, and in two empirical human data sets (N = 8 and N = 63 samples). In our simulations, GBStools improved genotype accuracy more than commonly used filters such as Hardy-Weinberg equilibrium p-values. GBStools is most effective at removing genotype errors in data sets over 100 samples when coverage is 40X or higher, and the improvement is most pronounced in species with high genomic diversity. We also demonstrate the utility of GBS and GBStools for human population genetic inference in Argentine populations and reveal widely varying individual ancestry proportions and an excess of singletons, consistent with recent population growth.
format Online
Article
Text
id pubmed-4734769
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4734769 2016-02-04 GBStools: A Statistical Method for Estimating Allelic Dropout in Reduced Representation Sequencing Data Cooke, Thomas F. Yee, Muh-Ching Muzzio, Marina Sockell, Alexandra Bell, Ryan Cornejo, Omar E. Kelley, Joanna L. Bailliet, Graciela Bravi, Claudio M. Bustamante, Carlos D. Kenny, Eimear E. PLoS Genet Research Article
Public Library of Science 2016-02-01 /pmc/articles/PMC4734769/ /pubmed/26828719 http://dx.doi.org/10.1371/journal.pgen.1005631 Text en © 2016 Cooke et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title GBStools: A Statistical Method for Estimating Allelic Dropout in Reduced Representation Sequencing Data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4734769/
https://www.ncbi.nlm.nih.gov/pubmed/26828719
http://dx.doi.org/10.1371/journal.pgen.1005631