
Using next-generation DNA sequence data for genetic association tests based on allele counts with and without consideration of zero inflation

The relationship between genetic variability and individual phenotypes is usually investigated by testing for association relying on called genotypes. Allele counts obtained from next-generation sequence data could be used for this purpose too. Genetic association can be examined by treating alterna...


Bibliographic Details
Main authors: González Silos, Rosa; Karadag, Özge; Peil, Barbara; Fischer, Christine; Kabisch, Maria; Legrand, Carine; Lorenzo Bermejo, Justo
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5133473/
https://www.ncbi.nlm.nih.gov/pubmed/27980668
http://dx.doi.org/10.1186/s12919-016-0062-5
_version_ 1782471269042618368
author González Silos, Rosa
Karadag, Özge
Peil, Barbara
Fischer, Christine
Kabisch, Maria
Legrand, Carine
Lorenzo Bermejo, Justo
author_sort González Silos, Rosa
collection PubMed
description The relationship between genetic variability and individual phenotypes is usually investigated by testing for association relying on called genotypes. Allele counts obtained from next-generation sequence data could be used for this purpose too. Genetic association can be examined by treating alternative allele counts (AACs) as the response variable in negative binomial regression. AACs from sequence data often contain an excess of zeros, thus motivating the use of Hurdle and zero-inflated models. Here we examine rough type I error rates and the ability to pick out variants with small probability values for 7 different testing approaches that incorporate AACs as an explanatory or as a response variable. Model comparisons relied on chromosome 3 DNA sequence data from 407 Hispanic participants in the Type 2 Diabetes Genetic Exploration by Next-generation sequencing in Ethnic Samples (T2D-GENES) project 1 with complete information on diastolic blood pressure and related medication. Our results suggest that in the investigation of the relationship between AAC as response variable and individual phenotypes as explanatory variable, Hurdle-negative binomial regression has some advantages. This model showed a good ability to discriminate strongly associated variants and controlled overall type I error rates. However, probability values from Hurdle-negative binomial regression were not obtained for approximately 25 % of the investigated variants because of convergence problems, and the mass of the probability value distribution was concentrated around 1.
format Online
Article
Text
id pubmed-5133473
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5133473 2016-12-15 Using next-generation DNA sequence data for genetic association tests based on allele counts with and without consideration of zero inflation González Silos, Rosa Karadag, Özge Peil, Barbara Fischer, Christine Kabisch, Maria Legrand, Carine Lorenzo Bermejo, Justo BMC Proc Proceedings The relationship between genetic variability and individual phenotypes is usually investigated by testing for association relying on called genotypes. Allele counts obtained from next-generation sequence data could be used for this purpose too. Genetic association can be examined by treating alternative allele counts (AACs) as the response variable in negative binomial regression. AACs from sequence data often contain an excess of zeros, thus motivating the use of Hurdle and zero-inflated models. Here we examine rough type I error rates and the ability to pick out variants with small probability values for 7 different testing approaches that incorporate AACs as an explanatory or as a response variable. Model comparisons relied on chromosome 3 DNA sequence data from 407 Hispanic participants in the Type 2 Diabetes Genetic Exploration by Next-generation sequencing in Ethnic Samples (T2D-GENES) project 1 with complete information on diastolic blood pressure and related medication. Our results suggest that in the investigation of the relationship between AAC as response variable and individual phenotypes as explanatory variable, Hurdle-negative binomial regression has some advantages. This model showed a good ability to discriminate strongly associated variants and controlled overall type I error rates. However, probability values from Hurdle-negative binomial regression were not obtained for approximately 25 % of the investigated variants because of convergence problems, and the mass of the probability value distribution was concentrated around 1.
BioMed Central 2016-10-18 /pmc/articles/PMC5133473/ /pubmed/27980668 http://dx.doi.org/10.1186/s12919-016-0062-5 Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Using next-generation DNA sequence data for genetic association tests based on allele counts with and without consideration of zero inflation
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5133473/
https://www.ncbi.nlm.nih.gov/pubmed/27980668
http://dx.doi.org/10.1186/s12919-016-0062-5