NGS allele counts versus called genotypes for testing genetic association
Main authors: | González Silos, Rosa; Fischer, Christine; Lorenzo Bermejo, Justo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Research Network of Computational and Structural Biotechnology, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9294184/ https://www.ncbi.nlm.nih.gov/pubmed/35891781 http://dx.doi.org/10.1016/j.csbj.2022.07.016 |
_version_ | 1784749793950564352 |
---|---|
author | González Silos, Rosa; Fischer, Christine; Lorenzo Bermejo, Justo |
collection | PubMed |
description | RNA sequence data are commonly summarized as read counts. By contrast, so far there is no alternative to genotype calling for investigating the relationship between genetic variants determined by next-generation sequencing (NGS) and a phenotype of interest. Here we propose and evaluate the direct analysis of allele counts for genetic association tests. Specifically, we assess the potential advantage of the ratio of alternative allele counts to the total number of reads aligned at a specific position of the genome (coverage) over called genotypes. We simulated association studies based on NGS data from HapMap individuals. Genotype quality scores and allele counts were simulated using NGS data from the Personal Genome Project. Real data from the 1000 Genomes Project were also used to compare the two competing approaches. The average proportions of probability values lower than or equal to 0.05 amounted to 0.0496 for called genotypes and 0.0485 for the ratio of alternative allele counts to coverage in the null scenario, and to 0.69 for called genotypes and 0.75 for the ratio of alternative allele counts to coverage in the alternative scenario (9% power increase). The advantage in statistical power of the novel approach increased with decreasing coverage, with decreasing genotype quality and with decreasing allele frequency (124% power increase for variants with a minor allele frequency lower than 0.05). We provide computer code in R to implement the novel approach, which does not preclude the use of complementary data quality filters before or after identification of the most promising association signals. AUTHOR SUMMARY: Genetic association tests usually rely on called genotypes. We postulate here that the direct analysis of allele counts from sequence data improves the quality of statistical inference. To evaluate this hypothesis, we investigate simulated and real data using distinct statistical approaches. We demonstrate that association tests based on allele counts rather than called genotypes achieve higher statistical power with controlled type I error rates. |
format | Online Article Text |
id | pubmed-9294184 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Research Network of Computational and Structural Biotechnology |
record_format | MEDLINE/PubMed |
spelling | pubmed-9294184; 2022-07-25; Comput Struct Biotechnol J; Short Communication; Research Network of Computational and Structural Biotechnology, 2022-07-11; /pmc/articles/PMC9294184/ /pubmed/35891781 http://dx.doi.org/10.1016/j.csbj.2022.07.016; Text; en; © 2022 The Author(s). This is an open access article under the CC BY license (https://creativecommons.org/licenses/by/4.0/). |
title | NGS allele counts versus called genotypes for testing genetic association |
topic | Short Communication |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9294184/ https://www.ncbi.nlm.nih.gov/pubmed/35891781 http://dx.doi.org/10.1016/j.csbj.2022.07.016 |
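The abstract contrasts two inputs to the same association test: a hard-called genotype and the ratio of alternative-allele reads to coverage. The Python sketch below illustrates that contrast under explicit assumptions. The threshold caller `naive_call`, the test `trend_test` (a 1-df statistic n·r², which for 0/1/2 genotype scores and a case/control indicator is one common form of the Cochran-Armitage trend test) and the toy read counts are all hypothetical; they are not the authors' R code or their simulation design. `rejection_rate` shows how the reported proportions of p-values ≤ 0.05 are read: as type I error in the null scenario and as power in the alternative one.

```python
import math
from typing import List, Optional, Sequence, Tuple


def alt_ratio(alt_count: int, coverage: int) -> Optional[float]:
    """Alternative-allele reads divided by all reads aligned at the position."""
    if coverage <= 0:
        return None  # no reads: ratio undefined, drop the individual
    return alt_count / coverage


def naive_call(ratio: float, low: float = 0.2, high: float = 0.8) -> int:
    """HYPOTHETICAL threshold caller: 0, 1 or 2 alternative alleles.

    Real callers use genotype likelihoods and quality scores; these cut-offs
    exist only to contrast a discrete call with the continuous ratio.
    """
    if ratio < low:
        return 0
    if ratio > high:
        return 2
    return 1


def trend_test(x: Sequence[float], y: Sequence[float]) -> float:
    """P-value of n * r**2 against a chi-square distribution with 1 df.

    r is the Pearson correlation between the per-individual score x
    (called genotype or allele ratio) and the phenotype y.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 1.0  # a constant score or phenotype carries no evidence
    stat = n * sxy * sxy / (sxx * syy)
    # Upper tail of chi-square(1): P(X > stat) = P(|Z| > sqrt(stat)).
    return math.erfc(math.sqrt(stat / 2))


def rejection_rate(pvalues: Sequence[float], alpha: float = 0.05) -> float:
    """Share of p-values <= alpha: type I error under H0, power under H1."""
    return sum(p <= alpha for p in pvalues) / len(pvalues)


# Made-up toy data: (alternative reads, coverage) and case status (1 = case).
reads: List[Tuple[int, int]] = [(0, 4), (1, 5), (2, 3), (3, 6),
                                (5, 5), (0, 2), (4, 8), (6, 6)]
status = [0, 0, 1, 0, 1, 0, 1, 1]

# Every toy coverage is > 0, so no ratio is None here.
ratios = [alt_ratio(a, c) for a, c in reads]
calls = [naive_call(r) for r in ratios]

p_ratio = trend_test(ratios, status)
p_called = trend_test(calls, status)
print(f"p (allele ratio) = {p_ratio:.4f}, p (called) = {p_called:.4f}")

# Relative power gain behind the abstract's figures: 0.75 vs 0.69.
relative_gain = (0.75 - 0.69) / 0.69
print(f"relative power gain: {relative_gain:.1%}")  # relative power gain: 8.7%
```

The last two lines reproduce the arithmetic behind the reported "9% power increase": (0.75 - 0.69)/0.69 ≈ 8.7%. On these made-up counts the ratio happens to give the smaller p-value, but eight individuals only illustrate the mechanics and say nothing about real power; in practice the rejection rate would be computed over many simulated replicates.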