Identification and Correction of Sample Mix-Ups in Expression Genetic Data: A Case Study

In a mouse intercross with more than 500 animals and genome-wide gene expression data on six tissues, we identified a high proportion (18%) of sample mix-ups in the genotype data. Local expression quantitative trait loci (eQTL; genetic loci influencing gene expression) with extremely large effect we...

Bibliographic Details
Main authors: Broman, Karl W., Keller, Mark P., Broman, Aimee Teo, Kendziorski, Christina, Yandell, Brian S., Sen, Śaunak, Attie, Alan D.
Format: Online Article Text
Language: English
Published: Genetics Society of America 2015
Subjects: Investigations
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4592999/
https://www.ncbi.nlm.nih.gov/pubmed/26290572
http://dx.doi.org/10.1534/g3.115.019778
author Broman, Karl W.
Keller, Mark P.
Broman, Aimee Teo
Kendziorski, Christina
Yandell, Brian S.
Sen, Śaunak
Attie, Alan D.
collection PubMed
description In a mouse intercross with more than 500 animals and genome-wide gene expression data on six tissues, we identified a high proportion (18%) of sample mix-ups in the genotype data. Local expression quantitative trait loci (eQTL; genetic loci influencing gene expression) with extremely large effect were used to form a classifier to predict an individual’s eQTL genotype based on expression data alone. By considering multiple eQTL and their related transcripts, we identified numerous individuals whose predicted eQTL genotypes (based on their expression data) did not match their observed genotypes, and then went on to identify other individuals whose genotypes did match the predicted eQTL genotypes. The concordance of predictions across six tissues indicated that the problem was due to mix-ups in the genotypes (although we further identified a small number of sample mix-ups in each of the six panels of gene expression microarrays). Consideration of the plate positions of the DNA samples indicated a number of off-by-one and off-by-two errors, likely the result of pipetting errors. Such sample mix-ups can be a problem in any genetic study, but eQTL data allow us to identify, and even correct, such problems. Our methods have been implemented in an R package, R/lineup.
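The matching idea in the description can be illustrated with a small Python sketch. This is not the R/lineup API and not necessarily the classifier the authors used: a nearest-median classifier is swapped in for simplicity, and the toy data, the genotype coding (0, 1, 2) and all function names are hypothetical. For each eQTL, a genotype is predicted from expression; each expression sample is then compared against every DNA sample by the proportion of eQTL at which predicted and observed genotypes disagree.

```python
from statistics import median

def fit_medians(expr, geno):
    """Median expression of one transcript within each observed genotype class."""
    groups = {}
    for x, g in zip(expr, geno):
        groups.setdefault(g, []).append(x)
    return {g: median(values) for g, values in groups.items()}

def predict(medians, x):
    """Genotype whose class median is closest to expression value x."""
    return min(medians, key=lambda g: abs(medians[g] - x))

def mismatch_matrix(expr_by_eqtl, geno_by_eqtl):
    """d[i][j]: proportion of eQTL where the genotype predicted from
    expression sample i differs from the observed genotype of DNA sample j."""
    n_eqtl = len(expr_by_eqtl)
    n = len(expr_by_eqtl[0])
    pred = []
    for expr, geno in zip(expr_by_eqtl, geno_by_eqtl):
        medians = fit_medians(expr, geno)
        pred.append([predict(medians, x) for x in expr])
    return [[sum(pred[k][i] != geno_by_eqtl[k][j] for k in range(n_eqtl)) / n_eqtl
             for j in range(n)] for i in range(n)]

# Hypothetical toy data: 3 eQTL, 9 animals, genotypes coded 0/1/2.
true_geno = [
    [0, 2, 1, 0, 2, 1, 0, 2, 1],
    [1, 0, 2, 2, 1, 0, 0, 2, 1],
    [2, 1, 0, 1, 0, 2, 2, 1, 0],
]
# Expression tracks the true genotype, with a small deterministic offset.
expr = [[g + 0.1 * ((i % 3) - 1) for i, g in enumerate(row)] for row in true_geno]
# Observed genotypes: DNA samples 0 and 1 have been swapped.
obs_geno = [[row[1], row[0]] + row[2:] for row in true_geno]

d = mismatch_matrix(expr, obs_geno)
# For each expression sample, the DNA sample with the fewest mismatches.
best = [min(range(len(row)), key=row.__getitem__) for row in d]
# Samples whose best-matching DNA sample is not their own are flagged.
flagged = [i for i, j in enumerate(best) if j != i]
```

In this toy case the two swapped animals are flagged and each is matched to the other's DNA sample, which is the sense in which such data allow a mix-up to be corrected and not just detected. Medians are used here only because they tolerate the one contaminated sample per genotype class in so small an example.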
format Online
Article
Text
id pubmed-4592999
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Genetics Society of America
record_format MEDLINE/PubMed
spelling pubmed-4592999 2015-10-15 G3 (Bethesda) Investigations Genetics Society of America 2015-08-19 /pmc/articles/PMC4592999/ /pubmed/26290572 http://dx.doi.org/10.1534/g3.115.019778 Text en Copyright © 2015 Broman et al.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Identification and Correction of Sample Mix-Ups in Expression Genetic Data: A Case Study
topic Investigations