
KOMPUTE: imputing summary statistics of missing phenotypes in high-throughput model organism data


Bibliographic Details
Main Authors: Warkentin, Coby; O’Connell, Michael J; Lee, Donghyung
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10409646/
https://www.ncbi.nlm.nih.gov/pubmed/37565237
http://dx.doi.org/10.1093/bioadv/vbad100
author Warkentin, Coby
O’Connell, Michael J
Lee, Donghyung
collection PubMed
description MOTIVATION: The International Mouse Phenotyping Consortium (IMPC) is striving to build a comprehensive functional catalog of mammalian protein-coding genes by systematically producing and phenotyping gene-knockout mice for almost every protein-coding gene in the mouse genome and by testing associations between gene loss-of-function and phenotype. To date, the IMPC has identified over 90 000 gene–phenotype associations, but many phenotypes have not yet been measured for each gene, resulting in largely incomplete data; ∼75.6% of association summary statistics are still missing in the latest IMPC summary statistics dataset (IMPC release version 16). RESULTS: To overcome these challenges, we propose KOMPUTE, a novel method for imputing missing summary statistics in the IMPC dataset. Using the conditional distribution properties of the multivariate normal distribution, KOMPUTE estimates the association Z-scores of unmeasured phenotypes for a particular gene as a conditional expectation given the Z-scores of measured phenotypes. Our evaluation of the method using simulated and real-world datasets demonstrates its superiority over the singular value decomposition matrix completion method in various scenarios. AVAILABILITY AND IMPLEMENTATION: An R package for KOMPUTE is publicly available at https://github.com/statsleelab/kompute, along with usage examples and results for different phenotype domains at https://statsleelab.github.io/komputeExamples.
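The conditional-expectation step described in the abstract can be sketched as follows. Assume that, for one gene, the Z-scores across phenotypes are jointly multivariate normal with mean zero and a phenotype correlation matrix Σ that is known or estimated elsewhere. Splitting the phenotypes into measured (m) and unmeasured (u), the standard multivariate normal result gives the imputed values as E[z_u | z_m] = Σ_um Σ_mm⁻¹ z_m. This is a minimal pure-Python illustration of that formula, not the implementation in the KOMPUTE R package; the function names `solve` and `impute_z` and the `ridge` parameter are hypothetical.

```python
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented copy [a | b], so the caller's matrix is left untouched.
    m = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix of measured phenotypes is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def impute_z(z: Sequence[Optional[float]], corr: Matrix, ridge: float = 0.0) -> List[float]:
    """Fill None entries of z with E[z_u | z_m] = corr_um @ inv(corr_mm) @ z_m.

    `ridge` is a hypothetical option (not taken from the KOMPUTE package) that
    adds a constant to the diagonal of corr_mm to stabilise a near-singular solve.
    """
    measured = [i for i, v in enumerate(z) if v is not None]
    missing = [i for i, v in enumerate(z) if v is None]
    corr_mm = [[corr[i][j] + (ridge if i == j else 0.0) for j in measured] for i in measured]
    z_m = [float(z[i]) for i in measured]
    w = solve(corr_mm, z_m)  # w = inv(corr_mm) @ z_m, without forming the inverse
    out = [0.0 if v is None else float(v) for v in z]
    for u in missing:
        # Conditional mean under a zero-mean multivariate normal with correlation corr.
        out[u] = sum((corr[u][j] * w[k] for k, j in enumerate(measured)), 0.0)
    return out


if __name__ == "__main__":
    corr = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]]
    # Phenotype 1 is unmeasured; phenotypes 0 and 2 are measured.
    print(impute_z([1.0, None, 2.0], corr))
```

With this toy correlation matrix, the call fills index 1 with approximately 0.875 and leaves the measured Z-scores unchanged. Solving the linear system instead of explicitly inverting Σ_mm is the usual, numerically safer choice; if no phenotype is measured, the sketch falls back to the prior mean of zero.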
format Online
Article
Text
id pubmed-10409646
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10409646 2023-08-10; Bioinform Adv, Application Note; Oxford University Press 2023-08-01; © The Author(s) 2023. Published by Oxford University Press.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Application Note