KOMPUTE: imputing summary statistics of missing phenotypes in high-throughput model organism data
MOTIVATION: The International Mouse Phenotyping Consortium (IMPC) is striving to build a comprehensive functional catalog of mammalian protein-coding genes by systematically producing and phenotyping gene-knockout mice for almost every protein-coding gene in the mouse genome and by testing associati...
Main authors: | Warkentin, Coby; O’Connell, Michael J; Lee, Donghyung |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2023 |
Subjects: | Application Note |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10409646/ https://www.ncbi.nlm.nih.gov/pubmed/37565237 http://dx.doi.org/10.1093/bioadv/vbad100 |
_version_ | 1785086288601284608 |
---|---|
author | Warkentin, Coby O’Connell, Michael J Lee, Donghyung |
author_facet | Warkentin, Coby O’Connell, Michael J Lee, Donghyung |
author_sort | Warkentin, Coby |
collection | PubMed |
description | MOTIVATION: The International Mouse Phenotyping Consortium (IMPC) is striving to build a comprehensive functional catalog of mammalian protein-coding genes by systematically producing and phenotyping gene-knockout mice for almost every protein-coding gene in the mouse genome and by testing associations between gene loss-of-function and phenotype. To date, the IMPC has identified over 90 000 gene–phenotype associations, but many phenotypes have not yet been measured for each gene, resulting in largely incomplete data; ∼75.6% of association summary statistics are still missing in the latest IMPC summary statistics dataset (IMPC release version 16). RESULTS: To overcome these challenges, we propose KOMPUTE, a novel method for imputing missing summary statistics in the IMPC dataset. Using conditional distribution properties of multivariate normal, KOMPUTE estimates the association Z-scores of unmeasured phenotypes for a particular gene as a conditional expectation given the Z-scores of measured phenotypes. Our evaluation of the method using simulated and real-world datasets demonstrates its superiority over the singular value decomposition matrix completion method in various scenarios. AVAILABILITY AND IMPLEMENTATION: An R package for KOMPUTE is publicly available at https://github.com/statsleelab/kompute, along with usage examples and results for different phenotype domains at https://statsleelab.github.io/komputeExamples. |
format | Online Article Text |
id | pubmed-10409646 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-10409646 2023-08-10 KOMPUTE: imputing summary statistics of missing phenotypes in high-throughput model organism data Warkentin, Coby O’Connell, Michael J Lee, Donghyung Bioinform Adv Application Note MOTIVATION: The International Mouse Phenotyping Consortium (IMPC) is striving to build a comprehensive functional catalog of mammalian protein-coding genes by systematically producing and phenotyping gene-knockout mice for almost every protein-coding gene in the mouse genome and by testing associations between gene loss-of-function and phenotype. To date, the IMPC has identified over 90 000 gene–phenotype associations, but many phenotypes have not yet been measured for each gene, resulting in largely incomplete data; ∼75.6% of association summary statistics are still missing in the latest IMPC summary statistics dataset (IMPC release version 16). RESULTS: To overcome these challenges, we propose KOMPUTE, a novel method for imputing missing summary statistics in the IMPC dataset. Using conditional distribution properties of multivariate normal, KOMPUTE estimates the association Z-scores of unmeasured phenotypes for a particular gene as a conditional expectation given the Z-scores of measured phenotypes. Our evaluation of the method using simulated and real-world datasets demonstrates its superiority over the singular value decomposition matrix completion method in various scenarios. AVAILABILITY AND IMPLEMENTATION: An R package for KOMPUTE is publicly available at https://github.com/statsleelab/kompute, along with usage examples and results for different phenotype domains at https://statsleelab.github.io/komputeExamples. Oxford University Press 2023-08-01 /pmc/articles/PMC10409646/ /pubmed/37565237 http://dx.doi.org/10.1093/bioadv/vbad100 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Application Note Warkentin, Coby O’Connell, Michael J Lee, Donghyung KOMPUTE: imputing summary statistics of missing phenotypes in high-throughput model organism data |
title | KOMPUTE: imputing summary statistics of missing phenotypes in high-throughput model organism data |
title_full | KOMPUTE: imputing summary statistics of missing phenotypes in high-throughput model organism data |
title_fullStr | KOMPUTE: imputing summary statistics of missing phenotypes in high-throughput model organism data |
title_full_unstemmed | KOMPUTE: imputing summary statistics of missing phenotypes in high-throughput model organism data |
title_short | KOMPUTE: imputing summary statistics of missing phenotypes in high-throughput model organism data |
title_sort | kompute: imputing summary statistics of missing phenotypes in high-throughput model organism data |
topic | Application Note |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10409646/ https://www.ncbi.nlm.nih.gov/pubmed/37565237 http://dx.doi.org/10.1093/bioadv/vbad100 |
work_keys_str_mv | AT warkentincoby komputeimputingsummarystatisticsofmissingphenotypesinhighthroughputmodelorganismdata AT oconnellmichaelj komputeimputingsummarystatisticsofmissingphenotypesinhighthroughputmodelorganismdata AT leedonghyung komputeimputingsummarystatisticsofmissingphenotypesinhighthroughputmodelorganismdata |
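
The abstract describes imputing the Z-scores of unmeasured phenotypes as a conditional expectation under a multivariate normal model. The following is a minimal sketch of that general idea, not the KOMPUTE R package's API. It assumes the Z-scores for one gene have mean zero and that a phenotype correlation matrix is already available; the function name `impute_conditional_mean` and the `ridge` parameter are hypothetical names introduced for illustration. Under these assumptions, the imputed values are `Sigma_mo @ inv(Sigma_oo) @ z_o`, where `o` indexes measured phenotypes and `m` indexes missing ones.

```python
import numpy as np


def impute_conditional_mean(z, corr, ridge=0.0):
    """Fill NaN entries of z with E[z_missing | z_observed].

    Assumes z ~ N(0, corr); `ridge` (hypothetical) adds a small value to the
    diagonal of the observed block for numerical stability.
    """
    z = np.asarray(z, dtype=float)
    corr = np.asarray(corr, dtype=float)
    missing = np.isnan(z)
    observed = ~missing

    out = z.copy()
    if not missing.any():
        return out  # nothing to impute
    if not observed.any():
        out[missing] = 0.0  # no information: fall back to the assumed mean
        return out

    # Partition the correlation matrix into observed and missing blocks.
    sigma_oo = corr[np.ix_(observed, observed)]
    sigma_mo = corr[np.ix_(missing, observed)]
    sigma_oo = sigma_oo + ridge * np.eye(sigma_oo.shape[0])

    # Solve sigma_oo @ w = z_o instead of forming an explicit inverse.
    weights = np.linalg.solve(sigma_oo, z[observed])
    out[missing] = sigma_mo @ weights
    return out


if __name__ == "__main__":
    # Toy correlation matrix for three phenotypes (illustrative values only).
    corr = np.array([[1.0, 0.5, 0.2],
                     [0.5, 1.0, 0.3],
                     [0.2, 0.3, 1.0]])
    z = np.array([2.0, np.nan, np.nan])
    print(impute_conditional_mean(z, corr))
```

With a single measured phenotype, each imputed Z-score is simply the measured Z-score scaled by its correlation with the missing phenotype, so weakly correlated phenotypes are shrunk toward zero.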