Dimensional reduction of phenotypes from 53 000 mouse models reveals a diverse landscape of gene function

Bibliographic details
Main authors: Konopka, Tomasz; Vestito, Letizia; Smedley, Damian
Format: Online article, text
Language: English
Published: Oxford University Press 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8633315/
https://www.ncbi.nlm.nih.gov/pubmed/34870209
http://dx.doi.org/10.1093/bioadv/vbab026
collection PubMed
description Animal models have long been used to study gene function and the impact of genetic mutations on phenotype. Through the research efforts of thousands of research groups, systematic curation of published literature and high-throughput phenotyping screens, the collective body of knowledge for the mouse now covers the majority of protein-coding genes. We here collected data for over 53 000 mouse models with mutations in over 15 000 genomic markers and characterized by more than 254 000 annotations using more than 9000 distinct ontology terms. We investigated dimensional reduction and embedding techniques as means to facilitate access to this diverse and high-dimensional information. Our analyses provide the first visual maps of the landscape of mouse phenotypic diversity. We also summarize some of the difficulties in producing and interpreting embeddings of sparse phenotypic data. In particular, we show that data preprocessing, filtering and encoding have as much impact on the final embeddings as the process of dimensional reduction. Nonetheless, techniques developed in the context of dimensional reduction create opportunities for explorative analysis of this large pool of public data, including for searching for mouse models suited to study human diseases.
AVAILABILITY AND IMPLEMENTATION: Source code for analysis scripts is available on GitHub at https://github.com/tkonopka/mouse-embeddings. The data underlying this article are available in Zenodo at https://doi.org/10.5281/zenodo.4916171.
CONTACT: t.konopka@qmul.ac.uk
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
id pubmed-8633315
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Bioinform Adv, Original Article, Oxford University Press 2021-10-11
rights © The Author(s) 2021. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.