
A Practical Guide to Sparse k-Means Clustering for Studying Molecular Development of the Human Brain


Bibliographic Details
Main authors: Balsor, Justin L., Arbabi, Keon, Singh, Desmond, Kwan, Rachel, Zaslavsky, Jonathan, Jeyanesan, Ewalina, Murphy, Kathryn M.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Neuroscience
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8636820/
https://www.ncbi.nlm.nih.gov/pubmed/34867140
http://dx.doi.org/10.3389/fnins.2021.668293
author Balsor, Justin L.
Arbabi, Keon
Singh, Desmond
Kwan, Rachel
Zaslavsky, Jonathan
Jeyanesan, Ewalina
Murphy, Kathryn M.
collection PubMed
description Studying the molecular development of the human brain presents unique challenges for selecting a data analysis approach. Because human postmortem brain tissue is rare and valuable, especially for developmental studies, sample sizes (n) are small, yet high-throughput genomic and proteomic methods measure the expression levels of hundreds or thousands of variables [e.g., genes or proteins (p)] for each sample. This leads to a data structure that is high dimensional (p ≫ n) and introduces the curse of dimensionality, which poses a challenge for traditional statistical approaches. In contrast, high-dimensional analyses, especially cluster analyses developed for sparse data, have worked well for analyzing genomic datasets where p ≫ n. Here we explore applying a lasso-based clustering method developed for high-dimensional genomic data with small sample sizes. Using protein and gene data from the developing human visual cortex, we compared clustering methods. We identified an application of sparse k-means clustering [robust sparse k-means clustering (RSKC)] that partitioned samples into age-related clusters that reflect lifespan stages from birth to aging. RSKC adaptively selects the subset of genes or proteins that contributes to partitioning samples into age-related clusters that progress across the lifespan. This approach addresses a problem in current studies, which could not identify multiple postnatal clusters. Moreover, clusters encompassed a range of ages, like a series of overlapping waves, illustrating that chronological age and brain age have a complex relationship. In addition, a recently developed workflow to create plasticity phenotypes (Balsor et al., 2020) was applied to the clusters and revealed neurobiologically relevant features that identified how the human visual cortex changes across the lifespan. These methods can help address the growing demand for multimodal integration, from molecular machinery to brain imaging signals, to understand the human brain’s development.
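The lasso-based feature selection mentioned in the description can be illustrated with a minimal sketch of plain sparse k-means on synthetic p ≫ n data. This is not the authors' code and not the RSKC implementation: it assumes the standard alternating scheme (weighted k-means, then soft-thresholded feature weights under an L1 bound `s`), it omits the trimming step that makes RSKC robust to outliers, and every function name, parameter value, and the synthetic data are hypothetical choices made for illustration.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means; returns one integer label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):  # leave an empty cluster's center unchanged
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def bcss_per_feature(X, labels):
    """Between-cluster sum of squares, computed separately for each feature."""
    total = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    within = np.zeros(X.shape[1])
    for c in np.unique(labels):
        Xc = X[labels == c]
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return total - within

def update_weights(a, s):
    """Soft-threshold a so that ||w||_2 = 1 and ||w||_1 <= s (lasso-type bound)."""
    assert s > 1, "the L1 bound must exceed 1 for a unit-L2 weight vector"
    a = np.maximum(a, 0.0)
    w = a / np.linalg.norm(a)
    if w.sum() <= s:
        return w
    lo, hi = 0.0, a.max()
    for _ in range(60):  # bisection on the soft-threshold level
        delta = (lo + hi) / 2
        w = np.maximum(a - delta, 0.0)
        w = w / np.linalg.norm(w)
        if w.sum() > s:
            lo = delta
        else:
            hi = delta
    return w

def sparse_kmeans(X, k, s, n_outer=10, seed=0):
    """Alternate weighted k-means and weight updates; no outlier trimming."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    w = np.full(p, 1.0 / np.sqrt(p))  # start with equal weights
    for _ in range(n_outer):
        labels = kmeans(X * np.sqrt(w), k, rng)
        w = update_weights(bcss_per_feature(X, labels), s)
    return labels, w

# Synthetic p >> n example: 30 samples, 200 features, only 10 informative.
rng = np.random.default_rng(1)
n, p, informative = 30, 200, 10
truth = np.repeat([0, 1, 2], 10)
X = rng.normal(size=(n, p))
X[:, :informative] += 3.0 * truth[:, None]

s = 3.0
labels, w = sparse_kmeans(X, k=3, s=s)
print("features with nonzero weight:", int((w > 0).sum()), "of", p)
```

Lowering `s` drives more feature weights to exactly zero, which is the sense in which the method selects a subset of variables. In practice `s` and the number of clusters would need to be tuned for the dataset at hand rather than fixed as above.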
format Online
Article
Text
id pubmed-8636820
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8636820 2021-12-03 A Practical Guide to Sparse k-Means Clustering for Studying Molecular Development of the Human Brain Balsor, Justin L. Arbabi, Keon Singh, Desmond Kwan, Rachel Zaslavsky, Jonathan Jeyanesan, Ewalina Murphy, Kathryn M. Front Neurosci Neuroscience Frontiers Media S.A. 2021-11-16 /pmc/articles/PMC8636820/ /pubmed/34867140 http://dx.doi.org/10.3389/fnins.2021.668293 Text en Copyright © 2021 Balsor, Arbabi, Singh, Kwan, Zaslavsky, Jeyanesan and Murphy. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title A Practical Guide to Sparse k-Means Clustering for Studying Molecular Development of the Human Brain
topic Neuroscience
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8636820/
https://www.ncbi.nlm.nih.gov/pubmed/34867140
http://dx.doi.org/10.3389/fnins.2021.668293