A brief demonstration of an awesome tool for exploring multidimensional data, using personality and work-life satisfaction data.
Recently, I needed to explore the interactions of several variables in relation to a variable of interest, and I was wondering what tool could help me, until I remembered that in the distant past I had used Kohonen’s Self-Organizing Maps (SOMs) in such situations - an artificial neural network designed for unsupervised learning that relies on competitive learning mechanism to preserve topological properties of high-dimensional data and map them onto a lower-dimensional grid for visualization and clustering.
To illustrate, I applied it to a piece of data collected by Clearer Thinking, which used it to compare the predictive performance of different popular personality test frameworks in relation to various life outcomes. Specifically, I tried to explore the relationship between Big Five traits and work-life satisfaction.
# loading necessary libraries
library(tidyverse)
library(kohonen)
library(gridExtra)
# uploading data
data <- readr::read_csv('big5_mbti-stlye_astrologicalSigns_enneagramTypes_outcome_variables.csv')
# data preparation
mydata <- data %>%
dplyr::select(contains("Big5"), outcome_SatisfiedWithWorkLife ) %>%
tidyr::drop_na() %>%
dplyr::rename(
Extraversion = Big5_E,
Openness = Big5_O,
Agreeableness = Big5_A,
Conscientiousness = Big5_C,
Neuroticism = Big5_N,
"Work-Life Satisfaction" = outcome_SatisfiedWithWorkLife
)
# normalize the data
data_scaled <- mydata %>%
dplyr::mutate(across(everything(), ~scale(.)))
# creating a matrix for SOM function
data_matrix <- as.matrix(data_scaled)
data_matrix <- apply(data_matrix, 2, as.numeric)
# keeping the original names of the vars
colnames(data_matrix) <- colnames(data_scaled)
# creating a SOM grid with hexagonal topology 20x20
som_grid <- kohonen::somgrid(xdim = 20, ydim = 20, topo = "hexagonal")
# training the SOM
set.seed(2024)
som_model <- kohonen::som(
data_matrix,
grid = som_grid,
rlen = 5000,
alpha = c(0.05, 0.01),
keep.data = TRUE
)
# plotting the SOM using ggplot
# SOM with hexagonal cells using ggplot
# function to create hexagonal coordinates
hex_coords <- function(x, y) {
r <- 1 / sqrt(3) # the radius of a hexagon
angles <- seq(0, 2*pi, length.out = 7) # angles for hexagon vertices
data.frame(
x = x + r * cos(angles),
y = y + r * sin(angles)
)
}
# defining a blue-to-red color palette
blue_to_red_palette <- colorRampPalette(c("blue", "white", "red"))
# creating a list to store ggplot objects
plots <- list()
# generating the component planes for each variable and store them as ggplot objects
for(i in 1:ncol(data_matrix)){
# extracting the component plane data
plane_data <- data.frame(
x = rep(1:som_grid$xdim, each = som_grid$ydim),
y = rep(1:som_grid$ydim, som_grid$xdim),
z = getCodes(som_model)[,i]
)
# creating hexagon coordinates for each tile
hex_list <- lapply(1:nrow(plane_data), function(j) {
hex_coords(plane_data$x[j], plane_data$y[j])
})
# combining all hexagons into one data frame
hex_data <- do.call(rbind, hex_list)
hex_data$z <- rep(plane_data$z, each = 7)
hex_data$group <- rep(1:nrow(plane_data), each = 7)
# creating a ggplot object with hexagons
p <- ggplot2::ggplot(hex_data, aes(x = x, y = y, group = group, fill = z)) +
ggplot2::geom_polygon() +
ggplot2::scale_fill_gradientn(colors = blue_to_red_palette(100)) +
ggplot2::coord_fixed() +
ggplot2::labs(
title = colnames(data_matrix)[i],
fill = "Value"
) +
ggplot2::theme_minimal() +
ggplot2::theme(
axis.text = element_blank(),
axis.title = element_blank(),
panel.grid = element_blank(),
plot.title = element_text(hjust = 0.5)
)
plots[[i]] <- p
}
# arranging the plots in a 2x3 grid
gridExtra::grid.arrange(grobs = plots, nrow = 2, ncol = 3)
# an alternative and easier SOM visualization
# par(mfrow = c(2, 3))
# par(cex.main = 1.5)
# for(i in 1:ncol(data_matrix)){
# plot(som_model, type = "property", property = getCodes(som_model)[,i],
# main = colnames(data_matrix)[i], shape = "straight", palette.name = blue_to_red_palette)
# }
# dev.off()
# # resetting the layout to default
# par(mfrow = c(1, 1), cex.main = 1)
From the maps, it is easy to see that satisfaction with work life is strongly related to Extraversion (positively) and Neuroticism (negatively) and weakly related to Conscientiousness and Openness (both positively). More importantly, however, we can also check various “exceptions” to these bivariate patterns by looking at the same specific locations across all maps. For example, one can see a group of people with higher Neuroticism (lower left corner) who nevertheless show higher work-life satisfaction, perhaps due to higher Extraversion, Conscientiousness, and partly also Openness. That’s something that wouldn’t be possible to find out from the pairwise plots that are usually used for similar data exploration. Check it out for yourself below.
If you find yourself in a similar situation, definitely give it a try. Happy exploration(s) 🙂
For attribution, please cite this work as
Stehlík (2024, Aug. 21). Ludek's Blog About People Analytics: Kohonen's Self-Organizing Maps. Retrieved from https://blog-about-people-analytics.netlify.app/posts/2024-08-21-kohonen-self-organizing-maps/
BibTeX citation
@misc{stehlík2024kohonen's, author = {Stehlík, Luděk}, title = {Ludek's Blog About People Analytics: Kohonen's Self-Organizing Maps}, url = {https://blog-about-people-analytics.netlify.app/posts/2024-08-21-kohonen-self-organizing-maps/}, year = {2024} }