If you’re curious about what your LinkedIn connections are up to, keep reading…
I took advantage of one rainy day during my vacation to dive into some LinkedIn (LI) data and explore what my nearly 7K contacts are doing professionally.
The challenge was that the LI data available about my contacts’ jobs was pretty limited: just their job titles. So my first step was to enrich this information by using an LLM to generate brief descriptions of their positions based on those titles. It’s not perfect and it’s a bit error-prone, but in a low-stakes situation like this, we can fortunately accept a bit of imperfection and some minor “hallucinations” 🧙
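A minimal sketch of this enrichment step is below. It assumes a generic `llm` callable that takes a prompt string and returns text; this is a hypothetical stand-in for whichever LLM client was actually used, and the prompt wording is also an assumption. Because many contacts share the same title, each distinct title is sent to the model only once and the answer is reused for every contact holding it.

```python
from typing import Callable, Dict, List

# Hypothetical prompt template; the wording actually used for this post is not known.
PROMPT_TEMPLATE = (
    "In one or two sentences, describe what a person with the job title "
    "'{title}' typically does at work."
)


def describe_titles(titles: List[str], llm: Callable[[str], str]) -> List[str]:
    """Return one LLM-generated description per title, in input order.

    `llm` is an assumed callable (prompt -> text); plug in any real client.
    Titles are whitespace-normalized, and each distinct title is queried once.
    """
    cache: Dict[str, str] = {}
    descriptions: List[str] = []
    for title in titles:
        key = " ".join(title.split())  # collapse stray/double whitespace
        if not key:
            descriptions.append("")  # nothing to describe for an empty title
            continue
        if key not in cache:
            cache[key] = llm(PROMPT_TEMPLATE.format(title=key)).strip()
        descriptions.append(cache[key])
    return descriptions
```

For example, with a stand-in `fake_llm = lambda prompt: "placeholder"`, calling `describe_titles(["Data Scientist", "Data  Scientist"], fake_llm)` queries the model once and returns the same description twice. Caching by title keeps API costs and latency proportional to the number of distinct titles rather than the number of contacts.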
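Next, I fed these descriptions into a BERTopic workflow, which chains several NLP techniques to identify clusters of similar documents. In its default configuration, it embeds each description with a sentence-transformer model, reduces the dimensionality of the embeddings with UMAP, clusters them with HDBSCAN, and labels each cluster with its most characteristic words using class-based TF-IDF.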
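For intuition about how BERTopic turns clusters into readable topic labels: its default topic representation, class-based TF-IDF (c-TF-IDF), merges all documents in a cluster into one "class document" and weights each term t in class c as W(t, c) = tf(t, c) × log(1 + A / f(t)), where tf(t, c) is the L1-normalized frequency of t within the class, f(t) is the frequency of t across all classes, and A is the average number of words per class. The pure-Python sketch below implements that formula on made-up toy descriptions. It is not BERTopic's own code (which works on sparse count matrices), and the lowercase whitespace tokenizer is a simplifying assumption.

```python
import math
from collections import Counter
from typing import Dict, List


def c_tf_idf_top_terms(docs: List[str], labels: List[str], top_n: int = 3) -> Dict[str, List[str]]:
    """Return the top_n highest-weighted terms per cluster using c-TF-IDF."""
    # 1. Merge all documents of a cluster into a single bag of words per class.
    class_counts: Dict[str, Counter] = {}
    for doc, label in zip(docs, labels):
        class_counts.setdefault(label, Counter()).update(doc.lower().split())

    # 2. f(t): frequency of each term across all classes.
    total_counts: Counter = Counter()
    for counts in class_counts.values():
        total_counts.update(counts)

    # 3. A: average number of words per class.
    avg_words = sum(total_counts.values()) / len(class_counts)

    top_terms: Dict[str, List[str]] = {}
    for label, counts in class_counts.items():
        n_words = sum(counts.values())
        weights = {
            # L1-normalized in-class frequency times the class-based IDF factor.
            term: (count / n_words) * math.log(1 + avg_words / total_counts[term])
            for term, count in counts.items()
        }
        # Highest weight first; ties broken alphabetically for determinism.
        ranked = sorted(weights, key=lambda t: (-weights[t], t))
        top_terms[label] = ranked[:top_n]
    return top_terms


if __name__ == "__main__":
    toy_docs = [
        "analyzes hr data to inform people decisions",
        "builds people analytics dashboards for hr",
        "trains machine learning models on data",
        "builds machine learning models for products",
    ]
    toy_labels = ["people analytics", "people analytics", "data science", "data science"]
    print(c_tf_idf_top_terms(toy_docs, toy_labels))
    # → {'people analytics': ['hr', 'people', 'analytics'], 'data science': ['learning', 'machine', 'models']}
```

Terms shared by both clusters (such as "data" or "builds") are down-weighted by the log factor, while terms frequent in only one cluster rise to the top, which is what makes the resulting labels distinctive.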
As you can see in the chart below, the range of roles is quite broad, but most of my contacts seem to work in people analytics, data science, or HR and people management, or as executives or researchers, either in academia or in companies.
Below is an interactive version of the chart, where you can check the individual job titles behind the job categories.
# packages used
import numpy as np
import pickle
import pandas as pd
import plotly.graph_objects as go
import plotly.offline as py
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

# data used
with open('all_labels_remapped.pkl', 'rb') as f:
    all_labels_remapped = pickle.load(f)  # one category label per contact
reduced_embeddings = np.load('reduced_embeddings.npy')  # one row per contact; columns 0 and 1 are plotted
data = pd.read_csv('data.csv')

# extracting job titles
titles = data['position']

# defining the 'Uncategorized' label
uncategorized_label = 'Uncategorized'

# generating 20 distinct colors
cmap = plt.get_cmap('tab20')
colors = [cmap(i) for i in range(20)]

# convert matplotlib colors to hex format
colors_hex = [mcolors.to_hex(c) for c in colors]

# defining color for 'Uncategorized'
uncategorized_color = '#D3D3D3'  # light gray

# creating an empty figure
fig = go.Figure()

# plotting 'Uncategorized' points first (boolean mask converted to a NumPy array)
uncategorized_indices = (pd.Series(all_labels_remapped) == uncategorized_label).to_numpy()
fig.add_trace(go.Scatter(
    x=reduced_embeddings[uncategorized_indices, 0],
    y=reduced_embeddings[uncategorized_indices, 1],
    mode='markers',
    marker=dict(
        size=5,
        color=uncategorized_color,
        opacity=0.5
    ),
    name=uncategorized_label,
    text=[f'Title: {titles.iloc[j]}<br>Category: {all_labels_remapped[j]}'
          for j in range(len(uncategorized_indices)) if uncategorized_indices[j]],
    hoverinfo='text',
    showlegend=True
))

# plotting the rest of the groups on top
unique_labels = pd.Series(all_labels_remapped).unique()
for i, label in enumerate(unique_labels):
    if label == uncategorized_label:
        continue  # skipping 'Uncategorized' since it was already plotted
    indices = (pd.Series(all_labels_remapped) == label).to_numpy()
    color = colors_hex[i % len(colors_hex)]  # cycling through the 20 colors
    fig.add_trace(go.Scatter(
        x=reduced_embeddings[indices, 0],
        y=reduced_embeddings[indices, 1],
        mode='markers',
        marker=dict(
            size=5,
            color=color,
            opacity=0.5
        ),
        name=f'{label}',
        text=[f'Title: {titles.iloc[j]}<br>Category: {all_labels_remapped[j]}'
              for j in range(len(indices)) if indices[j]],
        hoverinfo='text',
        showlegend=True
    ))

# updating layout: hiding axes, since the embedding coordinates have no direct meaning
fig.update_layout(
    title='',
    xaxis=dict(
        showgrid=False,
        zeroline=False,
        showline=False,
        showticklabels=False,
        title=''
    ),
    yaxis=dict(
        showgrid=False,
        zeroline=False,
        showline=False,
        showticklabels=False,
        title=''
    ),
    plot_bgcolor='rgba(0,0,0,0)',
    height=600,
    width=850,
    template='plotly_white'
)

# rendering the interactive chart
fig.show()
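To put rough numbers behind the impression that most contacts fall into a handful of categories, one could tabulate how many contacts land in each one. A minimal sketch, assuming `all_labels_remapped` is a flat sequence of category names (as the code above treats it) and choosing, as an assumption, to leave the 'Uncategorized' group out of both the counts and the denominator; the example labels in the usage note are made up.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def category_shares(labels: Iterable[str], exclude: str = "Uncategorized") -> List[Tuple[str, int, float]]:
    """Return (category, count, share) tuples sorted by descending count.

    Shares are computed over categorized contacts only: labels equal to
    `exclude` are dropped before counting. Returns [] if nothing remains.
    """
    counts = Counter(label for label in labels if label != exclude)
    total = sum(counts.values())
    if total == 0:
        return []
    # most_common() sorts by count; equal counts keep first-seen order.
    return [(category, n, n / total) for category, n in counts.most_common()]
```

For example, `category_shares(["Data Science", "HR", "Data Science", "Uncategorized"])` returns `[("Data Science", 2, 2/3), ("HR", 1, 1/3)]`. With pandas already loaded, `pd.Series(all_labels_remapped).value_counts()` gives the same counts, including the 'Uncategorized' group.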
No big surprises here, given my career path and the control I have over whom I connect with. However, it can be more useful to look at the chart from the perspective of who is missing and use it as a tool for intentional LI network building. After all, as some wise people say, we are the average of the people we spend the most time with. For example, I would appreciate having more artists among my contacts, though it’s an open question whether LI is the right network for finding such connections 😉
P.S. If you’re interested, feel free to check out one of my earlier apps that automatically generates basic descriptive statistics about your LI connections.
For attribution, please cite this work as
Stehlík (2024, Aug. 15). Ludek's Blog About People Analytics: Analyzing LinkedIn connections' jobs using LLMs and the BERTopic package. Retrieved from https://blog-about-people-analytics.netlify.app/posts/2024-08-15-linkedin-contacts-job-positions/
BibTeX citation
@misc{stehlík2024analyzing, author = {Stehlík, Luděk}, title = {Ludek's Blog About People Analytics: Analyzing LinkedIn connections' jobs using LLMs and the BERTopic package}, url = {https://blog-about-people-analytics.netlify.app/posts/2024-08-15-linkedin-contacts-job-positions/}, year = {2024} }