Web and Social Network Analytics

Week 3: Social Media & Graph Analytics

Dr. Zexun Chen

Jan-2026

Table of Contents

Welcome to Week 3!

Last Week:

  • Search engines and PageRank
  • How Google ranks web pages
  • Web crawling and indexing

This Week:

  • Social media networks
  • How to analyze connections and communities
  • Finding influential users

Learning Objectives

By the end of this lecture, you will be able to:

  1. Define key graph concepts (nodes, edges, degree)
  2. Calculate centrality measures (degree, betweenness)
  3. Explain the small-world phenomenon and its causes
  4. Distinguish between strong and weak ties
  5. Describe community detection approaches

Social Media

The Social Media Explosion

  • Last decade: explosion of social media platforms
    • Twitter/X, LinkedIn, Facebook, Instagram, TikTok
  • Creation of vast amounts of user-generated content
  • Content is interconnected through:
    • Relationships: friendships, followers, connections
    • Interactions: likes, shares, comments, mentions

What Makes Social Networks Special?

  • Interpersonal/entity relations form a graph
    • Network of interactions between users
  • Data linked to people
    • Preferences, posts, videos, images
    • Location data, sensor data
  • Rich analytical opportunities
    • Understanding communities and influence
    • Predicting behaviour and trends

Two Types of Social Network

Social Network Analysis

Key Questions for This Week

What We Will Explore

  • What do social networks look like?
  • How do they behave over time?
  • What distribution of patterns do they maintain?

ShopSocial Application: How can we analyse the social connections between users to improve product recommendations and identify influential reviewers?

ShopSocial: A Network Perspective

ShopSocial

  • Users = Nodes (customers, reviewers)
  • Interactions = Edges (reviews, purchases, follows)
  • Questions we can answer:
    • Who are the most influential reviewers?
    • Do customers form communities by product type?
    • How quickly do product trends spread?

📊 Quick Poll: Social Media Usage

Wooclap Question

Go to wooclap.com and enter code: UKFORD

How many social media platforms do you actively use?

A. 1-2 platforms B. 3-4 platforms C. 5-6 platforms D. 7+ platforms

Why this matters: Each platform creates a different kind of network! - LinkedIn: Professional connections - Instagram: Visual content sharing - Twitter/X: Information diffusion

Analytics Tools

Preliminaries:

  • Graphs: \(G = (V, E, W)\)
    • \(V\): vertices (nodes), \(E\): edges (links), \(W\): weights
    • Edges (links):
      • Directed: phone call, follower on Twitter
      • Undirected: collaboration, friendship on Facebook
      • Weighted: edge \(e_{ij}\) has weight \(w_{ij}\)
      • \(E_{max} = \binom{N}{2} = \frac{N(N-1)}{2}, N = |V|\)
    • Vertices (nodes):
      • Node degree \(k_i\): number of edges adjacent
      • In-degree \(k^{in}\); Out-degree \(k^{out}\)
  • Example:

Graph Basics

  • Graphs: \(G = (V, E, W)\)
    • Unipartite, multipartite (typically bipartite)
      • Homogeneous nodes vs nodes of different classes
    • Adjacency matrix \(A\), \(|V| \times |V|\) matrix
      • Sparsity is preferable (for calculations)
      • \(|E| \ll E_{max}\) or \(\bar{k} \ll N-1\)

Preliminaries: Components

  • (Weakly) Connected Component: Set of nodes and edges where there exists a path between any two nodes in set.
    • Strongly Connected (SCC): Directed path between any two nodes.
      • The elements from every pair of nodes in \(S\) can reach each other.
      • There is no larger set containing \(S\) with the same property.
  • Disconnected Graph: Consists of two or more connected components.
  • Bridge Edge: If deleted, the graph becomes disconnected.
  • Directed Acyclic Graph (DAG): Has no cycles.

Characteristics

  • Degree Distribution \(P(k)\):
    • Probability that a node has degree \(k\).
  • Path:
    • Sequence of nodes traversed (e.g., \(<N_1, N_2, N_3, N_2>\)).
  • Distance Between Nodes \(h_{ij}\):
    • Length of the shortest path connecting them.
  • Diameter of Network:
    • The shortest path between any pair of nodes.
  • Average Path Length:
    • \(\bar{h} = \frac{1}{2E_{max}}\sum_{j \neq i} h_{ij}\)

Examples 1

Examples 2

Vertex/Node-Based

📊 Quick Poll: Who’s Important?

Wooclap Question

Go to wooclap.com and enter code: UKFORD

In a social network, who’s more important?

A. Person with most friends B. Person who connects different groups C. Person in the largest friend group D. Person with oldest account

The Answer: It Depends!

Different types of “importance” exist: - Degree centrality: Most connections (A) - Betweenness centrality: Bridge between groups (B) - We’ll learn to measure both!

Centrality - Degree

  • A central actor is involved in many ties:
    • Degree centrality/prestige centrality: Referrals
      • In-links
      • PageRank
    • Degree centrality (undirected):
      • Number of immediate connections
      • \(C_{D}(i) = \frac{k_i}{|V| - 1}\)
    • Prestige centrality (directed):
      • Everyone points to this node
      • \(C_{P}(i) = \frac{k_{i}^{in}}{|V| - 1}\)

🤔 The Gatekeeper: Betweenness Centrality

Real-World Analogy

Imagine you’re the only person who knows both:

  • Your work friends AND
  • Your football team friends

You’re the “bridge” between two groups!

If they want to communicate, they MUST go through you.

Betweenness = How often you’re on the shortest path between others

High betweenness = Information gatekeeper (powerful position!)

Centrality: Betweenness

  • A central actor is indispensable:
    • Lots of connections pass through it.
    • How many pairs of individuals would have to go through you in order to reach one another in the minimum number of hops?
  • Betweenness centrality:
    • Calculation: Number of the shortest paths that go through \(V\) vs total number of the shortest paths that exist between all pairs of nodes. \[C_V = \frac{\sum_{s \neq v \neq t} \sigma_{st}(v)}{\sum_{s \neq v \neq t} \sigma_{st}}\]
    • \(\sigma_{st}\): The number of shortest paths between \(s\) and \(t\).
    • \(\sigma_{st}(v)\): The shortest paths that go through \(v\).

Centrality: Degree vs Betweenness

Centrality: Example

Centrality: Exercise

  • Graph

Exercise: Look at Wooclap

  • Calculate:
    • Degree centrality of A-D:
      • A: \(1/3\); B: \(3/3\)
      • C: \(2/3\); D: \(2/3\)
    • Betweenness centrality of B:
      • Number of Shortest Paths:
        • A-C: 1
        • A-D: 1
        • C-D: 1
      • Number of Shortest Paths through B:
        • A-C: 1
        • A-D: 1
        • C-D: 0
      • Centrality: \(2/3\)

🤔 Are Your Friends Also Friends?

The Question

Think about your 5 closest friends.

  • How many of them know EACH OTHER?
  • Are they all in the same group, or from different parts of your life?

Clustering Coefficient measures this!

  • High clustering: Your friends all know each other (tight-knit group)
  • Low clustering: Your friends don’t know each other (you’re the bridge)

Clustering Coefficient

  • Portion of connected neighbours of node \(i\):
    • Number of connections vs maximum number of connections.
    • Local clustering coefficient of node \(i\): \[ C_i^{Directed} = \frac{\sum_{j, k \in N_i} e_{jk}}{k_i (k_i - 1)}, \quad C_i^{Undirected} = \frac{\sum_{j, k \in N_i} e_{jk}}{k_i (k_i - 1)/2} \]
      • \(N_i = \{v_j \in V| e_{ij} \in E \vee e_{ji} \in E\}\): the neighbourhood of \(i\).
      • Denominator: Possible connections between adjacent nodes.
    • Average clustering coefficient:
      • Average local density.
      • \(C = \frac{1}{|N|} \sum_i^{|N|} C_i\)

Clustering Coefficient: Exercise

  • What is the local clustering coefficient of node B?
    • \(N = \{A, C, D\}\)
    • \(k_i = 3\)
    • \(C_B = \frac{1}{(3 \times 2) / 2}\)

Clustering Coefficient

Statistical Properties

📊 The 80/20 Rule of Networks

Have You Noticed?

  • A few YouTubers have millions of subscribers
  • Most have just a handful
  • Same pattern: Twitter, Instagram, TikTok…

Why?

Power Law: The “Rich Get Richer” Effect

  • Popular accounts get suggested more
  • Being suggested → more followers
  • More followers → more suggestions
  • Cycle amplifies inequality!

Distribution

  • Distribution of Node Connectivity:
    • Heavy tails (right skew): Strong nodes have many connections, weak nodes still have a few.
    • Power law: \(P(k) = k^{-\alpha}\)
    • Straight line on a log-log plot with slope \(-\alpha\).

Distribution of node connectivity

Power Law

  • Model of Preferential Attachment:
    • Stems from citation analysis: New citations are proportionate to the number a paper already has.
    • “Rich get richer” model.

📊 Quick Poll: Degrees of Separation

Wooclap Question

Go to wooclap.com and enter code: UKFORD

How many people do you think separate you from Barack Obama?

A. 2-3 people B. 4-6 people C. 7-15 people D. 15+ people

Surprise!

Research shows it’s typically 4-6 people for most connections!

🌍 Six Degrees of Separation

Try This!

Think of someone famous you’d like to meet (e.g., Taylor Swift, Elon Musk).

Can you trace a chain of people who might know each other to reach them?

  • You → Your friend → Their friend → … → Famous person

Surprisingly: Most chains are 6 steps or fewer!

This is the “Small World” phenomenon.

Small Worlds

Experiment

Experiment

  • Typical shortest path between any two people?
    • Small-world experiment ((Milgram 1967), Link):
      • 300 people from Omaha, Nebraska, and Kansas.
      • Asked to send a letter to a stockbroker in Boston by passing the letter through friends.
      • Recorded the number of steps taken.

Bacon number

  • Six Degrees of Kevin Bacon:
    • Create a network of Hollywood actors.
    • Connect actors if they appeared in the same movie.
    • Bacon number: Number of steps to Kevin Bacon.
    • https://www.sixdegrees.org/

Example of Small World

Code
import networkx as nx
import plotly.graph_objects as go

# Create a Watts–Strogatz small-world network
G = nx.watts_strogatz_graph(n=30, k=4, p=0.1)

# Calculate a 3D layout for the nodes
pos_3d = nx.spring_layout(G, dim=3, seed=42)

# Extract node coordinates
x_nodes = []
y_nodes = []
z_nodes = []
for node in G.nodes():
    x_nodes.append(pos_3d[node][0])
    y_nodes.append(pos_3d[node][1])
    z_nodes.append(pos_3d[node][2])

# Build edge coordinate lists for Plotly
edge_x = []
edge_y = []
edge_z = []
for edge in G.edges():
    x0, y0, z0 = pos_3d[edge[0]]
    x1, y1, z1 = pos_3d[edge[1]]
    # Plotly needs a break (None) to start a new line for each edge
    edge_x += [x0, x1, None]
    edge_y += [y0, y1, None]
    edge_z += [z0, z1, None]

# Create a Plotly trace for edges
edge_trace = go.Scatter3d(
    x=edge_x, y=edge_y, z=edge_z,
    mode='lines',
    line=dict(color='lightblue', width=2),
    hoverinfo='none'
)

# Create a Plotly trace for nodes
node_trace = go.Scatter3d(
    x=x_nodes, y=y_nodes, z=z_nodes,
    mode='markers',
    marker=dict(symbol='circle', size=5, color='pink'),
    text=[f"Node {n}" for n in G.nodes()],
    hoverinfo='text'
)

# Combine the traces into a figure
fig = go.Figure(
    data=[edge_trace, node_trace],
    layout=go.Layout(
        title="Small-World Network in 3D (Plotly)",
        showlegend=False,
        scene=dict(
            xaxis=dict(showbackground=False),
            yaxis=dict(showbackground=False),
            zaxis=dict(showbackground=False)
        )
    )
)

fig.show()

Smaller Number on Digital Plataform

  • Facebook:
    • 99.6% of users are connected by degree 5 (6 hops).
    • 92% are connected by only 4 degrees.

Reference Link

Causes for Small World

  • High Clustering: Social networks tend to have groups of friends who are more interconnected.
  • Short Path Lengths (Diameter): Despite clustering, there are typically short paths that connect different groups.
  • Random Connections: Random links between distant nodes significantly reduce the average path length.

Cause (Continuous)

  • Cause of Small Worlds (Watts and Strogatz (1998)):
    • A network with low clustering can still be a small world.
    • Only a few random links are enough to create a small-world effect.

Source: http://www.ladamic.com/

Smaller Worlds on Digital Platforms

Link/Edge-Based

🤔 Why Friends of Friends Become Friends

Think About It

Your friend Alex introduces you to their friend Sam.

What usually happens?

  • You and Sam often become friends too!
  • The “triangle” closes

Why?

  • You meet through Alex (opportunity)
  • You trust Alex’s judgment (trust)
  • Avoiding awkwardness at group events (incentive)

Technical Term

This tendency for triangles to form is called Triadic Closure.

Friend of a Friend is a Friend

Triadic Closure

Triadic closure is a concept in social network theory, first suggested by German sociologist Georg Simmel in his 1908 book Soziologie (Sociology: Investigations on the Forms of Sociation).

  • Strong Ties and Triadic Closure:
    • Weak Claim: A new edge B-C is more likely when A-B and A-C are ties.
    • Strong Claim: If there exists strong ties between A and B/C, there must be an edge B-C.

Triadic Closure: Clustering Coefficient

  • Consider Clustering Coefficient \(C_A\) of Node A:
    • Probability that two random friends of A are friends themselves.
    • Before New Edge: \(C_A = \frac{1}{4 \times 3 / 2}\)
    • After New Edge: \(C_A = \frac{2}{4 \times 3 / 2}\)
    • Higher clustering coefficient after the new edge.

Triadic Closure: Causes

  • Meet Friends Through Friends:
    • Have opportunities to meet through common friends.
    • Mutual trust in shared friends.
    • Incentive to pair friends to avoid stress in relationships.

Bridge

  • Bridges:
    • If removed, the components of the network become disconnected.

Local Bridges

  • Local Bridges (Equivalent Definitions):
    • Any edge whose endpoints have no friends in common.
    • Any edge with zero neighbour overlap is called a local bridge.
    • Any edge whose deletion results in increasing the distance between the endpoints to a value strictly more than two.
  • Example:
    • A-B is the only local bridge in the below graph.

Strong Triadic Closure

Strong Triadic Closure Property

A node \(A\) violates the Strong Triadic Closure Property if it has strong ties to two non-linked nodes \(B\) and \(C\).

  • Observations:
    • No node in the left figure violates the Strong Triadic Closure Property.
    • If we change A-F and A-B to strong ties, then it violates the Strong Triadic Closure Property due to the absence of links B-F, B-E, and A-G.

Local Bridges and Weak Ties

Proposition

If a node \(A\) in a network satisfies the Strong Triadic Closure Property and is involved in at least two strong ties, then any local bridge it is involved in must be a weak tie.

  • Illustration:
  • A connection between:
    • Local bridge (a structural view).
    • Strong and weak ties (an individual view).

đź’ˇ The Surprising Strength of Weak Ties

Counter-intuitive Finding

Who’s more likely to help you find a new job?

  • A. Your best friend
  • B. Someone you barely know

Answer: B!

Why? Your best friend knows the similar people you do. Acquaintances have access to different networks and new information!

đź”— Strong vs Weak Ties

Strong Ties:

  • Close friends, family
  • Frequent contact
  • Emotional support
  • Same information bubble

Weak Ties:

  • Acquaintances, former colleagues
  • Infrequent contact
  • Access to NEW information
  • Bridge to other communities

ShopSocial Application

  • Strong ties: Regular customers who review together
  • Weak ties: Occasional visitors who bring new product categories
  • Which drives more diverse sales?

Ties

  • Triadic Closure:
    • Useful for analyzing people that can be addressed.
    • E.g., in campaigns for sport shoes.
  • Visualization of Concepts

  • Weak Ties:
    • Tend to carry different information compared to close contacts.
      • In experiments, acquaintanceship ties were more effective at passing information.
      • E.g., in recommending jobs –> untapped/undiscovered community.

Weak Ties: Example

Mobile Call Graph with Edge Strength

Graph Structure

📊 Quick Poll: Finding Groups

Wooclap Question

Go to wooclap.com and enter code: UKFORD

In your social network, how distinct are your friend groups?

A. Very distinct - different groups never meet B. Somewhat distinct - occasional overlap C. Very mixed - everyone knows everyone D. I have one big friend group

Community detection algorithms try to identify these groups automatically!

🔍 Finding Groups in Networks

The Goal

Look at a social network. Can you spot the “friend groups”?

  • University friends cluster together
  • Work colleagues cluster together
  • Family members cluster together

Community Detection = Teaching computers to find these groups automatically

đź’ˇ The Basic Idea

Good communities have:

  1. Many connections WITHIN the group; 2. Few connections BETWEEN groups

Like school cafeteria tables:

  • Kids at same table chat a lot (within); - Different tables rarely interact (between)

ShopSocial Example

Can we find “communities” of similar shoppers?

  • Electronics enthusiasts
  • Fashion buyers
  • Book lovers

This helps with targeted recommendations!

Community Discovery

  • Community Discovery:
    • Identifies groups or clusters within a network that are more densely connected internally.
    • Networks are strongly modular.
    • Cliques exist, e.g., groups of friends, user bases, etc.

Basic Idea

Partition a graph to minimize connections between different groups while maximizing connections within the same group.

Normalised Cut (Ncut)

  • Ncut:
    • A criterion for measuring the quality of a network division into clusters or communities.
  • Normalised Cut of a group of nodes \(S, \bar{S} \subset V\): \[ Ncut(S) = \frac{\sum_{i \in S, j \in \bar{S}} A(i, j)}{\sum_{i \in S} k_i} + \frac{\sum_{i \in S, j \in \bar{S}} A(i, j)}{\sum_{j \in \bar{S}} k_j} \]
  • Interpretation:
    • Low Ncut signifies good communities as they are well-connected internally but sparsely connected externally.

Normalised Cut (Ncut): Continues

  • Quality Functions:
    • Kernighan-Lin Objective:
      • Minimise number of inter-cluster edges (weights) of the same size.
      • \(KLObj(V_1, \cdots, V_k) = \sum_{i \neq j} A(V_i, V_j)\)
    • Modularity:
      • Measures whether subgraphs are far from random subgraphs.
      • Independent of the number of clusters.
      • \(m\): Number of edges in the graph. \[ Q = \sum_{c=1}^k \left[\frac{A(V_c, V_j)}{m} - \left(\frac{k_{V_c}}{2m}\right)^2\right] \]

Normalised Cut (Ncut): Directed Network

  • Directed Networks:
    • Normalised Cut: \[ Ncut_{dir}(S) = \frac{\sum_{i \in S, j \in \bar{S}} \pi(i)P(i, j)}{\sum_{i \in S} \pi(i)} + \frac{\sum_{i \in S, j \in \bar{S}} \pi(i)P(i, j)}{\sum_{j \in \bar{S}} \pi(j)} \]
      • \(P\): Transition matrix of a random walk.
      • \(\pi\): Stationary distribution vector (see PageRank).
      • Typically minimised using spectral clustering.
    • Modularity: \[ Q = \frac{1}{m} \sum_{i, j \in E} \left[A_{ij} - \frac{k_i^{in} k_j^{out}}{m}\right] \delta_{c_i, c_j} \]
      • \(\delta_{c_i, c_j}\): 1 if community of \(i\) equals community of \(j\), 0 otherwise.
      • Measures community quality in directed networks.

Graph Partition Algorithm

  • Kernighan-Lin (KL) Algorithm ((Kernighan and Lin 1970), Partition a graph into two blocks):
    • Classic graph partitioning.
    • For each iteration, search for nodes that can be swapped to reduce the number of inter-cluster edges (edge cut).
    • Gain: Reduction in edge cut when moving nodes.
    • Nodes with the largest gain are repeatedly selected from larger partition.
  • Example KL Algorithm:
    • Swapping A-D: 1 fewer cuts (4:3).
    • Swapping B-D: 1 fewer cuts (4:3).
    • Swapping C-D: 2 fewer cuts (4:2).

Agglomerative / Divisive Algorithms

  • Agglomerative: Start from single-node communities (bottom-up approach).
  • Divisive: Start from all-nodes community (top-down approach).

Girvan and Newman (Divisive) Algorithm (Newman 2010)

Edge Betweenness

The number of the shortest paths that go through a specific edge vs the total number of the shortest paths.

  1. For every edge in a graph, calculate the edge betweenness centrality.
  2. Remove the edge with the highest betweenness centrality.
  3. Recalculate the betweenness centrality for every remaining edge.
  4. Repeat steps 2-4 until there are no more edges left.

Girvan and Newman (Divisive) Algorithm

Girvan and Newman (Divisive) Algorithm

Spectral and Other Approaches

  • Spectral Algorithms:
    • Use top \(k\) eigenvectors of the adjacency matrix to represent nodes in a \(k\)-dimensional space.
    • Apply regular clustering techniques.
    • Computational cost of Singular Vector Decomposition (SVD) is high.
  • Other Approaches:
    • Local graph clustering.
    • Multi-Level Regularised Markov Clustering.

Further Questions

📝 Quick Quiz: Test Your Understanding

Question Your Answer
1. A node with high betweenness centrality is often called a _____
2. “Friends of friends become friends” describes _____ closure
3. Weak ties are valuable because they provide access to _____
4. The “six degrees” phenomenon is also called _____ world

Answers: 1. Bridge/Gatekeeper, 2. Triadic, 3. New information/Different networks, 4. Small

Case Study

Twitter/X: Observations from Social Network

  • Observations:
    • Power law \(\alpha\) is roughly 5 (high).
    • Bump at 20: Reflects initial suggestions.
    • Bump at 2000: Maximum follower limit before 2009.

Twitter/X: Followers and Tweets

  • Many users never tweet (low median).
  • The average number of tweets per follower exceeds the median; users tweet far more than expected.
  • Observations include:
    • Dips at 20/2000.
    • Spam accounts commonly seen follower #250/500/2,000/5,000.

Twitter/X: Degrees of Separation

  • Degrees of Separation:
    • Average path length is 4.12.
  • Ranking Users:
    • PageRank: A measure of importance based on connections.
    • Retweets: Reflects influence through content amplification.
  • Graph:

đź›’ ShopSocial Connection: Twitter Insights

ShopSocial Connection

Twitter’s network patterns apply to ShopSocial too:

  • Power law followers → Some ShopSocial reviewers have massive influence
  • Retweets = Shares → Product recommendations spread similarly
  • Hashtags = Categories → Clustering by product type

How would you analyze ShopSocial’s reviewer network?

Top Reviewer Analysis

  • 1,000 Top Reviewers:
  • Data collected on products reviewed from Amazon.com.

đź›’ Apply to ShopSocial

Apply to ShopSocial

This Amazon top reviewer analysis is directly applicable to ShopSocial!

  • Who are ShopSocial’s most influential reviewers?
  • Do top reviewers cluster by product category?
  • Can we identify “bridge” reviewers who review across categories?

Network analysis helps ShopSocial:

  • Identify key influencers for marketing
  • Detect review manipulation patterns
  • Recommend products through social connections

Summary

Key Takeaways

Graph Basics:

  • Nodes, edges, directed/undirected
  • Adjacency matrix representation
  • Connected components

Centrality Measures:

  • Degree: Number of connections
  • Betweenness: Bridge between groups
  • Clustering: How connected are your friends?

Network Properties:

  • Power law: “Rich get richer”
  • Small world: 6 degrees of separation
  • Triadic closure: Triangle formation

Community Detection:

  • Identify clusters in networks
  • Modularity measures quality
  • Applications: Marketing, recommendations

đź“… Looking Ahead

Next Week: Unsupervised Learning Techniques

  • Further Clustering Algorithm
  • Frequent Itemset Analysis
  • Recommendation system

Assessment Reminder:

  • Centrality calculations may appear in assessment
  • Practice with small network examples
  • ShopSocial analysis continues

References

Kernighan, Brian W, and Shen Lin. 1970. “An Efficient Heuristic Procedure for Partitioning Graphs.” The Bell System Technical Journal 49 (2): 291–307.
Milgram, Stanley. 1967. “The Small World Problem.” Psychology Today 2 (1): 60–67.
Newman, Mark. 2010. Networks: An Introduction. Oxford University Press.
Watts, Duncan J, and Steven H Strogatz. 1998. “Collective Dynamics of ’Small-World’ Networks.” Nature 393 (6684): 440–42.

Questions?

Thank you!

Dr. Zexun Chen

đź“§ Zexun.Chen@ed.ac.uk

Office Hours: By appointment