Week 4: Unsupervised Learning Techniques
Feb-2026
Last Week:
This Week:
ShopSocial Question: How can we understand customer opinions, find product bundles, and make personalized recommendations?
By the end of this lecture, you will be able to:
Assessment link: These techniques are directly applicable to your ShopSocial coursework analysis.

Wooclap Question
Go to wooclap.com and enter code: RMBELP
How many distinct customer types do you think ShopSocial has?
Think About It
You’re managing ShopSocial and want to send different marketing emails to different customer types. How would you group 100,000 customers into meaningful segments?
Clustering = Grouping similar items together WITHOUT predefined labels

Using K-Means with RFM Analysis (Recency, Frequency, Monetary):
| Segment | Recency | Frequency | Monetary | Strategy |
|---|---|---|---|---|
| Champions | Recent | Often | High | Reward loyalty |
| At Risk | Long ago | Often | High | Win back campaigns |
| New Customers | Recent | Low | Low | Onboarding emails |
| Hibernating | Long ago | Low | Low | Re-engagement offers |
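As a minimal sketch of how this segmentation could be computed (synthetic RFM data and a plain-Python Lloyd's algorithm for illustration; in practice you would use scikit-learn's `KMeans`):

```python
import random
import math

random.seed(0)

# Synthetic (recency_days, frequency, monetary) rows -- illustrative only
customers = (
    [(random.gauss(10, 3), random.gauss(20, 4), random.gauss(500, 50)) for _ in range(30)]  # "Champions"-like
  + [(random.gauss(90, 10), random.gauss(2, 1), random.gauss(40, 10)) for _ in range(30)]   # "Hibernating"-like
)

def scale(rows):
    """Min-max scale each column so no single feature dominates the distance."""
    cols = list(zip(*rows))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [tuple((v - l) / (h - l) for v, l, h in zip(r, lo, hi)) for r in rows]

def kmeans(points, k, iters=20):
    """Lloyd's algorithm: assign points to nearest center, then recompute centers."""
    centers = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        centers = [
            tuple(sum(c) / len(c) for c in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers, clusters

centers, clusters = kmeans(scale(customers), k=2)
sizes = [len(cl) for cl in clusters]
print(sizes)
```

Note that scaling the features first matters: monetary values in the hundreds would otherwise swamp recency and frequency in the distance calculation.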
Note
Key Insight: Clustering helps ShopSocial personalize marketing without manual labeling!
Wooclap Question
Go to wooclap.com and enter code: RMBELP
When you read a product review, how quickly can you tell if it’s positive or negative?
Think About It
ShopSocial has 500,000 product reviews. A human reading 1 review per minute would take 347 days working 24/7 to read them all!
How can we automatically understand customer opinions?
Sentiment Analysis = Automatically determining the emotional tone of text
| Approach | How It Works | Pros | Cons |
|---|---|---|---|
| Lexicon-Based | Count positive/negative words using a dictionary | Simple, interpretable | Misses context, sarcasm |
| Machine Learning | Train classifier on labeled examples | More accurate | Needs training data |
| Deep Learning | Neural networks (BERT, etc.) | State-of-the-art | Computationally expensive |
| LLM-Based | Prompt GPT/Claude/Gemini directly | Flexible, contextual | API costs, latency |
Today’s Focus: Lexicon-based approach (unsupervised - no labels needed!) + LLM insights
Step 1: Build/Use a Sentiment Lexicon
| Word | Sentiment Score |
|---|---|
| excellent | +3 |
| good | +1 |
| okay | 0 |
| bad | -1 |
| terrible | -3 |
Step 2: Score the Text
Example Review: > “This product is excellent! The quality is good but shipping was bad.”
Calculation: excellent (+3) + good (+1) + bad (−1) = +3 → overall positive.
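The two steps above can be sketched in a few lines of Python (toy lexicon copied from the table; real lexicons have thousands of entries):

```python
import re

# Toy sentiment lexicon -- mirrors the table above
LEXICON = {"excellent": 3, "good": 1, "okay": 0, "bad": -1, "terrible": -3}

def sentiment_score(text):
    """Sum the lexicon scores of every known word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(w, 0) for w in words)

review = "This product is excellent! The quality is good but shipping was bad."
print(sentiment_score(review))  # excellent(+3) + good(+1) + bad(-1) = 3
```

Unknown words score 0, which is exactly why slang and sarcasm slip through a basic lexicon.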
Activity (3 minutes)
Using this simple lexicon, calculate the sentiment score:
| Word | Score |
|---|---|
| love/great/excellent | +2 |
| good/nice | +1 |
| bad/poor | -1 |
| hate/terrible/awful | -2 |
Reviews to score:
Solutions
ShopSocial Challenge
Reviews like “Sick product bro! Absolutely killed it!” use slang that confuses basic lexicons.
VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon tuned for social-media text, handling slang, emoticons, capitalization, and punctuation emphasis:
```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
reviews = ["This product is AMAZING!!!", "Meh, it's okay I guess...", "Worst purchase ever :("]

for review in reviews:
    scores = analyzer.polarity_scores(review)
    print(f"{review[:30]:30} → {scores['compound']:.2f}")
```

Output:

```
This product is AMAZING!!!     →  0.69 (Positive)
Meh, it's okay I guess...      →  0.00 (Neutral)
Worst purchase ever :(         → -0.69 (Negative)
```
Think About It
What if instead of building complex rules or training models, you could just ask an AI to analyze the sentiment?
Modern Large Language Models (ChatGPT, Claude, Gemini) can perform sentiment analysis with simple prompts:
```python
# Example using the OpenAI Python library (openai>=1.0)
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment
review = "The product arrived late but the quality exceeded my expectations!"

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": f"""Analyze this review and return JSON:
- sentiment: positive/negative/mixed
- confidence: 0-1
- key_aspects: list of mentioned aspects with their sentiment
Review: "{review}" """,
    }],
)
print(response.choices[0].message.content)
```

| Aspect | Lexicon-Based | ML/VADER | LLM (GPT/Claude) |
|---|---|---|---|
| Setup effort | Low | Medium | Very Low |
| Handles context | Poor | Moderate | Excellent |
| Sarcasm detection | No | Limited | Good |
| Aspect extraction | Manual rules | Requires training | Built-in |
| Cost per review | Free | Free | ~$0.001-0.01 |
| Speed | Very fast | Fast | Slower (API calls) |
| Explainability | High | Medium | Can explain reasoning |
ShopSocial Strategy
Better prompts = Better results:
| Task | Example Prompt |
|---|---|
| Basic sentiment | “Is this review positive, negative, or neutral?” |
| With confidence | “Rate sentiment from -1 (very negative) to +1 (very positive)” |
| Aspect-based | “List each product aspect mentioned and its sentiment” |
| Actionable insights | “What specific improvements does the customer suggest?” |
| Comparative | “Compare sentiment across these 5 reviews” |
Key Insight: LLMs turn sentiment analysis from a technical problem into a prompt design problem.
Wooclap Question
Go to wooclap.com and enter code: RMBELP
Which products do you think are MOST often bought together on ShopSocial?
Think About It
You’re browsing a laptop on Amazon and see “Frequently bought together: Laptop bag, Mouse, USB hub”
How does Amazon know these items go together? They analyzed millions of shopping carts!

Support: the fraction of transactions that contain itemset \(A\). \[ sup(A) = \frac{|\{t \in T \mid A \subseteq t\}|}{|T|} \]
Example: Calculate the support of Golden iPhone.

Confidence: the fraction of transactions containing \(A\) that also contain \(B\). \[ conf(A \rightarrow B) = \frac{sup(A \cap B)}{sup(A)} \]
Example: Calculate the confidence of Golden iPhone \(\rightarrow\) Purple Case.

Lift: how much more often \(A\) and \(B\) occur together than expected if they were independent; lift \(> 1\) means the items are positively associated. \[ lift(A \rightarrow B) = \frac{sup(A \cap B)}{sup(A) \times sup(B)} \]
Example: Calculate the lift of Golden iPhone \(\rightarrow\) Purple Case.

Here \(sup(A \cap B)\) denotes the support of transactions containing both \(A\) and \(B\).
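The three measures can be computed directly from a transaction list. A small sketch on hypothetical transactions (the numbers below are made up for illustration, not the worked examples from the slides):

```python
# Hypothetical shopping-cart transactions -- illustrative only
T = [
    {"Golden iPhone", "Purple Case"},
    {"Golden iPhone", "Purple Case", "Charger"},
    {"Golden iPhone"},
    {"Charger"},
    {"Purple Case", "Charger"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in T) / len(T)

def confidence(A, B):
    # sup(A ∩ B) in the slides = support of transactions containing both
    return support(A | B) / support(A)

def lift(A, B):
    return support(A | B) / (support(A) * support(B))

A, B = {"Golden iPhone"}, {"Purple Case"}
print(support(A))        # 3/5 = 0.6
print(confidence(A, B))  # (2/5) / (3/5) ≈ 0.667
print(lift(A, B))        # (2/5) / (0.6 × 0.6) ≈ 1.11  → positively associated
```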


Wooclap Question
Go to wooclap.com and enter code: RMBELP
How do you think Netflix/Spotify know what to recommend?
🎯 The Answer: B (mostly!)
Collaborative Filtering: Find users like you, recommend what they liked. This is the core idea we’ll explore today.
Think About It
ShopSocial has:
That means 99.96% of the utility matrix is empty! How do we make recommendations with so little data?

Question: Does Douglas like R?
Question:
Q1: Find similarity between Douglas and Maurizio.
Q2: Find similarity between Johannes and Maurizio.
Question 3: What product(s) would you recommend for Maurizio?
Question 3+: What product(s) would you recommend for Maurizio now?
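A minimal sketch of user-based collaborative filtering (the ratings below are hypothetical placeholders, not the utility matrix from the slides): find similar users via cosine similarity over co-rated items, then score unseen products by similarity-weighted ratings.

```python
import math

# Hypothetical 1-5 star ratings -- made up for illustration
ratings = {
    "Douglas":  {"P": 5, "Q": 4, "R": 1},
    "Johannes": {"P": 4, "Q": 5, "S": 4},
    "Maurizio": {"P": 5, "Q": 5},
}

def cosine_sim(u, v):
    """Cosine similarity over the items both users rated."""
    common = ratings[u].keys() & ratings[v].keys()
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    nu = math.sqrt(sum(ratings[u][i] ** 2 for i in common))
    nv = math.sqrt(sum(ratings[v][i] ** 2 for i in common))
    return dot / (nu * nv)

def recommend(user):
    """Score items the user has not rated by similarity-weighted ratings of others."""
    scores = {}
    for other in ratings:
        if other == user:
            continue
        s = cosine_sim(user, other)
        for item, r in ratings[other].items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + s * r
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(recommend("Maurizio"))
```

With these placeholder ratings, product S ranks highest for Maurizio because his most similar neighbours rated it well.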
Other actions to improve results:
| Question | Your Answer |
|---|---|
| 1. K-means requires specifying ___ in advance | |
| 2. VADER is designed for analyzing ___ text | |
| 3. Support measures how ___ an itemset appears | |
| 4. Lift > 1 indicates items are ___ | |
| 5. Collaborative filtering finds similar ___ | |
Answers:
Clustering (Review):
Sentiment Analysis:
Frequent Itemset Analysis:
Recommendation Systems:
Next Week: Ethics in Social Network Analytics
The techniques we learned today raise important questions:
We’ll explore:
Thank you!
Dr. Zexun Chen
📧 Zexun.Chen@ed.ac.uk
Office Hours: By appointment
