
Commit 9741497

Add: new post - Consolidated Recommendation Systems
1 parent 21c4380 commit 9741497

5 files changed: +111 −2 lines changed

.github/workflows/build-playground.yml

+2 −2
@@ -49,7 +49,7 @@ jobs:
  JEKYLL_ENV: production
  - name: Upload artifact
  # Automatically uploads an artifact from the './_site' directory by default
- uses: actions/upload-pages-artifact@v2
+ uses: actions/upload-pages-artifact@v3

  # Deployment job
  deploy:
@@ -61,4 +61,4 @@ jobs:
  steps:
  - name: Deploy to GitHub Pages
  id: deployment
- uses: actions/deploy-pages@v3
+ uses: actions/deploy-pages@v4
+108
@@ -0,0 +1,108 @@
---
layout: post
title: "Consolidated Recommendation Systems"
date: "2025-02-13"
categories: RecSys
---

This post is a quick summary of [Lessons Learnt From Consolidating ML Models in a Large Scale Recommendation System](https://netflixtechblog.medium.com/lessons-learnt-from-consolidating-ml-models-in-a-large-scale-recommendation-system-870c5ea5eb4a). I have also added a few questions that came to mind while reading it. I end the post with how we handle this at work.

## Summary

- Recommendation System: candidate generation + ranking.
- A typical ranking model pipeline:

  1. Label prep
  2. Feature prep
  3. Model training
  4. Model evaluation
  5. Model deployment (with inference contract)

- Each recommendation use case (e.g.: discover page, notifications, related items, category exploration, search) will have its own version of the above pipeline.
- As use cases increase, the team has to maintain multiple such pipelines, which is time-consuming and increases the points of failure.

<figure class="image">
<img src="{{ site.url }}/assets/2025-02/consolidated_recsys_neflix_1.webp" alt="" style="text-align: center; margin: auto">
<figcaption style="text-align: center">Figure 1: Figure from the Netflix blog linked at the start.</figcaption>
</figure>

- Since the pipelines share the same components, we can consolidate them.
- Consolidated pipeline:

  1. Label prep for each use case separately
  2. Stratified union of all the prepared labels
  3. Feature prep (with a separate categorical feature representing the use case)
  4. Model training
  5. Model evaluation
  6. Model deployment (with inference contract)

<figure class="image">
<img src="{{ site.url }}/assets/2025-02/consolidated_recsys_neflix_2.webp" alt="" style="text-align: center; margin: auto" width="100">
<figcaption style="text-align: center">Figure 2: Figure from the Netflix blog linked at the start.</figcaption>
</figure>

- Label prep for each use case separately

  1. Each use case will have its own way of generating the labels.
  2. Use case context details are added as separate features.
     - Search context: search query, region
     - Similar items context: source item
  3. When the use case is search, context features specific to the similar-items use case are filled with default values.

- Union of all the prepared labels

  1. Final labelled set: a% samples from use case-1 labels + b% samples from use case-2 labels + … + z% samples from use case-n labels (see the sketch after this list).
  2. The proportions [a, b, …, z] come from stratification.
  3. Q: How is this stratification done? Based on platform traffic across the different use cases?
  4. Q: What are the results when these proportions are business-driven? E.g.: contribution to revenue.
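
A minimal sketch of what this stratified union could look like; the proportions, column names, and toy label frames below are my assumptions, not details from the Netflix post:

```python
import pandas as pd

# Hypothetical per-use-case label sets, each produced by its own label-prep step.
label_frames = {
    "search":        pd.DataFrame({"item_id": [1, 2, 3], "label": [1, 0, 1]}),
    "discover":      pd.DataFrame({"item_id": [4, 5, 6], "label": [0, 1, 0]}),
    "related_items": pd.DataFrame({"item_id": [7, 8, 9], "label": [1, 1, 0]}),
}

# Stratification proportions [a, b, ..., z]; assumed here to follow traffic share.
proportions = {"search": 0.5, "discover": 0.3, "related_items": 0.2}
total_samples = 6  # assumed size of the final labelled set

parts = []
for use_case, frame in label_frames.items():
    n = min(int(total_samples * proportions[use_case]), len(frame))
    sampled = frame.sample(n=n, random_state=42)
    parts.append(sampled.assign(task_type=use_case))  # tag rows with their use case

# Final labelled set: stratified union of all the per-use-case labels.
labels = pd.concat(parts, ignore_index=True)
print(labels)
```
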

- Feature prep

  1. All use-case-specific features are added to the data.
  2. If a feature is only used for use case 1, it will contain a default value for all the other use cases.
  3. Add a new categorical feature `task_type` to the features to inform the model about the target recommendation task (see the sketch after this list).
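
A rough sketch of the default-filling idea, assuming pandas and made-up context features (`search_query`, `region`, `source_item_id`) and default values:

```python
import pandas as pd

# Hypothetical feature rows coming from two use cases with different context features.
search_rows = pd.DataFrame({
    "item_id": [1, 2],
    "search_query": ["space documentaries", "stand-up comedy"],
    "region": ["US", "IN"],
    "task_type": "search",
})
related_rows = pd.DataFrame({
    "item_id": [3, 4],
    "source_item_id": [101, 102],
    "task_type": "related_items",
})

# Union the two schemas: features absent for a use case become NaN ...
features = pd.concat([search_rows, related_rows], ignore_index=True)

# ... and are then filled with per-feature default values.
defaults = {"search_query": "", "region": "UNKNOWN", "source_item_id": -1}
features = features.fillna(defaults)
print(features)
```
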

- Model training happens as usual: feature vectors and labels. The architecture remains the same. The optimisation remains the same.
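
To make that point concrete, here is a toy training step on the unioned data; the gradient-boosted model is only a stand-in for whatever ranker is actually used, and the features are invented:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical unioned training data: shared features plus the task_type column.
train = pd.DataFrame({
    "item_popularity": [0.9, 0.2, 0.7, 0.4, 0.8, 0.1],
    "user_affinity":   [0.8, 0.1, 0.6, 0.3, 0.9, 0.2],
    "task_type": ["search", "search", "discover", "discover",
                  "related_items", "related_items"],
    "label": [1, 0, 1, 0, 1, 0],
})

# task_type enters the model like any other categorical feature (one-hot encoded here).
X = pd.get_dummies(train.drop(columns="label"), columns=["task_type"])
y = train["label"]

# The training step itself is unchanged; only the data fed into it is consolidated.
model = GradientBoostingClassifier(random_state=0).fit(X, y)
print(model.predict_proba(X)[:, 1])
```
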

- Model evaluation

  1. Check the appropriate eval metrics for the model.
  2. Q: How do we judge whether the model performed well for all the use cases?
  3. Q: Will it require a separate evaluation set for each use case? (See the per-use-case slicing sketch after this list.)
  4. Q: Can there be a second-order Simpson's paradox here: the consolidated model performs well overall, but its performance on individual use cases is low? My hunch: no.
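
One simple way to approach the questions above is to slice the evaluation set by use case and report a metric per slice alongside the overall number. A small sketch with made-up scores and AUC as the metric:

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

# Hypothetical eval set with predictions from the consolidated model.
eval_df = pd.DataFrame({
    "task_type": ["search", "search", "discover", "discover",
                  "related_items", "related_items"],
    "label": [1, 0, 1, 0, 0, 1],
    "score": [0.9, 0.2, 0.7, 0.4, 0.3, 0.8],
})

# Overall metric plus one slice per use case, so a good overall number
# cannot hide a weak individual use case.
print("overall AUC:", roc_auc_score(eval_df["label"], eval_df["score"]))
for task, group in eval_df.groupby("task_type"):
    print(task, "AUC:", roc_auc_score(group["label"], group["score"]))
```
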

- Model deployment (with inference contract)

  1. Deploy the same model in the respective environment made for each use case. That environment has all the use-case-specific serving knobs: batch size, throughput, latency, caching policy, parallelism, etc.
  2. A generic API contract supports the heterogeneous context (search query for search, source item for the related-items use case); a sketch follows this list.
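
A sketch of what such a generic contract might look like; the field names are hypothetical, not the actual API from the post:

```python
from dataclasses import dataclass
from typing import List, Optional

# One request schema serves every use case; only the context fields relevant
# to the calling use case are populated, the rest keep their defaults.
@dataclass
class RankingRequest:
    task_type: str                        # "search", "discover", "related_items", ...
    user_id: str
    candidate_ids: List[str]
    search_query: Optional[str] = None    # search context
    region: Optional[str] = None          # search context
    source_item_id: Optional[str] = None  # related-items context

search_req = RankingRequest(task_type="search", user_id="u1",
                            candidate_ids=["m1", "m2"],
                            search_query="space documentaries", region="US")
related_req = RankingRequest(task_type="related_items", user_id="u2",
                             candidate_ids=["m3", "m4"], source_item_id="m9")
print(search_req)
print(related_req)
```
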

- Caveats

  1. The consolidated use cases should be related (e.g.: ranking movies on the search and discover pages).
  2. One definition of related can be: ranking the same entities.

- Advantages

  1. Reduces maintenance costs (less code; fewer deployments).
  2. Quick model iterations across all the use cases.
     - Updates (new features, architecture, etc.) for one use case can be applied to the other use cases.
     - If the consolidated tasks are related, then new features don't cause regressions in practice.
  3. Can be extended to any new related use case, both offline and online.
  4. Cross-learning: the model potentially gains more (hidden) learning from the other tasks. E.g.: having search data gives the model more data to learn from for the related-items task.
     - Q: Is this happening? How can we verify this? One way: train an independent model on the use-case-specific data and compare its performance with the consolidated model's performance on the same task.

- I was confused about what to call this learning paradigm. [Wikipedia](https://en.wikipedia.org/wiki/Multi-task_learning) says that it is multi-task learning.

## Practice at my work

- The models are not merged across different task types like relevance and search.
- Within the relevance ranking tasks (discover, similar items, category exploration), we have a common base ranker model.
- On top of that, we have different heuristics to make it better for each particular section (see the sketch at the end of this post).
- Advantages:
  - There is only one main model for all related tasks.
  - It keeps the heuristics logic simple and, thus, easy to maintain.
- Challenges:
  - The heuristics are crude/manual/semi-automated → we may be leaving some gains on the table. There are bandit-based approaches to automating this, though.
  - It loses out on cross-learning opportunities.
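
A rough sketch of the base-ranker-plus-heuristics setup described above; the scoring formula, heuristics, and feature names are all made up for illustration:

```python
from typing import Callable, Dict, List, Tuple

Candidate = Dict[str, float]  # hypothetical candidate: feature name -> value

def base_ranker_score(candidate: Candidate) -> float:
    # Stand-in for the common base ranker shared by all relevance tasks.
    return 0.7 * candidate["relevance"] + 0.3 * candidate["popularity"]

# Per-section heuristics applied on top of the base score.
heuristics: Dict[str, Callable[[Candidate, float], float]] = {
    "discover":         lambda c, s: s + 0.1 * c.get("freshness", 0.0),
    "similar_items":    lambda c, s: s + 0.2 * c.get("same_genre", 0.0),
    "category_explore": lambda c, s: s,  # no extra heuristic for this section
}

def rank(section: str, candidates: List[Candidate]) -> List[Tuple[Candidate, float]]:
    adjust = heuristics[section]
    scored = [(c, adjust(c, base_ranker_score(c))) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

items = [
    {"relevance": 0.9, "popularity": 0.5, "freshness": 0.8},
    {"relevance": 0.7, "popularity": 0.9, "freshness": 0.1},
]
print(rank("discover", items))
```
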
Binary file not shown.
Binary file not shown.

js/heatmap.js

+1
@@ -60,6 +60,7 @@ var data = [
  { "year": years.length - 10 - 1, "month": 0, "value": 1 },
  { "year": years.length - 10 - 1, "month": 8, "value": 1 },
  { "year": years.length - 11 - 1, "month": 0, "value": 1 },
+ { "year": years.length - 11 - 1, "month": 1, "value": 1 },
  ];

