[BUG] Robust 2-stage recommender system pipeline #207

bschifferer · 2022-09-22T09:55:59Z

Bug description

The unit test for the 2-stage recommender system pipeline is flaky for multiple reasons:

The user_id sent to Triton Inference Server does not exist in FEAST storage
FAISS cannot return k valid candidates for the given user query:
-- FAISS always returns k candidates, but pads the result with -1 for candidates it could not find
-- FEAST cannot process an ID of -1
-- The underlying issue is that FAISS does not have enough item vectors to build an index. Even 256 item_embeddings can yield fewer than 100 candidates
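
To illustrate the -1 padding described above, here is a minimal sketch of stripping the padded slots before the candidate IDs reach the feature store lookup. The helper name `drop_padding` and the sample IDs are hypothetical, not part of the Merlin codebase, and the list stands in for the ID array a FAISS search would return.

```python
from typing import List, Sequence

PADDING_ID = -1  # value used to fill result slots for which no neighbour was found


def drop_padding(candidate_ids: Sequence[int], padding_id: int = PADDING_ID) -> List[int]:
    """Return only the real item IDs, preserving their rank order."""
    return [int(i) for i in candidate_ids if int(i) != padding_id]


# Simulated search result (assumption): k=5 was requested, only 3 neighbours were found.
raw_ids = [42, 7, 19, -1, -1]
valid_ids = drop_padding(raw_ids)
```

After filtering, the caller has fewer than k candidates, so downstream stages would still need to accept a variable-length candidate list.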

Unit test:
https://github.com/NVIDIA-Merlin/Merlin/blob/main/tests/unit/examples/test_building_deploying_multi_stage_RecSys.py

Edge cases we should handle without crashing the system:

user_id is not available in FEAST
The user requests a larger topk than the number of items indexed in FAISS (n): topk > n
FAISS cannot return k valid candidates, even when topk < n
FAISS returns item_ids that are not available in FEAST for further processing
Candidate IDs are not available in FEAST
The number of candidates is less than the requested topk

What should be the result in each of the cases?
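
One possible policy, sketched under stated assumptions rather than taken from the Merlin code: fall back to a default list for unknown users, skip padded or unknown item IDs, and return at most topk items instead of raising. The function `resolve_candidates`, the `fallback_ids` parameter, and the plain dicts standing in for FEAST lookups are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def resolve_candidates(
    user_id: Optional[int],
    topk: int,
    retrieved_ids: Sequence[int],
    user_features: Dict[int, dict],
    item_features: Dict[int, dict],
    fallback_ids: Sequence[int] = (),
) -> List[int]:
    """Return up to `topk` item IDs that are safe to pass to later stages."""
    if topk <= 0:
        return []

    # Unknown (or missing) user: use the fallback list, e.g. popular items.
    if user_id is None or user_id not in user_features:
        pool = fallback_ids
    else:
        pool = retrieved_ids

    seen = set()
    result: List[int] = []
    for item_id in pool:
        # Skip -1 padding, IDs with no stored features, and duplicates.
        if item_id == -1 or item_id not in item_features or item_id in seen:
            continue
        seen.add(item_id)
        result.append(item_id)
        if len(result) == topk:
            break
    # May be shorter than topk; the caller decides whether that is acceptable.
    return result


# Toy stand-ins for the feature store (assumption).
user_features = {1: {"age": 30}}
item_features = {10: {}, 11: {}, 12: {}}

known = resolve_candidates(1, 5, [10, -1, 99, 11, -1], user_features, item_features, fallback_ids=[12, 10])
unknown = resolve_candidates(2, 1, [10, 11], user_features, item_features, fallback_ids=[12, 10])
```

Here `known` contains only the two retrievable items even though five were requested, and `unknown` is served from the fallback list. Whether a short list or an explicit error is the better result is exactly the open question in this issue.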

rnyak · 2022-09-22T12:34:22Z

thanks @bschifferer. these are all valid points. can we also add the nulls issue to this list? the integration test fails if the real dataset has nulls in the user id and item id columns.

viswa-nvidia · 2022-09-22T19:36:28Z

Changed priority to P1. Refer to https://nvidia.slack.com/archives/C01RP7T89PY/p1663872124879779?thread_ts=1663843219.331779&cid=C01RP7T89PY

karlhigley · 2023-03-22T17:15:10Z

Just for context on how we got here:

The Merlin 1.0 launch created a need to be able to at least tell a story about how serving would work, so we built the multi-stage example and put exactly enough code behind it to make that notebook usually run but not much else.
Session-based work has taken a lot of development bandwidth that could otherwise have been allocated to this stuff. Additionally, there's been a significant lack of clarity around how session-based models would fit into multi-stage recommenders, so that work hasn't overlapped with this as much as it otherwise might have.
We've spent large chunks of the past year trying to figure out how to get the pieces of Merlin to work together more smoothly, which has involved a lot of Merlin Core development on the part of the Systems devs.

I agree that this stuff is important though, and we might soon have bandwidth to tackle it, once we get session-based serving for both TF and Torch ironed out. Maybe in the 23.04-23.05 timeline?

bschifferer added bug, P0 labels Sep 22, 2022

viswa-nvidia added P1 and removed P0 labels Sep 22, 2022

karlhigley added this to the Merlin 23.05 milestone Mar 22, 2023

viswa-nvidia modified the milestones: Merlin 23.05, Merlin 23.06 May 17, 2023

BlakeB415 mentioned this issue Dec 18, 2023

[BUG] QueryFaiss returning item ID as -1 causing type error #387

Open
