Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Robust 2-stage recommender system pipeline #207

Open
bschifferer opened this issue Sep 22, 2022 · 3 comments
Open

[BUG] Robust 2-stage recommender system pipeline #207

bschifferer opened this issue Sep 22, 2022 · 3 comments
Labels
bug Something isn't working P1
Milestone

Comments

@bschifferer
Copy link
Contributor

bschifferer commented Sep 22, 2022

Bug description

The unit test of the 2-stage recommender system pipeline is shaky due to multiple reasons:

  • user_id sent to triton inference server does not exist in FEAST storage
  • FIASS cannot return k valid candidates given the user query:
    -- FIASS will return k-candidates, but filled up with -1 for not found candidates
    -- -1 cannot be processed by FEAST
    -- Issue is that FIASS has not enough item vectors to generate an index. Even 256 item_embeddings could result in less than 100 candidates

Unit test:
https://github.com/NVIDIA-Merlin/Merlin/blob/main/tests/unit/examples/test_building_deploying_multi_stage_RecSys.py

Edge cases, we should be handling without crashing the systems:

  • user_id is not available in FEAST
  • user requests more topk than items in FIASS indexed (n): topk>FIASS
  • FIASS cannot return k-th valid candidates, even topk<n
  • FIASS returns item_ids which are not available in FEAST for futher processing
  • Candidates IDs are not availble in FEAST
  • Number of candidates are less then requested topk

What should be the result in each of the cases?

@bschifferer bschifferer added bug Something isn't working P0 labels Sep 22, 2022
@rnyak
Copy link
Contributor

rnyak commented Sep 22, 2022

thanks @bschifferer. these are all valid points. can we also add nulls issue to this list? integration test fails if we have nulls in the user id and item id columns in the real dataset.

@viswa-nvidia viswa-nvidia added P1 and removed P0 labels Sep 22, 2022
@viswa-nvidia
Copy link

@karlhigley
Copy link
Contributor

Just for context on how we got here:

  • The Merlin 1.0 launch created a need to be able to at least tell a story about how serving would work, so we built the multi-stage example and put exactly enough code behind it to make that notebook usually run but not much else.
  • Session-based has taken a lot of development bandwidth that could otherwise have been allocated to this stuff and directed it elsewhere. Additionally, there's been a significant lack of clarity around how session-based models would fit into multi-stage recommenders, so that work hasn't overlapped with this as much as it otherwise might have.
  • We've spent large chunks of the past year trying to figure out how to get the pieces of Merlin to work together more smoothly, which has involved a lot of Merlin Core development on the part of the Systems devs.

I agree that this stuff is important though, and we might soon have bandwidth to tackle it, once we get session-based serving for both TF and Torch ironed out. Maybe in the 23.04-23.05 timeline?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working P1
Projects
None yet
Development

No branches or pull requests

4 participants