
Generating embeddings without serving up the model checkpoint each time #17

Open
erichare opened this issue Feb 10, 2020 · 2 comments

@erichare

Hello,

Gobbli is a fantastic package, and I've been trying to use it in some of my work. One issue: it seems like the BERT checkpoint is being loaded on every call to embed(), which makes embedding generation take 20-30 seconds per call on my machine.

Is there a way to "serve up" this model so that subsequent calls to embed() don't have to load the model checkpoint each time? Or would this require quite a bit of restructuring?
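
For reference, here is roughly the calling pattern I have in mind. This is just a sketch, and the class and attribute names (EmbedInput, X_embedded, etc.) are how I understand gobbli's embedding API rather than verbatim from the docs:

```python
# Rough sketch of the usage pattern in question; gobbli class and
# attribute names here are assumed from my reading of the embedding
# API and may not match exactly.
import gobbli.io
from gobbli.model.bert import BERT

model = BERT()
model.build()  # one-time setup (image/weights download)

batches = [["first document"], ["second document"]]

# Each embed() call appears to reload the BERT checkpoint inside the
# container, so every iteration pays the ~20-30 second startup cost.
for batch in batches:
    embed_input = gobbli.io.EmbedInput(X=batch)
    embed_output = model.embed(embed_input)
    print(embed_output.X_embedded)
```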

@jasonnance
Collaborator

I'm glad you're finding gobbli useful!

You're correct that each call to embed() does a lot of work, although it's not just loading the checkpoint -- it's also writing all your data to disk and reading it inside the container, then writing all the embeddings to disk in the container and reading them back outside. Depending on how big your dataset is, that I/O might be taking more of the time than the checkpoint load.

There isn't currently a way around this -- I'd consider it a fundamental limitation of gobbli's design. If latency is important to you, you may want to look into something like https://github.com/hanxiao/bert-as-service, which is better suited to serving low-latency responses. gobbli is only intended for experimental/batch workloads -- it's designed to help you quickly determine whether a model will work in a production situation rather than to serve a production model.
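
To make the comparison concrete, here's a minimal sketch of the bert-as-service approach, assuming you've installed and started its server separately per that project's README. The checkpoint stays loaded in the long-running server process, so each client call only pays inference cost:

```python
# Minimal bert-as-service client sketch (assumes the server was started
# separately, e.g.:
#   bert-serving-start -model_dir /path/to/uncased_L-12_H-768_A-12 -num_worker=1
# so the checkpoint is already loaded in a long-running process).
from bert_serving.client import BertClient

bc = BertClient()  # connects to the running server
vecs = bc.encode(["first document", "second document"])
print(vecs.shape)  # (2, 768) for BERT-base
```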

It would be theoretically possible to rework gobbli's model Docker containers into e.g. REST API services (as opposed to single-run batch processes) which could be spun up once and reused across calls. However, that would be a fair amount of work, since we'd essentially have to build a mostly-the-same-but-slightly-different API server within the constraints of a host of different Python environments. I don't see that happening any time soon.
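
To illustrate the idea (purely hypothetical -- none of this exists in gobbli), each model container's entrypoint would become a small HTTP service that loads the checkpoint once at startup and then serves requests from memory:

```python
# Hypothetical sketch only: gobbli does not ship anything like this.
# The expensive checkpoint load happens once at startup; each request
# then only pays inference cost.
from flask import Flask, jsonify, request


def load_model():
    """Stand-in for the slow step: loading the real BERT checkpoint."""

    class DummyModel:
        def embed(self, texts):
            # Placeholder; a real service would run the model here.
            return [[0.0, 0.0, 0.0] for _ in texts]

    return DummyModel()


app = Flask(__name__)
model = load_model()  # loaded exactly once, when the container starts


@app.route("/embed", methods=["POST"])
def embed():
    texts = request.get_json()["texts"]
    return jsonify({"embeddings": model.embed(texts)})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```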

I'll leave this issue open for discussion for a bit, but I don't think there's much we can do about it in the near-term.

@erichare
Author

Thank you so much for that response, Jason. I had a suspicion this was the case, and it makes sense why it's a technical challenge.
