A micro-service for analyzing words based on word-embedding models (a.k.a. "word2vec").
- Install Python.
- Run `pip install -r requirements.txt` to install the dependencies.
- Run `python download-google-news-model.py` to download the Google News model file.
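Put together, first-time setup from a shell looks like this (assuming Python and pip are already on your path):

```shell
pip install -r requirements.txt        # install the Python dependencies
python download-google-news-model.py   # download the (large) Google News model file
```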
We develop with PyCharm.
Configuration happens via environment variables. We use python-dotenv to manage this on local dev machines. Make a `.env` file and define the following in it:
- `SENTRY_DSN` - optional URL for Sentry logging
- `SECRET_KEY` - for Flask
- `LOG_LEVEL` - `DEBUG`, `INFO`, etc.
- `MEDIA_CLOUD_API_KEY` - your Media Cloud API key
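For example, a `.env` for local development might look like this (all values here are placeholders):

```
SECRET_KEY=some-random-string
LOG_LEVEL=DEBUG
MEDIA_CLOUD_API_KEY=your-media-cloud-api-key
# SENTRY_DSN is optional (see above)
```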
Two options:

- Development: run `python run.py` to test it out.
- Production-like: run `./run.sh` to run it with gunicorn.
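For reference, `run.sh` presumably wraps a gunicorn invocation along these lines (a sketch only; the `server:app` module path is an assumption, so check `run.sh` itself for the real command):

```shell
# hypothetical equivalent of ./run.sh: bind gunicorn to port 8000
gunicorn --bind 0.0.0.0:8000 server:app
```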
You can then hit the local homepage at http://localhost:8000 to try it out via a simple web-testing harness.
Or you can test it with something like this (the first request takes a while because it loads the giant model into memory):
```python
import requests

response = requests.post("http://localhost:8000/api/v2/google-news2d.json",
                         data={'words[]': ['apples', 'bananas', 'three']})
print(response.json())
```
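If you'd rather test from the command line, an equivalent curl call looks like this (same endpoint and form-encoded parameters as above):

```shell
# repeated -d flags become repeated words[] form fields in the POST body
curl http://localhost:8000/api/v2/google-news2d.json \
  -d 'words[]=apples' -d 'words[]=bananas' -d 'words[]=three'
```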
This is configured to deploy to dokku via a Heroku buildpack.
You'll need to do something like this to set the required environment variables:
```shell
dokku config:set word-embeddings SECRET_KEY=oiwajj243josadjoi SENTRY_DSN=https://THING1:[email protected]/THING3 MEDIA_CLOUD_API_KEY=MY_AWESOME_KEY
```
- Update the semantic version number in `server/__init__.py`.
- Tag the repository with that number, like `v4.5.2`.
- Push it to the server, like `git push dokku v4.5.2:master` (see the consolidated commands below).
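Taken together, cutting a release looks something like this (with `v4.5.2` standing in for whatever version you just set):

```shell
# tag the current commit and deploy that tag to dokku
git tag v4.5.2
git push dokku v4.5.2:master
```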