Salmon is a tool for efficiently generating ordinal embeddings. It relies on "active" machine learning algorithms to choose the most informative queries for humans to answer.
This documentation is available at these locations:
- Primary source: https://docs.stsievert.com/salmon/
- Secondary source: as a raw PDF (and as a slower loading PDF).
- Secondary source: as a zipped HTML directory, which requires unzipping the directory then opening up index.html.
Please file an issue if you cannot access the documentation.
To run Salmon offline, see the documentation at https://docs.stsievert.com/salmon/offline.html. Briefly, this should work:
$ cd path/to/salmon
$ conda env create -f salmon.lock.yml
$ conda activate salmon
(salmon) $ pip install -e .
The online documentation has more detail on generating an embedding offline: https://docs.stsievert.com/salmon/offline.html#generate-embeddings
After installation, it's also possible to write a script that imports and uses Salmon:
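from salmon.triplets.samplers import TSTE
import numpy as np
# n items to embed into d dimensions
n, d = 85, 2
sampler = TSTE(n=n, d=d)
# Specify the embedding that queries will be scored against
em_init = np.array([[i, -i] for i in range(n)])
sampler.opt.initialize(embedding=em_init)
# Request 10,000 queries along with their scores
queries, scores, meta = sampler.get_queries(num=10_000)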
This script allows the data scientist to score queries for an embedding they specify.
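As a follow-up, here is a minimal sketch of keeping only the highest-scoring queries. It uses synthetic stand-ins for `queries` and `scores` so that it runs without Salmon installed. It assumes `queries` has one row per query (taken here to be head, left and right item indices) and that a larger score means a more informative query. Those layouts and the `top_k_queries` helper are assumptions made for illustration, not part of Salmon's API.

```python
import numpy as np


def top_k_queries(queries, scores, k):
    """Return the k rows of `queries` with the largest scores, best first.

    Hypothetical helper for illustration; not part of Salmon.
    """
    queries = np.asarray(queries)
    scores = np.asarray(scores, dtype=float)
    if len(queries) != len(scores):
        raise ValueError("queries and scores must have the same length")
    # argsort is ascending, so reverse it to put the largest score first
    order = np.argsort(scores)[::-1][:k]
    return queries[order], scores[order]


# Synthetic stand-ins for the output of sampler.get_queries (assumed layout:
# one row per query, columns are head, left, right item indices).
rng = np.random.default_rng(0)
n = 85
queries = np.array([rng.choice(n, size=3, replace=False) for _ in range(1000)])
scores = rng.normal(size=len(queries))

best_queries, best_scores = top_k_queries(queries, scores, k=10)
print(best_queries.shape)
```

With real output from the script above, the same helper could be called on the returned `queries` and `scores`, provided they match the assumed layout.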