Add a beam search implementation and a .generate() method to the model (allenai#83)
epwalsh authored Apr 6, 2023
1 parent 9911b78 commit 5607566
Showing 5 changed files with 2,019 additions and 3 deletions.
24 changes: 23 additions & 1 deletion README.md
@@ -60,11 +60,33 @@ This may require a reservation on the Infiniband cluster.

See the [Beaker documentation](https://beaker-docs.apps.allenai.org/distributed-training.html) for more information on distributed training.

## Generating text

You can use the `generate()` method to produce text using beam search with a variety of options.

For example:

```python
import torch

# Assumes `model` and `tokenizer` have already been loaded.
# Prepare inputs.
# Note: we don't want the EOS token added to the end of the input, hence
# the `add_special_tokens=False`.
input_ids = tokenizer.encode("I'm a large language model, ", add_special_tokens=False)
# `model.generate()` expects a batch.
input_tensor = torch.tensor(input_ids).unsqueeze(0)

# Run beam search.
outputs = model.generate(input_tensor, max_steps=3, beam_size=3)

# The output token IDs have shape (batch_size, beam_size, max_steps).
best_generation = outputs.token_ids[0][0].tolist()
print(tokenizer.decode(best_generation))
```
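
Under the hood, `generate()` runs beam search. As a rough illustration of the algorithm (a minimal sketch, not the implementation added in this commit), beam search keeps the `beam_size` highest-scoring partial sequences at every step. Here `step_fn` is a hypothetical stand-in for a forward pass that returns next-token log-probabilities:

```python
import torch

def beam_search(step_fn, input_ids: torch.Tensor, beam_size: int, max_steps: int):
    # Each beam is a (token_ids, cumulative_log_prob) pair; start from the prompt.
    beams = [(input_ids, 0.0)]
    for _ in range(max_steps):
        candidates = []
        for ids, score in beams:
            # `step_fn` is assumed to return next-token log-probs of shape (vocab_size,).
            log_probs = step_fn(ids)
            top_log_probs, top_tokens = log_probs.topk(beam_size)
            for lp, tok in zip(top_log_probs.tolist(), top_tokens.tolist()):
                new_ids = torch.cat([ids, torch.tensor([tok], dtype=ids.dtype)])
                candidates.append((new_ids, score + lp))
        # Keep only the `beam_size` highest-scoring continuations.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return beams
```

The actual `generate()` method exposes more options than this sketch shows.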

## Finding official runs

We keep all of our runs in WandB under [the "ai2-llm" entity](https://wandb.ai/ai2-llm).
We don't store model checkpoints in WandB. Those are in GCS under `gs://allennlp-olmo/<wandb_run_path>`.
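
If you want to pull checkpoint files down programmatically, something along these lines should work (a sketch using the `google-cloud-storage` client; the run path shown is just an example, and the exact object layout under the bucket is an assumption):

```python
from google.cloud import storage

# Assumes you have GCS credentials with read access to the bucket.
client = storage.Client()

# Example run path; substitute the `<wandb_run_path>` of the run you want.
wandb_run_path = "ai2-llm/LLM-scripts/runs/ed5krfk9"

# List the files stored under gs://allennlp-olmo/<wandb_run_path>.
for blob in client.list_blobs("allennlp-olmo", prefix=wandb_run_path):
    print(blob.name)
    # blob.download_to_filename(blob.name.split("/")[-1])  # download locally
```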

### Highlighted models

* 300M parameters, ~70B tokens, a starter model that's not completely random: https://wandb.ai/ai2-llm/LLM-scripts/runs/ed5krfk9
