Update PACKAGE.md to include Llama info (#7104)
* Update PACKAGE.md to include Llama info

* Apply suggestions from code review

Co-authored-by: Eric StJohn <[email protected]>

---------

Co-authored-by: Eric StJohn <[email protected]>
tarekgh and ericstj authored Mar 22, 2024
1 parent 19fb805 commit b8f20bf
1 changed file with 26 additions and 0 deletions: src/Microsoft.ML.Tokenizers/PACKAGE.md
@@ -8,11 +8,18 @@ Microsoft.ML.Tokenizers supports various implementations of the tokenization u
* BPE - Byte pair encoding model
* English Roberta model
* Tiktoken model
* Llama model

## How to Use

```c#
using Microsoft.ML.Tokenizers;
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.IO;

//
// Using Tiktoken Tokenizer
//
// Initialize the tokenizer for the `gpt-4` model, downloading its data files
Tokenizer tokenizer = await Tokenizer.CreateTiktokenForModelAsync("gpt-4");
// ... (unchanged lines elided in this diff view, including the definition of `source`) ...
IReadOnlyList<int> ids = tokenizer.EncodeToIds(source);
Console.WriteLine(string.Join(", ", ids));
// prints: 1199, 4037, 2065, 374, 279, 1920, 315, 45473, 264, 925, 1139, 264, 1160, 315, 11460, 13

//
// Using Llama Tokenizer
//
// Open a stream to the remote Llama tokenizer model file
using HttpClient httpClient = new();
const string modelUrl = @"https://huggingface.co/hf-internal-testing/llama-tokenizer/resolve/main/tokenizer.model";
using Stream remoteStream = await httpClient.GetStreamAsync(modelUrl);

// Create the Llama tokenizer using the remote stream
Tokenizer llamaTokenizer = Tokenizer.CreateLlama(remoteStream);
string input = "Hello, world!";
ids = llamaTokenizer.EncodeToIds(input);
Console.WriteLine(string.Join(", ", ids));
// prints: 1, 15043, 29892, 3186, 29991
// (1 is the Llama beginning-of-sentence token `<s>`)
Console.WriteLine($"Tokens: {llamaTokenizer.CountTokens(input)}");
// prints: Tokens: 5
```
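The Llama example above downloads `tokenizer.model` every time it runs. The following is a minimal sketch, assuming you would rather download the file once and reuse a local copy. The cache file name `llama-tokenizer.model` and the temporary-file step are illustrative choices, not part of the package. The sketch relies only on the `Tokenizer.CreateLlama(Stream)` and `EncodeToIds` calls shown above; their names may differ in other package versions.

```c#
using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using Microsoft.ML.Tokenizers;

// Hypothetical local cache path; any writable location works.
const string cachePath = "llama-tokenizer.model";
const string modelUrl = "https://huggingface.co/hf-internal-testing/llama-tokenizer/resolve/main/tokenizer.model";

using HttpClient httpClient = new();

// Download the model file only if it is not already cached on disk.
if (!File.Exists(cachePath))
{
    // Write to a temporary file first so an interrupted download
    // does not leave a truncated cache file behind.
    string tempPath = cachePath + ".tmp";
    using (Stream remote = await httpClient.GetStreamAsync(modelUrl))
    using (FileStream local = File.Create(tempPath))
    {
        await remote.CopyToAsync(local);
    }
    File.Move(tempPath, cachePath);
}

// Create the tokenizer from the local copy instead of the network stream.
using Stream modelStream = File.OpenRead(cachePath);
Tokenizer llamaTokenizer = Tokenizer.CreateLlama(modelStream);

IReadOnlyList<int> ids = llamaTokenizer.EncodeToIds("Hello, world!");
Console.WriteLine(string.Join(", ", ids));
```

Because `Tokenizer.CreateLlama` accepts any `Stream`, the same approach also works for a model file that ships alongside the application.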

## Main Types