diff --git a/Documentation/3. Expressive matching.md b/Documentation/3. Expressive matching.md
index 5ac1f03..58a6788 100644
--- a/Documentation/3. Expressive matching.md
+++ b/Documentation/3. Expressive matching.md
@@ -1,32 +1,28 @@
 # Example: expressive matching
 
-tokens(from:) -> [Token]
+`tokens(from:)` returns an array of tuples with the signature `(tokenizer: TokenType, text: String, range: Range)`.
 
-The results returned by `tokens(from:)`returns an array of `Token` where `Token` is a typealias of the tuple `(tokenizer: TokenType, text: String, range: Range)`
+To make use of the `tokenizer` element, you need to use either type casting (using `as?`) or type checking (using `is`).
 
-Which requires either type casting (using `as?`) type checking or type checking (using `is`) for the `tokenizer` element to be useful:
+Maybe we want to filter the results to get only the tokens that are numbers:
 
 ````Swift
 import Mustard
 
 let messy = "123Hello world&^45.67"
 let tokens = messy.tokens(from: .decimalDigits, .letters)
+// tokens.count -> 5
 
-// using type checking
-if tokens[0].tokenizer is EmojiToken {
-    print("found emoji token")
-}
-
-// using type casting
-if let _ = tokens[0].tokenizer as? NumberToken {
-    print("found number token")
-}
+let numbers = tokens.filter({ $0.tokenizer is NumberToken })
+// numbers.count -> 0
 ````
 
-This can lead to bugs in your logic-- in the example above neither of the print statements will be executed since the tokenizer used was actually the character sets `.decimalDigits`, and `.letters`.
+This can lead to bugs in your logic: in the example above, `numbers` will be empty because the tokenizers used were the character sets `.decimalDigits` and `.letters`, so the filter won't match any of the tokens.
+
+This may seem like an obvious error, but it's the type of unexpected bug that can slip in when we're using loosely typed results.
 
-Mustard can return a strongly typed set of matches if a single `TokenType` is used.
+Thankfully, Mustard can return a strongly typed set of matches if a single `TokenType` is used:
 
 ````Swift
 import Mustard
@@ -35,6 +31,7 @@ let messy = "123Hello world&^45.67"
 
 // call `tokens()` method on string to get matching tokens from string
 let numberTokens: [NumberToken.Match] = messy.tokens()
+// numberTokens.count -> 2
 
 ````
 
diff --git a/Mustard/MustardTests/CharacterSetTokenTests.swift b/Mustard/MustardTests/CharacterSetTokenTests.swift
index fce4396..531aa00 100644
--- a/Mustard/MustardTests/CharacterSetTokenTests.swift
+++ b/Mustard/MustardTests/CharacterSetTokenTests.swift
@@ -24,6 +24,8 @@ class CharacterSetTokenTests: XCTestCase {
 
         let tokens = "123Hello world&^45.67".tokens(from: .decimalDigits, .letters)
 
+        let numbers = tokens.filter({ $0.tokenizer is NumberToken })
+
         XCTAssert(tokens.count == 5, "Unexpected number of characterset tokens [\(tokens.count)]")
 
         XCTAssert(tokens[0].tokenizer == CharacterSet.decimalDigits)
diff --git a/README.md b/README.md
index 869b6be..71dea6f 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 
 Mustard is a Swift library for tokenizing strings when splitting by whitespace doesn't cut it.
 
-## Quick start
+## Quick start using character sets
 
 Mustard extends `String` with the method `tokens(from: CharacterSet...)` which allows you to pass in one or
 more character sets to use criteria to find tokens.
@@ -32,17 +32,18 @@ let tokens = messy.tokens(from: .decimalDigits, .letters)
 // tokens[4].range -> Range(19..<21)
 ````
 
+## Expressive use with custom tokenizers
+
 Creating by creating objects that implement the `TokenType` protocol we can create
-more advanced tokenizers. Here's some usage of a `DateToken` type that matches tokens
-with the a valid `MM/dd/yy` format, and also exposes a `date` property to access the
+more advanced tokenizers. Here's some usage of a `DateToken` type (see example for implementation)
+that matches tokens with a valid `MM/dd/yy` format, and also exposes a `date` property to access the
 corresponding `Date` object.
 
 ````Swift
 import Mustard
 
 let messyInput = "Serial: #YF 1942-b 12/01/27 (Scanned) 12/03/27 (Arrived) ref: 99/99/99"
-
-let tokens:[DateToken.Token] = messyInput.tokens()
+let tokens: [DateToken.Token] = messyInput.tokens()
 // tokens.count -> 2
 // ('99/99/99' is *not* matched by `DateToken`)
 //
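For readers skimming the updated documentation, the loosely typed and strongly typed calls contrast more clearly when placed side by side. The following is only a consolidated sketch of the snippets shown in the diff above; it assumes a `NumberToken` tokenizer conforming to `TokenType` has been defined elsewhere, as the documentation's examples do.

````Swift
import Mustard

let messy = "123Hello world&^45.67"

// Loosely typed: each tuple's `tokenizer` element is the CharacterSet that
// matched, so filtering for a custom tokenizer type finds nothing.
let tokens = messy.tokens(from: .decimalDigits, .letters)
let numbers = tokens.filter({ $0.tokenizer is NumberToken })
// numbers.count -> 0

// Strongly typed: annotating the result with a single tokenizer's `Match`
// type returns only the tokens produced by that tokenizer.
let numberTokens: [NumberToken.Match] = messy.tokens()
// numberTokens.count -> 2
````

The counts shown match the results quoted in the updated documentation.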