diff --git a/jep-014-string-functions.md b/jep-014-string-functions.md
index ee7352f..fa7493c 100644
--- a/jep-014-string-functions.md
+++ b/jep-014-string-functions.md
@@ -4,9 +4,10 @@
 |---|---
 | **JEP** | 14
 | **Author** | Maxime Labelle, Chris Armstrong (GorillaStack), Richard Gibson
-| **Created**| 13-October-2022
 | **SemVer** | MINOR
 | **Status**| accepted
+| **Created**| 13-October-2022
+| **Obsoleted by**| [JEP-14a](./jep-014a-string-functions.md)
 
 ## Abstract
 
diff --git a/jep-014a-string-functions.md b/jep-014a-string-functions.md
new file mode 100644
index 0000000..861a34f
--- /dev/null
+++ b/jep-014a-string-functions.md
@@ -0,0 +1,296 @@
+# String Functions
+
+|||
+|---|---
+| **JEP** | 14a
+| **Author** | Maxime Labelle, Chris Armstrong (GorillaStack), Richard Gibson
+| **SemVer** | MINOR
+| **Status**| draft
+| **Created**| 13-October-2022
+| **Obsoletes**| [JEP-14](./jep-014-string-functions.md)
+
+## Addendum
+
+|Date|Description
+|---|---|
+|15-March-2023|Clarified error type precedence.
+
+## Abstract
+
+This JEP introduces a core set of useful string manipulation functions. These functions are modeled on functions found in popular programming languages such as JavaScript and Python.
+
+## Specification
+
+Some string manipulation functions introduce the concept of _optional arguments_ to JMESPath functions. The specification paragraph on function evaluation must therefore be amended as follows, with changes highlighted in **bold**:
+
+_Functions can ~~either~~ have a specific arity, **a range of valid argument counts, from a minimum to a maximum,** or be variadic with a minimum number of arguments. If a function-expression is encountered where the arity does not match or the minimum number of arguments for a variadic function is not provided, then implementations must indicate to the caller that an invalid-arity error occurred. How and when this error is raised is implementation specific._
+
+Some functions accept number arguments that are further constrained to integers, or even to non-negative integers. This JEP introduces a new error
+type, `invalid-value`, by amending the paragraph on type constraints in the specification as follows:
+
+_Each function signature declares the types of its input parameters. If any type constraints are not met, implementations must indicate that an `invalid-type` error occurred. **If a function parameter accepts values constrained to a specific subset of a type and those constraints are not met, implementations must report that an `invalid-value` error occurred.**_
+
+_The [initial version of this JEP](./jep-014-string-functions.md) had a provision stating that_ “How and when those errors are raised is implementation specific”. _This provision has been removed. Implementations must perform type checking on all function parameters_ before _attempting to evaluate the set of valid values for a given type._
+
+
+### find_first
+
+```
+int find_first(string $subject, string $sub[, int $start[, int $end]])
+```
+Given the `$subject` string, `find_first()` returns the zero-based index of the first occurrence of the `$sub` substring in `$subject`, or `null` if it does not appear. If either the `$subject` or the `$sub` argument is an empty string, `find_first()` returns `null`.
+
+The optional `$start` and `$end` parameters restrict the search for `$sub` to the slice `[$start:$end]` of `$subject`.
+
+- If `$start` is omitted, it defaults to `0` (the start of the `$subject` string).
+- If `$end` is omitted, it defaults to `length($subject)` (just past the end of the `$subject` string).
+
+If provided, `$start` and `$end` are expected to be integers; otherwise, an error MUST be raised.
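
A minimal Python sketch of `find_first()`, assuming Python `None` stands for JMESPath `null` and that a non-integral `$start` or `$end` is reported as an `invalid-value` error (modelled here as `ValueError`). The `_as_int()` helper is hypothetical. Python's `str.find()` already interprets its bounds like slice notation, which matches the `[$start:$end]` semantics, including negative and out-of-range bounds.

```python
from typing import Optional, Union

Number = Union[int, float]


def _as_int(value: Number, name: str) -> int:
    # Hypothetical helper: JSON numbers may arrive as floats, so 8.0 is
    # accepted but 8.5 is rejected (standing in for `invalid-value`).
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"invalid-type: {name} must be a number")
    if isinstance(value, float) and not value.is_integer():
        raise ValueError(f"invalid-value: {name} must be an integer")
    return int(value)


def find_first(subject: str, sub: str,
               start: Number = 0, end: Optional[Number] = None) -> Optional[int]:
    """Sketch of JEP-14a find_first(); returns None where JMESPath returns null."""
    start_index = _as_int(start, "$start")
    end_index = len(subject) if end is None else _as_int(end, "$end")
    if subject == "" or sub == "":
        return None
    # str.find() searches subject[start_index:end_index] but reports the
    # index relative to the whole string, as find_first() requires.
    index = subject.find(sub, start_index, end_index)
    return None if index == -1 else index
```

For instance, `find_first("subject string", "string", -6)` returns `8`: the negative start resolves to index `8` of the 14-character subject.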
+
+Unlike similar functions in most popular programming languages, `find_first()` does not return `-1` when no occurrence of the substring can be found. Instead, it returns `null`, for consistency with the rest of JMESPath.
+
+### Examples
+
+| Given | Expression | Result
+|---|---|---
+| `"subject string"` | `` find_first(@, 'string') `` | `8`
+| `"subject string"` | `` find_first(@, 'string', `0`) `` | `8`
+| `"subject string"` | `` find_first(@, 'string', `0`, `14`) `` | `8`
+| `"subject string"` | `` find_first(@, 'string', `-99`, `100`) `` | `8`
+| `"subject string"` | `` find_first(@, 'string', `-6`) `` | `8`
+| `"subject string"` | `` find_first(@, 'string', `0`, `13`) `` | `null`
+| `"subject string"` | `` find_first(@, 'string', `8`) `` | `8`
+| `"subject string"` | `` find_first(@, 'string', `8`, `11`) `` | `null`
+| `"subject string"` | `` find_first(@, 'string', `9`) `` | `null`
+| `"subject string"` | `` find_first(@, 's') `` | `0`
+| `"subject string"` | `` find_first(@, 's', `1`) `` | `8`
+| `"subject string"` | `` find_first(@, '') `` | `null`
+
+### find_last
+
+```
+int find_last(string $subject, string $sub[, int $start[, int $end]])
+```
+Given the `$subject` string, `find_last()` returns the zero-based index of the last occurrence of the `$sub` substring in `$subject`, or `null` if it does not appear. If either the `$subject` or the `$sub` argument is an empty string, `find_last()` returns `null`.
+
+The optional `$start` and `$end` parameters restrict the search for `$sub` to the slice `[$start:$end]` of `$subject`.
+
+- If `$start` is omitted, it defaults to `0` (the start of the `$subject` string).
+- If `$end` is omitted, it defaults to `length($subject)` (just past the end of the `$subject` string).
+
+If provided, `$start` and `$end` are expected to be integers; otherwise, an error MUST be raised.
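
A minimal Python sketch of `find_last()`, assuming Python `None` stands for JMESPath `null`, with `TypeError` and `ValueError` as stand-ins for `invalid-type` and `invalid-value`. It relies on `str.rfind()`, which searches `subject[start:end]` using slice semantics and returns `-1` when there is no match. Both bounds are type-checked before either is value-checked.

```python
from typing import Optional, Union


def find_last(subject: str, sub: str, start: Union[int, float] = 0,
              end: Optional[Union[int, float]] = None) -> Optional[int]:
    """Sketch of JEP-14a find_last(); returns None where JMESPath returns null."""
    bounds = [start] if end is None else [start, end]
    for bound in bounds:
        if isinstance(bound, bool) or not isinstance(bound, (int, float)):
            raise TypeError("invalid-type: $start and $end must be numbers")
    for bound in bounds:
        # Integral floats such as 8.0 are accepted; 8.5 is rejected.
        if isinstance(bound, float) and not bound.is_integer():
            raise ValueError("invalid-value: $start and $end must be integers")
    if subject == "" or sub == "":
        return None
    end_index = len(subject) if end is None else int(end)
    # rfind() returns the highest index at which `sub` lies entirely
    # within subject[start:end], or -1 when there is none.
    index = subject.rfind(sub, int(start), end_index)
    return None if index == -1 else index
```

With no bounds, `find_last("subject string", "s")` returns `8`, because `s` occurs at indices `0` and `8` and the last one wins.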
+
+Unlike similar functions in most popular programming languages, `find_last()` does not return `-1` when no occurrence of the substring can be found. Instead, it returns `null`, for consistency with the rest of JMESPath.
+
+### Examples
+
+| Given | Expression | Result
+|---|---|---
+| `"subject string"` | `` find_last(@, 'string') `` | `8`
+| `"subject string"` | `` find_last(@, 'string', `8`) `` | `8`
+| `"subject string"` | `` find_last(@, 'string', `8`, `9`) `` | `null`
+| `"subject string"` | `` find_last(@, 'string', `9`) `` | `null`
+| `"subject string"` | `` find_last(@, 's') `` | `8`
+| `"subject string"` | `` find_last(@, 's', `1`) `` | `8`
+| `"subject string"` | `` find_last(@, 's', `0`, `7`) `` | `0`
+| `"subject string"` | `` find_last(@, '') `` | `null`
+
+### lower
+
+```
+string lower(string $subject)
+```
+Returns the `$subject` string converted to lowercase, using Unicode Default Case Conversion.
+
+### Examples
+
+| Given | Expression | Result
+|---|---|---
+| `"STRING"` | `` lower(@) `` | `"string"`
+
+### pad_left
+
+```
+string pad_left(string $subject, number $width[, string $pad])
+```
+
+Given the `$subject` string, `pad_left()` adds characters to its beginning and returns a string of length at least `$width`.
+
+The optional `$pad` string parameter specifies the padding character.
+If omitted, it defaults to an ASCII space (U+0020).
+If present, it MUST have length 1, otherwise an error MUST be raised.
+
+If the `$subject` string has length greater than or equal to `$width`, it is returned unmodified.
+
+If `$width` is not an integer or is negative, an error MUST be raised.
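
A minimal Python sketch of `pad_left()`. It maps onto `str.rjust()`, which already returns strings that are at least `width` long unmodified and requires a single-character fill. The explicit checks are an assumption about how an implementation might report errors, with `TypeError` and `ValueError` standing in for `invalid-type` and `invalid-value`. All type checks run before any value check. `pad_right()` would be identical but would use `str.ljust()`.

```python
from typing import Union


def pad_left(subject: str, width: Union[int, float], pad: str = " ") -> str:
    """Sketch of JEP-14a pad_left(); pads the start of `subject` with `pad`."""
    # Type checks first, for every parameter...
    if isinstance(width, bool) or not isinstance(width, (int, float)):
        raise TypeError("invalid-type: $width must be a number")
    if not isinstance(pad, str):
        raise TypeError("invalid-type: $pad must be a string")
    # ...then the value constraints on the already type-checked arguments.
    if (isinstance(width, float) and not width.is_integer()) or width < 0:
        raise ValueError("invalid-value: $width must be a non-negative integer")
    if len(pad) != 1:
        raise ValueError("invalid-value: $pad must have length 1")
    # rjust() returns `subject` unchanged when it is already `width` long or longer.
    return subject.rjust(int(width), pad)
```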
+ +### Examples + +| Given | Expression | Result +|---|---|--- +| `"string"` | `` pad_left(@, `0`) `` | `"string"` +| `"string"` | `` pad_left(@, `5`) `` | `"string"` +| `"string"` | `` pad_left(@, `10`) `` | `"    string"` +| `"string"` | `` pad_left(@, `10`, '-') `` | `"----string"` + +### pad_right + +``` +string pad_right(string $subject, number $width[, string $pad]) +``` + +Given the `$subject` string, `pad_right()` adds characters to the end and returns a string of length at least `$width`. + +The `$pad` optional string parameter specifies the padding character. +If omitted, it defaults to an ASCII space (U+0020). +If present, it MUST have length 1, otherwise an error MUST be raised. + +If the `$subject` string has length greater than or equal to `$width`, it is returned unmodified. + +If `$width` is not an integer or is negative, an error MUST be raised. + +### Examples + +| Given | Expression | Result +|---|---|--- +| `"string"` | `` pad_right(@, `0`) `` | `"string"` +| `"string"` | `` pad_right(@, `5`) `` | `"string"` +| `"string"` | `` pad_right(@, `10`) `` | `"string    "` +| `"string"` | `` pad_right(@, `10`, '-') `` | `"string----"` + +### replace + +``` +string replace(string $subject, string $old, string $new[, number $count]) +``` +Given the `$subject` string, `replace()` replaces occurrences of the `$old` substring with the `$new` substring. + +The `$count` optional integer specifies how many occurrences of the `$old` substring in `$subject` are replaced. If this parameter is omitted, all occurrences are replaced. If `$count` is not an integer or is negative, an error MUST be raised. + +The `replace()` function has no effect if `$count` is `0`. 
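
In Python, `replace()` corresponds closely to `str.replace()`. Its optional third argument limits the number of replacements, and a limit of `0` leaves the string unchanged. The minimal sketch below assumes `None` represents an omitted `$count` and uses `TypeError`/`ValueError` as stand-ins for the errors the JEP requires. This JEP does not specify the behaviour for an empty `$old`; the sketch simply inherits Python's behaviour in that case.

```python
from typing import Optional, Union


def replace(subject: str, old: str, new: str,
            count: Optional[Union[int, float]] = None) -> str:
    """Sketch of JEP-14a replace(); replaces non-overlapping matches left to right."""
    if count is None:
        # $count omitted: replace every occurrence.
        return subject.replace(old, new)
    if isinstance(count, bool) or not isinstance(count, (int, float)):
        raise TypeError("invalid-type: $count must be a number")
    if (isinstance(count, float) and not count.is_integer()) or count < 0:
        raise ValueError("invalid-value: $count must be a non-negative integer")
    # A count of 0 leaves `subject` unchanged, as the JEP requires.
    return subject.replace(old, new, int(count))
```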
+
+### Examples
+
+| Given | Expression | Result
+|---|---|---
+| `"aabaaabaaaab"` | `` replace(@, 'aa', '-', `0`) `` | `"aabaaabaaaab"`
+| `"aabaaabaaaab"` | `` replace(@, 'aa', '-', `1`) `` | `"-baaabaaaab"`
+| `"aabaaabaaaab"` | `` replace(@, 'aa', '-', `2`) `` | `"-b-abaaaab"`
+| `"aabaaabaaaab"` | `` replace(@, 'aa', '-', `3`) `` | `"-b-ab-aab"`
+| `"aabaaabaaaab"` | `` replace(@, 'aa', '-') `` | `"-b-ab--b"`
+
+### split
+
+```
+array[string] split(string $subject, string $search[, number $count])
+```
+
+Given the `$subject` string, `split()` breaks it on occurrences of the `$search` string and returns an array.
+
+The `split()` function returns an array containing each partial string between occurrences of `$search`. If `$subject` contains no occurrences of the `$search` string, an array containing just the original `$subject` string is returned.
+
+If the `$search` argument is an empty string, `split()` breaks on every character and returns an array containing each character from the `$subject` string. Thus, if `$subject` is _also_ an empty string, `split()` returns an empty array.
+
+The optional `$count` integer specifies the maximum number of split points within the `$subject` string.
+If this parameter is omitted, all occurrences are split. If `$count` is not an integer or is negative, an error MUST be raised.
+
+If `$count` is equal to `0`, `split()` returns an array containing a single element, the `$subject` string.
+
+Otherwise, the `split()` function breaks on occurrences of the `$search` string up to `$count` times. The last string in the resulting array contains the remaining contents of `$subject`, unmodified.
+
+**Note**: The `split()` function was [originally designed by Chris Armstrong](https://github.com/GorillaStack/jmespath.site/blob/master/docs/proposals/string-manipulation.rst). However, its behaviour has been slightly altered for consistency.
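
A minimal Python sketch of `split()`. It always passes `search` to `str.split()` explicitly, because `str.split()` called without a separator collapses runs of whitespace instead. With a separator, `str.split()` already returns `[subject]` when there is no match and `["", ""]` for `'/'` split on `'/'`. The documented examples do not cover an empty `$search` combined with a positive `$count`. Applying the limit character by character, as the sketch does, is an assumption.

```python
from typing import List, Optional, Union


def split(subject: str, search: str,
          count: Optional[Union[int, float]] = None) -> List[str]:
    """Sketch of JEP-14a split(); `count` caps the number of split points."""
    limit = None
    if count is not None:
        if isinstance(count, bool) or not isinstance(count, (int, float)):
            raise TypeError("invalid-type: $count must be a number")
        if (isinstance(count, float) and not count.is_integer()) or count < 0:
            raise ValueError("invalid-value: $count must be a non-negative integer")
        limit = int(count)
    if limit == 0:
        # No split points allowed: the whole subject is the only element.
        return [subject]
    if search == "":
        # Break on every character; an empty subject yields an empty array.
        chars = list(subject)
        if limit is None or limit >= len(chars):
            return chars
        # Assumption: split off the first `limit` characters, keep the rest whole.
        return chars[:limit] + [subject[limit:]]
    # maxsplit=-1 tells str.split() to split on every occurrence.
    return subject.split(search, -1 if limit is None else limit)
```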
+
+### Examples
+
+| Expression | Result
+|---|---
+| `split('', '')` | `[]`
+| `split('all chars', '')` | `[ "a", "l", "l", " ", "c", "h", "a", "r", "s" ]`
+| `split('/', '/')` | `[ "", "" ]`
+|`` split('average\|-\|min\|-\|max\|-\|mean\|-\|median', '\|-\|') `` | `[ "average", "min", "max", "mean", "median" ]`
+|`` split('average\|-\|min\|-\|max\|-\|mean\|-\|median', '\|-\|', `3`) `` | `[ "average", "min", "max", "mean\|-\|median" ]`
+|`` split('average\|-\|min\|-\|max\|-\|mean\|-\|median', '\|-\|', `2`) `` | `[ "average", "min", "max\|-\|mean\|-\|median" ]`
+|`` split('average\|-\|min\|-\|max\|-\|mean\|-\|median', '\|-\|', `1`) `` | `[ "average", "min\|-\|max\|-\|mean\|-\|median" ]`
+|`` split('average\|-\|min\|-\|max\|-\|mean\|-\|median', '\|-\|', `0`) `` | `[ "average\|-\|min\|-\|max\|-\|mean\|-\|median" ]`
+| `split('average\|-\|min\|-\|max\|-\|mean\|-\|median', '-')` | `[ "average\|", "\|min\|", "\|max\|", "\|mean\|", "\|median" ]`
+
+### trim
+
+```
+string trim(string $subject[, string $chars])
+```
+Given the `$subject` string, `trim()` removes the leading and trailing characters found in `$chars`.
+
+The optional `$chars` string parameter represents a set of characters to be removed. If this parameter is not specified, or is an empty string, whitespace characters are removed from the `$subject` string. Whitespace characters are the code points that the Unicode standard defines with the `White_Space` property set to `Yes`.
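
A minimal Python sketch of `trim()`. It spells out the code points that carry the Unicode `White_Space` property, as listed in the Unicode Character Database file `PropList.txt`. It does not rely on calling `str.strip()` with no argument, because that uses Python's `str.isspace()` definition. That definition also covers U+001C to U+001F, which are not `White_Space` characters. The `WHITE_SPACE` constant is a hypothetical name. Passing a string to `str.strip()` treats it as a set of characters, which matches the set semantics of `$chars`.

```python
# Code points with the Unicode White_Space property (per PropList.txt).
WHITE_SPACE = (
    "\u0009\u000a\u000b\u000c\u000d"                    # TAB, LF, VT, FF, CR
    + "\u0020\u0085\u00a0\u1680"                        # SPACE, NEL, NBSP, OGHAM SPACE MARK
    + "".join(chr(cp) for cp in range(0x2000, 0x200B))  # EN QUAD .. HAIR SPACE
    + "\u2028\u2029\u202f\u205f\u3000"                  # LS, PS, NNBSP, MMSP, IDEOGRAPHIC SPACE
)


def trim(subject: str, chars: str = "") -> str:
    """Sketch of JEP-14a trim(); an empty or omitted `chars` means whitespace."""
    # strip() removes any leading/trailing character that appears in the set.
    return subject.strip(chars or WHITE_SPACE)
```

Because `$chars` is a set, the order of its characters does not matter: `trim(@, 'gsu ')` and `trim(@, ' usg')` behave identically.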
+
+### Examples
+
+| Given | Expression | Result
+|---|---|---
+| `" subject string "` | `` trim(@) `` | `"subject string"`
+| `" subject string "` | `` trim(@, '') `` | `"subject string"`
+| `" subject string "` | `` trim(@, ' ') `` | `"subject string"`
+| `" subject string "` | `` trim(@, 's') `` | `" subject string "`
+| `" subject string "` | `` trim(@, 'su') `` | `" subject string "`
+| `" subject string "` | `` trim(@, 'su ') `` | `"bject string"`
+| `" subject string "` | `` trim(@, 'gsu ') `` | `"bject strin"`
+
+### trim_left
+
+```
+string trim_left(string $subject[, string $chars])
+```
+Given the `$subject` string, `trim_left()` removes the leading characters found in `$chars`.
+
+As with the `trim()` function, the optional `$chars` string parameter represents a set of characters to be removed. `trim_left()` defaults to removing whitespace characters if `$chars` is not specified or is an empty string.
+
+### Examples
+
+| Given | Expression | Result
+|---|---|---
+| `" subject string "` | `` trim_left(@) `` | `"subject string "`
+| `" subject string "` | `` trim_left(@, 's') `` | `" subject string "`
+| `" subject string "` | `` trim_left(@, 'su') `` | `" subject string "`
+| `" subject string "` | `` trim_left(@, 'su ') `` | `"bject string "`
+| `" subject string "` | `` trim_left(@, 'gsu ') `` | `"bject string "`
+
+### trim_right
+
+```
+string trim_right(string $subject[, string $chars])
+```
+Given the `$subject` string, `trim_right()` removes the trailing characters found in `$chars`.
+
+As with the `trim()` and `trim_left()` functions, the optional `$chars` string parameter represents a set of characters to be removed. `trim_right()` defaults to removing whitespace characters if `$chars` is not specified or is an empty string.
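
`trim_left()` and `trim_right()` differ from `trim()` only in which end of the string they touch, so in Python they map onto `str.lstrip()` and `str.rstrip()`. The following is a minimal sketch. It assumes the `White_Space` code points listed in `PropList.txt`. `WHITE_SPACE` and `_WHITE_SPACE_RANGES` are hypothetical names, written here as inclusive code point ranges.

```python
# Unicode White_Space code point ranges (inclusive), as listed in PropList.txt.
_WHITE_SPACE_RANGES = [
    (0x0009, 0x000D), (0x0020, 0x0020), (0x0085, 0x0085), (0x00A0, 0x00A0),
    (0x1680, 0x1680), (0x2000, 0x200A), (0x2028, 0x2029), (0x202F, 0x202F),
    (0x205F, 0x205F), (0x3000, 0x3000),
]
WHITE_SPACE = "".join(
    chr(cp) for low, high in _WHITE_SPACE_RANGES for cp in range(low, high + 1)
)


def trim_left(subject: str, chars: str = "") -> str:
    """Sketch of JEP-14a trim_left(): strips leading characters only."""
    return subject.lstrip(chars or WHITE_SPACE)


def trim_right(subject: str, chars: str = "") -> str:
    """Sketch of JEP-14a trim_right(): strips trailing characters only."""
    return subject.rstrip(chars or WHITE_SPACE)
```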
+
+### Examples
+
+| Given | Expression | Result
+|---|---|---
+| `" subject string "` | `` trim_right(@) `` | `" subject string"`
+| `" subject string "` | `` trim_right(@, 's') `` | `" subject string "`
+| `" subject string "` | `` trim_right(@, 'su') `` | `" subject string "`
+| `" subject string "` | `` trim_right(@, 'su ') `` | `" subject string"`
+| `" subject string "` | `` trim_right(@, 'gsu ') `` | `" subject strin"`
+
+### upper
+
+```
+string upper(string $subject)
+```
+Returns the `$subject` string converted to uppercase, using Unicode Default Case Conversion.
+
+### Examples
+
+| Given | Expression | Result
+|---|---|---
+| `"string"` | `` upper(@) `` | `"STRING"`
+
+## Compliance
+
+A new `string_functions.json` file will be added to the compliance tests.
+The test suite will introduce the following new error type:
+
+- invalid-value
+
+This error type is raised, for instance, by `split()` when its `$count` parameter is negative or not an integer.
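
The Specification section's error-precedence rule says that every parameter is type-checked before any value constraint is evaluated. This can be sketched in Python as a two-pass argument check. The `JMESPathError` class, the `check_args()` and `error_type()` helpers, and the `SPLIT_PARAMS` table are hypothetical. Only the error names `invalid-type` and `invalid-value` come from this JEP. With this ordering, `` split(@, `1`, `-1`) `` reports `invalid-type` for `$search` rather than `invalid-value` for `$count`.

```python
from typing import Any, Callable, Optional, Sequence, Tuple


class JMESPathError(Exception):
    """Hypothetical error carrying a JMESPath error type name."""

    def __init__(self, error_type: str, message: str) -> None:
        super().__init__(f"{error_type}: {message}")
        self.error_type = error_type


Predicate = Callable[[Any], bool]
# (parameter name, type check, optional value constraint)
Param = Tuple[str, Predicate, Optional[Predicate]]


def is_string(value: Any) -> bool:
    return isinstance(value, str)


def is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, but JSON booleans are not numbers.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def is_non_negative_integer(value: Any) -> bool:
    # Only called after is_number() succeeded, so value is an int or a float.
    return (isinstance(value, int) or value.is_integer()) and value >= 0


SPLIT_PARAMS: Sequence[Param] = (
    ("$subject", is_string, None),
    ("$search", is_string, None),
    ("$count", is_number, is_non_negative_integer),
)


def check_args(args: Sequence[Any], params: Sequence[Param]) -> None:
    # Pass 1: type-check every supplied argument before looking at any value.
    for value, (name, has_type, _) in zip(args, params):
        if not has_type(value):
            raise JMESPathError("invalid-type", f"{name} has the wrong type")
    # Pass 2: only then evaluate the value constraints.
    for value, (name, _, is_valid) in zip(args, params):
        if is_valid is not None and not is_valid(value):
            raise JMESPathError("invalid-value", f"{name} is not an allowed value")


def error_type(args: Sequence[Any], params: Sequence[Param]) -> Optional[str]:
    """Returns the reported error type, or None if the arguments are valid."""
    try:
        check_args(args, params)
    except JMESPathError as err:
        return err.error_type
    return None
```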