Skip to content

Commit

Permalink
[Docs] Batch07: adding examples to string functions (#16862)
Browse files Browse the repository at this point in the history
* Lower,Upper,Lpad,Rpad,Parse_long

* up to REGEXP_EXTRACT

* batch 07 ready for review

* updated definitions in scalar

* Apply suggestions from code review

Co-authored-by: Charles Smith <[email protected]>

* rpad and lpad

* addressing comments

* minor fixes

* improving examples based on suggestions

* matched -> matches

* correcting typo

* Apply suggestions from code review

Co-authored-by: Charles Smith <[email protected]>

---------

Co-authored-by: Charles Smith <[email protected]>
  • Loading branch information
edgar2020 and techdocsmith authored Aug 21, 2024
1 parent 338da67 commit 7256953
Show file tree
Hide file tree
Showing 2 changed files with 228 additions and 32 deletions.
252 changes: 224 additions & 28 deletions docs/querying/sql-functions.md
Original file line number Diff line number Diff line change
Expand Up @@ -1506,11 +1506,13 @@ SELECT
FROM "taxi-trips"
LIMIT 1
```
Returns the following:
| `max_temperature` | `natural_log_max_temp` |
| -- | -- |
| `76` | `4.330733340286331` |
</details>
[Learn more](sql-scalar.md#numeric-functions)
Expand Down Expand Up @@ -1552,19 +1554,65 @@ Looks up the expression in a registered query-time lookup table.
## LOWER
`LOWER(expr)`
Returns the expression in lowercase.
**Function type:** [Scalar, string](sql-scalar.md#string-functions)
* **Syntax:** `LOWER(expr)`
* **Function type:** Scalar, string
<details><summary>Example</summary>
The following example converts the `OriginCityName` column from the `flight-carriers` datasource to lowercase.
```sql
SELECT
"OriginCityName" AS "origin_city",
LOWER("OriginCityName") AS "lowercase"
FROM "flight-carriers"
LIMIT 1
```
Returns the following:
| `origin_city` | `lowercase` |
| -- | -- |
`San Juan, PR` | `san juan, pr` |
</details>
[Learn more](sql-scalar.md#string-functions)
Returns the expression in lowercase.
## LPAD
`LPAD(<CHARACTER>, <INTEGER>, [<CHARACTER>])`
Returns a string of size `length` from `expr`. When the length of `expr` is less than `length`, left pads `expr` with `chars`, which defaults to the space character. Truncates `expr` to `length` if `length` is shorter than the length of `expr`.
**Function type:** [Scalar, string](sql-scalar.md#string-functions)
* **Syntax:** `LPAD(expr, length[, chars])`
* **Function type:** Scalar, string
<details><summary>Example</summary>
The following example left pads the value of `OriginStateName` from the `flight-carriers` datasource to return a total of 11 characters.
```sql
SELECT
"OriginStateName" AS "origin_state",
LPAD("OriginStateName", 11, '+') AS "add_left_padding"
FROM "flight-carriers"
LIMIT 3
```
Returns the following:
| `origin_state` | `add_left_padding` |
| -- | -- |
| `Puerto Rico` | `Puerto Rico` |
| `Massachusetts` | `Massachuset` |
| `Florida` | `++++Florida` |
</details>
[Learn more](sql-scalar.md#string-functions)
Returns the leftmost number of characters from an expression, optionally padded with the given characters.
## LTRIM
Expand Down Expand Up @@ -1794,11 +1842,31 @@ Parses `expr` into a `COMPLEX<json>` object. This operator deserializes JSON val
## PARSE_LONG
`PARSE_LONG(<CHARACTER>, [<INTEGER>])`
Converts a string into a long(BIGINT) with the given radix, or into DECIMAL(base 10) if a radix is not provided.
**Function type:** [Scalar, string](sql-scalar.md#string-functions)
* **Syntax:**`PARSE_LONG(string[, radix])`
* **Function type:** Scalar, string
<details><summary>Example</summary>
The following example converts the string representation of the binary, radix 2, number `1100` into its long (BIGINT) equivalent.
```sql
SELECT
'1100' AS "binary_as_string",
PARSE_LONG('1110', 2) AS "bigint_value"
```
Returns the following:
| `binary_as_string` | `bigint_value` |
| -- | -- |
| `1100` | `14` |
</details>
[Learn more](sql-scalar.md#string-functions)
Converts a string into a BIGINT with the given base or into a DECIMAL data type if the base is not specified.
## PERCENT_RANK
Expand All @@ -1810,11 +1878,33 @@ Returns the relative rank of the row calculated as a percentage according to the
## POSITION
`POSITION(<CHARACTER> IN <CHARACTER> [FROM <INTEGER>])`
Returns the one-based index position of a substring within an expression, optionally starting from a given one-based index. If `substring` is not found, returns 0.
**Function type:** [Scalar, string](sql-scalar.md#string-functions)
* **Syntax**: `POSITION(substring IN expr [FROM startingIndex])`
* **Function type:** Scalar, string
Returns the one-based index position of a substring within an expression, optionally starting from a given one-based index.
<details><summary>Example</summary>
The following example returns the one-based index of the substring `PR` in the `OriginCityName` column from the `flight-carriers` datasource starting from index 5.
```sql
SELECT
"OriginCityName" AS "origin_city",
POSITION('PR' IN "OriginCityName" FROM 5) AS "index"
FROM "flight-carriers"
LIMIT 2
```
Returns the following:
| `origin_city` | `index` |
| -- | -- |
| `San Juan, PR` | `11` |
| `Boston, MA` | `0` |
</details>
[Learn more](sql-scalar.md#string-functions)
## POWER
Expand Down Expand Up @@ -1872,28 +1962,90 @@ Returns the rank with gaps for a row within a window. For example, if two rows t
## REGEXP_EXTRACT
`REGEXP_EXTRACT(<CHARACTER>, <CHARACTER>, [<INTEGER>])`
Apply regular expression `pattern` to `expr` and extract the Nth capture group. If `N` is unspecified or zero, returns the first substring that matches the pattern. Returns `null` if there is no matching pattern.
**Function type:** [Scalar, string](sql-scalar.md#string-functions)
* **Syntax:** `REGEXP_EXTRACT(expr, pattern[, N])`
* **Function type:** Scalar, string
<details><summary>Example</summary>
Applies a regular expression to the string expression and returns the _n_th match.
The following example uses regular expressions to find city names inside the `OriginCityName` column from the `flight-carriers` datasource by matching what comes before the comma.
```sql
SELECT
"OriginCityName" AS "origin_city",
REGEXP_EXTRACT("OriginCityName", '([^,]+)', 0) AS "pattern_match"
FROM "flight-carriers"
LIMIT 1
```
Returns the following:
| `origin_city` | `pattern_match` |
| -- | -- |
| `San Juan, PR` | `San Juan`|
</details>
[Learn more](sql-scalar.md#string-functions)
## REGEXP_LIKE
`REGEXP_LIKE(<CHARACTER>, <CHARACTER>)`
Returns `true` if the regular expression `pattern` finds a match in `expr`. Returns `false` otherwise.
**Function type:** [Scalar, string](sql-scalar.md#string-functions)
* **Syntax:** `REGEXP_LIKE(expr, pattern)`
* **Function type:** Scalar, string
<details><summary>Example</summary>
Returns true or false signifying whether the regular expression finds a match in the string expression.
The following example returns `true` when the `OriginCityName` column from `flight-carriers` has a city name containing a space.
```sql
SELECT
"OriginCityName" AS "origin_city",
REGEXP_LIKE("OriginCityName", '[A-Za-z]+\s[A-Za-z]+') AS "pattern_found"
FROM "flight-carriers"
LIMIT 2
```
Returns the following:
| `origin_city` | `pattern_found` |
| -- | -- |
| `San Juan, PR` | `true` |
| `Boston, MA` | `false` |
</details>
[Learn more](sql-scalar.md#string-functions)
## REGEXP_REPLACE
`REGEXP_REPLACE(<CHARACTER>, <CHARACTER>, <CHARACTER>)`
Replaces all occurrences of a regular expression in a string expression with a replacement string. Refer to capture groups in the replacement string using `$group` syntax. For example: `$1` or `$2`.
**Function type:** [Scalar, string](sql-scalar.md#string-functions)
* **Syntax:** `REGEXP_REPLACE(expr, pattern, replacement)`
* **Function type:** Scalar, string
<details><summary>Example</summary>
The following example matches three consecutive words, where each word is its own capture group, and replaces the matched words with the word in the second capture group punctuated with exclamation marks.
```sql
SELECT
'foo bar baz' AS "original_string",
REGEXP_REPLACE('foo bar baz', '([A-Za-z]+) ([A-Za-z]+) ([A-Za-z]+)' , '$2!') AS "modified_string"
```
Returns the following:
| `original_string` | `modified_string` |
| -- | -- |
| `foo bar baz` | `bar!` |
</details>
[Learn more](sql-scalar.md#string-functions)
Replaces all occurrences of a regular expression in a string expression with a replacement string. The replacement
string may refer to capture groups using `$1`, `$2`, etc.
## REPEAT
Expand Down Expand Up @@ -2041,11 +2193,34 @@ Returns the number of the row within the window starting from 1.
## RPAD
`RPAD(<CHARACTER>, <INTEGER>, [<CHARACTER>])`
Returns a string of size `length` from `expr`. When the length of `expr` is less than `length`, right pads `expr` with `chars`, which defaults to the space character. Truncates `expr` to `length` if `length` is shorter than the length of `expr`.
**Function type:** [Scalar, string](sql-scalar.md#string-functions)
* **Syntax:** `RPAD(expr, length[, chars])`
* **Function type:** Scalar, string
<details><summary>Example</summary>
The following example right pads the value of `OriginStateName` from the `flight-carriers` datasource to return a total of 11 characters.
```sql
SELECT
"OriginStateName" AS "origin_state",
RPAD("OriginStateName", 11, '+') AS "add_right_padding"
FROM "flight-carriers"
LIMIT 3
```
Returns the following:
Returns the rightmost number of characters from an expression, optionally padded with the given characters.
| `origin_state` | `add_right_padding` |
| -- | -- |
| `Puerto Rico` | `Puerto Rico` |
| `Massachusetts` | `Massachuset` |
| `Florida` | `Florida++++` |
</details>
[Learn more](sql-scalar.md#string-functions)
## RTRIM
Expand Down Expand Up @@ -2586,11 +2761,32 @@ For more information, see [UNNEST](./sql.md#unnest).

## UPPER

`UPPER(expr)`
Returns the expression in uppercase.

**Function type:** [Scalar, string](sql-scalar.md#string-functions)
* **Syntax:** `UPPER(expr)`
* **Function type:** Scalar, string

Returns the expression in uppercase.
<details><summary>Example</summary>

The following example converts the `OriginCityName` column from the `flight-carriers` datasource to uppercase.

```sql
SELECT
"OriginCityName" AS "origin_city",
UPPER("OriginCityName") AS "uppercase"
FROM "flight-carriers"
LIMIT 1
```

Returns the following:

| `origin_city` | `uppercase` |
| -- | -- |
`San Juan, PR` | `SAN JUAN, PR` |

</details>

[Learn more](sql-scalar.md#string-functions)

## VAR_POP

Expand Down
8 changes: 4 additions & 4 deletions docs/querying/sql-scalar.md
Original file line number Diff line number Diff line change
Expand Up @@ -105,11 +105,11 @@ String functions accept strings and return a type appropriate to the function.
|`LOOKUP(expr, lookupName[, replaceMissingValueWith])`|Looks up `expr` in an existing [query-time lookup table](lookups.md) that has the name `lookupName`. Returns the optional constant `replaceMissingValueWith` when `expr` is null or when the lookup does not contain a value for `expr`.<br /><br />You can query lookups directly using the [`lookup` schema](sql.md#from).|
|`LOWER(expr)`|Returns `expr` in all lowercase.|
|`UPPER(expr)`|Returns `expr` in all uppercase.|
|`LPAD(expr, length[, chars])`|Returns a string of `length` from `expr` left-padded with `chars`. If `length` is shorter than the length of `expr`, the result is `expr` which is truncated to `length`. The result is null if either `expr` or `chars` is null. If `chars` is an empty string, no padding is added; however, `expr` may be trimmed if necessary.|
|`RPAD(expr, length[, chars])`|Returns a string of `length` from `expr` right-padded with `chars`. If `length` is shorter than the length of `expr`, the result is `expr` which is truncated to `length`. The result is null if either `expr` or `chars` is null. If `chars` is an empty string, no padding is added; however, `expr` may be trimmed if necessary.|
|`PARSE_LONG(string[, radix])`|Parses a string into a long (BIGINT) with the given radix or 10 (decimal) if a radix is not provided.|
|`LPAD(expr, length[, chars])`|Returns a string of `length` from `expr`. If `expr` is shorter than `length`, left pads `expr` with `chars`, which defaults to space characters. If `expr` exceeds `length`, truncates `expr` to equal `length`. If `chars` is an empty string, no padding is added. Returns `null` if either `expr` or `chars` is null.|
|`RPAD(expr, length[, chars])`|Returns a string of `length` from `expr`. If `expr` is shorter than `length`, right pads `expr` with `chars`, which defaults to space characters. If `expr` exceeds `length`, truncates `expr` to equal `length`. If `chars` is an empty string, no padding is added. Returns `null` if either `expr` or `chars` is null.|
|`PARSE_LONG(string[, radix])`|Parses a string into a long (BIGINT) with the given radix, or 10 (decimal) if a radix is not provided.|
|`POSITION(substring IN expr [FROM startingIndex])`|Returns the index of `substring` within `expr` with indexes starting from 1. The search begins at `startingIndex`. If `startingIndex` is not specified, the default is 1. If `substring` is not found, returns 0.|
|`REGEXP_EXTRACT(expr, pattern[, index])`|Apply regular expression `pattern` to `expr` and extract a capture group or `NULL` if there is no match. If `index` is unspecified or zero, returns the first substring that matched the pattern. The pattern may match anywhere inside `expr`. To match the entire string, use the `^` and `$` markers at the start and end of your pattern. When `druid.generic.useDefaultValueForNull = true`, it is not possible to differentiate an empty-string match from a non-match (both return `NULL`).|
|`REGEXP_EXTRACT(expr, pattern[, index])`|Apply regular expression `pattern` to `expr` and extract a capture group or `NULL` if there is no match. If `index` is unspecified or zero, returns the first substring that matches the pattern. The pattern may match anywhere inside `expr`. To match the entire string, use the `^` and `$` markers at the start and end of your pattern. When `druid.generic.useDefaultValueForNull = true`, it is not possible to differentiate an empty-string match from a non-match (both return `NULL`).|
|`REGEXP_LIKE(expr, pattern)`|Returns whether `expr` matches regular expression `pattern`. The pattern may match anywhere inside `expr`; if you want to match the entire string instead, use the `^` and `$` markers at the start and end of your pattern. Similar to [`LIKE`](sql-operators.md#logical-operators), but uses regexps instead of LIKE patterns. Especially useful in WHERE clauses.|
|`REGEXP_REPLACE(expr, pattern, replacement)`|Replaces all occurrences of regular expression `pattern` within `expr` with `replacement`. The replacement string may refer to capture groups using `$1`, `$2`, etc. The pattern may match anywhere inside `expr`; if you want to match the entire string instead, use the `^` and `$` markers at the start and end of your pattern.|
|`REPLACE(expr, substring, replacement)`|Replaces instances of `substring` in `expr` with `replacement` and returns the result.|
Expand Down

0 comments on commit 7256953

Please sign in to comment.