Commit

Merge remote-tracking branch 'origin/master'
linux-china committed Apr 4, 2024
2 parents 47d9150 + 24f1dda commit dd8ee91
Showing 20 changed files with 1,288 additions and 407 deletions.
808 changes: 552 additions & 256 deletions Cargo.lock

Large diffs are not rendered by default.

30 changes: 16 additions & 14 deletions Cargo.toml
@@ -3,11 +3,11 @@ name = "zawk"
version = "0.5.3"
authors = ["Eli Rosenthal <[email protected]>", "linux_china <[email protected]>"]
edition = "2021"
description = "an efficient Awk-like language with stdlib"
homepage = "https://github.com/linux-china/frawk"
repository = "https://github.com/linux-china/frawk"
description = "An efficient Awk-like language implementation by Rust with stdlib"
homepage = "https://github.com/linux-china/zawk"
repository = "https://github.com/linux-china/zawk"
readme = "README.md"
keywords = ["awk", "csv", "tsv"]
keywords = ["awk", "csv", "tsv", "etl", "stdlib"]
categories = ["command-line-utilities", "text-processing"]
license = "MIT OR Apache-2.0"
build = "build.rs"
@@ -17,7 +17,7 @@ build = "build.rs"
log = "0.4"
env_logger = "0.11"
petgraph = "0.6"
smallvec = "1.13.1"
smallvec = "1.13.2"
hashbrown = "0.14"
lazy_static = "1.4.0"
regex = "1.10"
@@ -43,16 +43,16 @@ itertools = "0.12"
num-traits = "0.2"
assert_cmd = "2.0.14"
paste = "1.0"
cranelift = "0.105"
cranelift-codegen = "0.105"
cranelift-frontend = "0.105"
cranelift-module = "0.105"
cranelift-native = "0.105"
cranelift-jit = "0.105"
cranelift = "0.106"
cranelift-codegen = "0.106"
cranelift-frontend = "0.106"
cranelift-module = "0.106"
cranelift-native = "0.106"
cranelift-jit = "0.106"
fast-float = "0.2"
bumpalo = { version = "3.15.3", features = ["collections"] }
target-lexicon = "0.12.14"
uuid = { version = "1.7", features = ["v4", "v7"] }
uuid = { version = "1.8", features = ["v4", "v7", "fast-rng"] }
ulid = "1"
rs-snowflake="0.6"
fend-core = "1.4"
@@ -63,6 +63,7 @@ base58 = "0.2"
base64 = "0.22"
base-62 = "0.1"
urlencoding = "2"
flate2 = "1.0"
url = "2"
sha2 = "0.10"
md5 = "0.7"
@@ -83,7 +84,8 @@ serde = "1"
serde_json = "1"
logos = "0.14"
local-ip-address = "0.6"
reqwest = { version = "0.11", features = ["blocking"] }
reqwest = { version = "0.12", features = ["blocking"] }
oneio = {version = "0.16", default-features=false, features = ["remote", "compressions"]}
nats = "0.24"
redis = "0.25"
minio = "0.1.0"
@@ -98,7 +100,7 @@ shlex = "1"
shell-escape="0.1"
pad="0.1"
rusqlite = { version = "0.31", features = ["bundled"] }
mysql = { version = "24" }
mysql = { version = "25" }
csv = "1"
semver = "1"
ctor="0.2"
2 changes: 1 addition & 1 deletion info/How-to-add-builtin-function.md
@@ -89,6 +89,6 @@ please refer `ENVIRON` as example.
cases - https://github.com/whatisinternet/inflector
* Internationalization with gawk: https://www.gnu.org/software/gawk/manual/html_node/I18N-Example.html

# todo
# Tools

* Apache Parquet Read: please use [dr](https://crates.io/crates/dr) to convert parquet to CSV file.
25 changes: 20 additions & 5 deletions info/faq.md
@@ -10,11 +10,12 @@ Time flies, and we need a new Modern AWK to work with DuckDB, ClickHouse, S3, KV

# Why not just contribute to frawk?

frawk is the foundation of zawk for syntax, types, lexing, etc.,
and zawk focuses on making AWK more powerful with a standard library.
I'm not sure whether the frawk developers would accept my changes, and zawk is just experimental
work: `zawk = AWK + stdlib`.

frawk is still good for text processing, embedded use, etc.,
and if possible I will contribute some work to frawk, for example:

* Upgrade to Rust 2021
@@ -25,7 +26,7 @@ and if possible I will contribute some work to frawk, for example:
# Will zawk fix some bugs in frawk?

Yes. Eli Rosenthal has had much less time over the last 1-2 years to devote to bug fixes and feature requests for frawk,
and I will try my best to fix bugs in frawk.

# Any roadmap for zawk?

@@ -41,3 +42,17 @@ Now I'm not sure about the roadmap, but I will try my best to make zawk more pow
```shell
$ duckdb -c "COPY (select * from 'family.parquet') TO 'family.csv' (FORMAT CSV)"
```

# Special types in text

* bool: `mkbool("true")`
* Tuple: `tuple("('abc',123)")`: IntMap<Str>
* Array: `parse_array("[1,2,3]")`: IntMap<Str>
* Record: `record("{field1:1,field2:'two'}")`: StrMap<Str>
* variants: `variant("days(30)")`, `variant("week(2)")`: StrMap<Str> with the keys `name` and `value`.
* flags: `flags("{read,write}")`: StrMap<Int>

You can use the functions above to parse special types in text.
If possible, don't add spaces in the value text.

**Tips**: No matter what type you use, the format should be regular expression friendly.
84 changes: 63 additions & 21 deletions info/stdlib.md
@@ -7,7 +7,7 @@ AWK stdlib Cheat Sheet: https://cheatography.com/linux-china/cheat-sheets/zawk/

# Text functions

Text is encoded as UTF-8 by default.

### char_at

@@ -124,14 +124,14 @@ Trim text with chars with `trim($1, "[]()")`

### starts_with/ends_with/contains

The return value is `1` or `0`.

- `starts_with($1, "https://")`
- `ends_with($1, ".com")`
- `contains($1, "//")`

Why not use regex? Because starts_with/ends_with/contains are easy to use and understand.
Most libraries include these functions, and I don't want the AWK stdlib to feel odd without them.

### mask

@@ -164,10 +164,12 @@ Return default value if text is empty or not exist.

### append_if_missing/prepend_if_missing

Add a suffix/prefix if it is missing, or remove it if it is present.

- `append_if_missing("nats://example.com","/") # nats://example.com/`
- `prepend_if_missing("example.com","https://") # https://example.com`
- `remove_if_end("demo.json", ".json") # demo`
- `remove_if_begin("demo.json", "file://./") # file://./demo.json`
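
The "only when needed" behaviour can be sketched in Rust, the language zawk is written in. This is a rough illustration of the semantics described here, not zawk's actual implementation; the function bodies are assumptions.

```rust
/// Append `suffix` only when `text` does not already end with it.
fn append_if_missing(text: &str, suffix: &str) -> String {
    if text.ends_with(suffix) {
        text.to_string()
    } else {
        format!("{text}{suffix}")
    }
}

/// Prepend `prefix` only when `text` does not already start with it.
fn prepend_if_missing(text: &str, prefix: &str) -> String {
    if text.starts_with(prefix) {
        text.to_string()
    } else {
        format!("{prefix}{text}")
    }
}

/// Remove `suffix` when `text` ends with it; otherwise return `text` unchanged.
fn remove_if_end(text: &str, suffix: &str) -> String {
    text.strip_suffix(suffix).unwrap_or(text).to_string()
}
```

All three are idempotent: applying them a second time to their own output changes nothing, which makes them safe to use when normalizing URLs or file names.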

### quote/double_quote

@@ -180,8 +182,8 @@ quote/double text if not quoted/double quoted.

Convert bytes to human-readable format, and vice versa. Units: `B`, `KB`, `MB`, `GB`, `TB`, `PB`, `EB`, `ZB`, `YB`.

- `format_bytes(1024)`: 1 KB
- `to_bytes("2 KB")`: 2048
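
A minimal Rust sketch of the conversion follows. It assumes 1024-based units, which `format_bytes(1024)` giving `1 KB` suggests. The rounding to two decimals and the handling of a missing unit are assumptions, not zawk's documented behaviour.

```rust
const UNITS: [&str; 9] = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"];

/// Format a byte count using the largest unit that keeps the value >= 1.
fn format_bytes(bytes: u64) -> String {
    let mut value = bytes as f64;
    let mut unit = 0;
    while value >= 1024.0 && unit < UNITS.len() - 1 {
        value /= 1024.0;
        unit += 1;
    }
    if value.fract() == 0.0 {
        format!("{} {}", value as u64, UNITS[unit])
    } else {
        // Assumption: two decimals for non-integral values.
        format!("{:.2} {}", value, UNITS[unit])
    }
}

/// Parse text such as "2 KB" into a byte count; a missing unit means bytes.
/// Values beyond u64::MAX (possible with ZB/YB) saturate in this sketch.
fn to_bytes(text: &str) -> Option<u64> {
    let text = text.trim();
    let split = text
        .find(|c: char| c.is_ascii_alphabetic())
        .unwrap_or(text.len());
    let (number, unit) = text.split_at(split);
    let number: f64 = number.trim().parse().ok()?;
    let unit = unit.trim().to_ascii_uppercase();
    let unit = if unit.is_empty() { "B".to_string() } else { unit };
    let exp = UNITS.iter().position(|u| *u == unit.as_str())?;
    Some((number * 1024f64.powi(exp as i32)) as u64)
}
```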

# Text Escape

@@ -249,28 +251,31 @@ array fields:

### Pairs

Parse pairs text into an array (MapStrStr), for example:

* URL query string `id=1&name=Hello%20World1`
* Trace Context tracestate: `congo=congosSecondPosition,rojo=rojosFirstPosition`
* Cookies: `pairs(cookies_text, ";", "=")`, such
as: `_device_id=c49fdb13b5c41be361ee80236919ba50; user_session=qDSJ7GlA3aLriNnDG-KJsqw_QIFpmTBjt0vcLy5Vq2ay6StZ;`

Usage: `pairs("a=b,c=d")`, `pairs("id=1&name=Hello%20World","&")`, `pairs("a=b;c=d",";","=")`.

**Tips**: when the separator is `&`, as in `pairs("id=1&name=Hello%20World","&")`, the text is treated as a URL
query string and the values are URL-decoded automatically.
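
The splitting step can be sketched in Rust as below. This is an illustration under assumptions, not zawk's code: the parameter names are hypothetical, the separators are passed explicitly, and the URL decoding applied to query strings is left out.

```rust
use std::collections::HashMap;

/// Split `text` into key/value pairs.
/// `pair_sep` separates pairs, `kv_sep` separates a key from its value.
/// Items without `kv_sep` (for example a trailing empty item) are skipped.
fn pairs(text: &str, pair_sep: &str, kv_sep: &str) -> HashMap<String, String> {
    text.split(pair_sep)
        .filter_map(|item| {
            let (key, value) = item.trim().split_once(kv_sep)?;
            Some((key.trim().to_string(), value.trim().to_string()))
        })
        .collect()
}
```

Trimming each item is what lets a cookie header such as `a=1; b=2;` parse cleanly, since the space after `;` and the trailing separator are both ignored.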

### Attributes
### Records

Prometheus/OpenMetrics text format, such as `http_requests_total{method="post",code="200"}`

Usage:

* `attributes("http_requests_total{method=\"post\",code=\"200\"}")`
* `attributes("mysql{host=localhost user=root password=123456 database=test}")`
* `record("http_requests_total{method=\"post\",code=\"200\"}")`
* `record("mysql{host=localhost user=root password=123456 database=test}")`

### Message

A message always contains name, headers and boy, and text format is like `http_requests_total{method="post",code="200"}(100)`
A message(record with body) always contains name, headers and body, and text format is
like `http_requests_total{method="post",code="200"}(100)`

Usage:

@@ -281,8 +286,8 @@ Usage:

Parse function invocation format into `IntMap<Str>`, and 0 indicates function name.

* `arr=func("hello(1,2,3)")`: `arr[0]=>hello`, `arr[1]=>1`
* `arr=func("welcome('Jackie Chan',3)")`: `arr[0]=>welcome`, `arr[1]=>Jackie Chan`
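
One way to picture this parsing is the Rust sketch below. `parse_func` is a hypothetical helper name, and the quote handling is an assumption; zawk's real parser may treat escapes and nesting differently.

```rust
use std::collections::BTreeMap;

/// Parse `name(arg1,arg2,...)` into an integer-keyed map:
/// key 0 holds the function name, keys 1..n hold the arguments.
/// Commas inside single or double quotes do not split arguments.
fn parse_func(text: &str) -> Option<BTreeMap<i64, String>> {
    let text = text.trim();
    let open = text.find('(')?;
    let inner = text.strip_suffix(')')?.get(open + 1..)?;

    let mut result = BTreeMap::new();
    result.insert(0, text[..open].trim().to_string());

    let mut args: Vec<String> = Vec::new();
    let mut current = String::new();
    let mut quote: Option<char> = None;
    for c in inner.chars() {
        match (quote, c) {
            (Some(q), _) if c == q => quote = None, // closing quote
            (Some(_), _) => current.push(c),        // inside quotes
            (None, '\'') | (None, '"') => quote = Some(c),
            (None, ',') => args.push(std::mem::take(&mut current)),
            (None, _) => current.push(c),
        }
    }
    if !current.is_empty() || !args.is_empty() {
        args.push(current);
    }
    for (i, arg) in args.into_iter().enumerate() {
        // Simplification: surrounding whitespace is trimmed even for quoted values.
        result.insert(i as i64 + 1, arg.trim().to_string());
    }
    Some(result)
}
```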

# ID generator

@@ -348,6 +353,22 @@ gawk compatible

`_join(arr, ",")` IntMap -> Str

### parse_array

`parse_array("['first','second','third']")`: IntMap<Str>

### tuple

`tuple("(1,2,'first','second')")`: IntMap<Str>

### variant

`variant("week(5)")`: StrMap<Str>

### flags

`flags("{vip,top20}")`: StrMap<Int>
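
As a rough illustration of the last two result shapes, here is a Rust sketch. It assumes that every flag present maps to `1` and that a variant yields the keys `name` and `value`; neither detail is taken from zawk's source.

```rust
use std::collections::HashMap;

/// Parse `{a,b,c}` into a map where each listed flag has the value 1.
fn flags(text: &str) -> HashMap<String, i64> {
    text.trim()
        .trim_start_matches('{')
        .trim_end_matches('}')
        .split(',')
        .map(str::trim)
        .filter(|name| !name.is_empty())
        .map(|name| (name.to_string(), 1))
        .collect()
}

/// Parse `name(value)` into a map with the keys "name" and "value".
fn variant(text: &str) -> Option<HashMap<String, String>> {
    let (name, rest) = text.trim().split_once('(')?;
    let value = rest.strip_suffix(')')?;
    let mut map = HashMap::new();
    map.insert("name".to_string(), name.trim().to_string());
    map.insert("value".to_string(), value.trim().to_string());
    Some(map)
}
```

Both text formats stay easy to match with a regular expression, which is why they suit AWK-style field processing.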

# Math

Floating-point operations: sin, cos, atan, atan2, log, log2, log10, sqrt, exp are delegated to the Rust standard
@@ -412,14 +433,14 @@ utc by default.
https://docs.rs/chrono/latest/chrono/format/strftime/index.html

* `strftime("%Y-%m-%d %H:%M:%S")`
* `strftime()` or `strftime("%+")`: ISO 8601 / RFC 3339 date & time format.

### mktime

Please refer to https://docs.rs/dateparser/latest/dateparser/#accepted-date-formats

- `mktime("2012 12 21 0 0 0")`:
- `mktime("2019-11-29 08:08-08")`:

# JSON

@@ -452,7 +473,8 @@ Formats:
- `base58`
- `base62`
- `base64`,
- `base64url`,
- `base64url`: url safe without pad
- `zlib2base64url`: zlib then base64url, good for online diagram service, such as [PlantUML](https://plantuml.com/), [Kroki](https://kroki.io/)
- `url`,
- `hex-base64`,
- `hex-base64url`,
@@ -483,9 +505,20 @@ Algorithms:
- hmac: `hmac("HmacSHA256","your-secret-key", $1)` or `hmac("HmacSHA512","your-secret-key", $1)`
- jwt: `jwt("HS256","your-secret-key", arr)`. algorithm: `HS256`, `HS384`, `HS512`.
- dejwt: `dejwt("your-secret-key", token)`.
- encrypt: `encrypt("aes-128-cbc", "Secret Text", "your_pass_key")` Now only `aes-128-cbc` and `aes-128-gcm` support
- encrypt: `encrypt("aes-128-cbc", "Secret Text", "your_pass_key")`, `encrypt("aes-256-gcm", "Secret Text", "your_pass_key","iv")`
- decrypt: `decrypt("aes-128-cbc", "7b9c07a4903c9768ceeeb922bcb33448", "your_pass_key")`

Parameters for `encrypt` and `decrypt`:

* mode: encryption mode. Currently only `aes-128-cbc`, `aes-256-cbc`, `aes-128-gcm` and `aes-256-gcm` are supported.
* plaintext: text that needs to be encrypted.
* key: encryption key. `16` bytes (16 ASCII chars) for `128` and `32` bytes (32 ASCII chars) for `256`.
* iv: initialization vector. Required for `-gcm` modes, optional for others: a hex string of `12` random bytes, such
  as `e2af9567c7454bce4437dd97`.

# KV

Key/Value Functions:
@@ -598,9 +631,12 @@ date/time array:

### File

- read file into text: `read_all(file_path)`
- read file into text: `read_all(file_path)`, `read_all("https://example.com/text.gz")`
- write text into file: `write_all(file_path, text)`. Replaces the file if it exists.

**Tips**: `read_all` uses [OneIO](https://github.com/bgpkit/oneio), so remote files (https or ftp) and compressed
files (gz, bz, lz, xz) are supported.

### getline

Please visit: https://www.gnu.org/software/gawk/manual/html_node/Getline.html
@@ -613,11 +649,17 @@ and http://awk.freeshell.org/AllAboutGetline
- dump: `var_dump(name)`,
- logging: `log_debug(msg)`, `log_info()`, `log_warn()`, `log_error()`

**Attention**: dump/logging output is written to stderr to avoid polluting stdout.

### Reflection

- `isarray(x)`,
- `typeof(x)` https://www.gnu.org/software/gawk/manual/html_node/Type-Functions.html

### zawk

- `version()`: return zawk version

# Credits

thanks to:
7 changes: 7 additions & 0 deletions justfile
@@ -233,4 +233,11 @@ run-starts-with:
run-encrypt:
    cargo run --package zawk --bin zawk -- 'BEGIN{ print encrypt("aes-128-cbc","Hello World", "123456") }' demo.txt

run-variant:
    cargo run --package zawk --bin zawk -- 'BEGIN{ print variant("week(5)")["value"] }' demo.txt

run-flags:
    cargo run --package zawk --bin zawk -- 'BEGIN{ print flags("{vip,top10}")["top10"] }' demo.txt

run-version:
    cargo run --package zawk --bin zawk -- 'BEGIN{ print version() }' demo.txt