-
-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Document and give example for alsw implementation #38
Comments
ALSW 利用一个 hash 函数和一个 offset 函数实现 ALSW 有两个功能:check 和 decode,各需要两个长度为 16 的查找表,例如 ALSW 的 check 功能用于判断输入字符是否有效,当且仅当输出小于 0x80 时,输入有效。 这是用于 debug 的测试函数,可以在终端打印两张带有颜色的映射表,填写 check_hash 和 decode_hash 函数时可以根据映射表调整数值。 simd/crates/base64-simd/src/alsw.rs Lines 115 to 137 in 9baa5bf
用 Base32Crockford 举例,其中 check_hash 未经调整 struct Base32CrockfordAlsw;
impl Base32CrockfordAlsw {
#[inline]
const fn decode(c: u8) -> u8 {
match c {
b'0'..=b'9' => c - b'0',
b'A'..=b'H' => c - b'A' + 10,
b'J'..=b'K' => c - b'J' + 18,
b'M'..=b'N' => c - b'M' + 20,
b'P'..=b'T' => c - b'P' + 22,
b'V'..=b'Z' => c - b'V' + 27,
_ => 0xff,
}
}
#[inline]
const fn check_hash(i: u8) -> u8 {
match i {
0x0 => 1,
0x1 => 1,
0x2..=0x7 => 6,
0x8..=0xA => 1,
0xB..=0xF => 7,
_ => unreachable!(),
}
}
#[allow(dead_code)]
#[inline]
const fn decode_hash(i: u8) -> u8 {
Self::check_hash(i)
}
}
vsimd::impl_alsw!(Base32CrockfordAlsw); #[cfg(feature = "std")]
#[test]
#[ignore]
fn debug_crockford_alsw_check() {
let hash = &Base32CrockfordAlsw::CHECK_HASH;
let offset = &Base32CrockfordAlsw::CHECK_OFFSET;
let is_primary = |c: u8| Base32CrockfordAlsw::decode(c) != 0xff;
vsimd::tools::print_fn_table(is_primary, |c: u8| vsimd::alsw::hash(hash, c));
vsimd::tools::print_fn_table(is_primary, |c: u8| vsimd::alsw::check(hash, offset, c));
} 执行命令 (部分由 vscode rust-analyzer 插件生成)
红色代表字符集位置,绿色代表小于 0x80,蓝色代表不小于 0x80。 decode 同理,但对第二张表的要求更低,红色位置的数值全部正确即可。 更改函数后保存,上面命令中的 目前只能人工调整 check_hash 和 decode_hash,我还没有找到根据字符集生成这两个函数的方法。 |
其实初版 base32-simd 计划支持 Base32 Crockford,后来移除了。 |
thank you @Nugine , that was very informative. I'll give this a try. But afaik, this is basically a brute force process to find the matching hash method. The prints are helpful to identify which of the 16 values can affect the result, but a hand tweaking process may take forever. Is there any technique on how to pick the value of the hash, and is there a typical range of each value? |
先要熟悉一下 ALSW 的计算机制。 可以根据 base64 的数值分配来找找规律 simd/crates/base64-simd/src/alsw.rs Lines 115 to 137 in 9baa5bf
hash 后的结果通常会将红色区域分割成横向连续的几条,列边界处会垂直分割区域。check 结果是按这样的横条分配的。 此外就靠找规律、猜测和数字敏感性了。时间长了我也记不住有什么方法了。 由于 ALSW 的查找表仅由字符集决定,这样的调整过程是一次性的,我觉得一定存在一个确定的算法来计算查找表或者判定无法计算,但还没想到构造方法。 |
Hi Nugine,
I'm planning to add a new charset support to simd-base32, and is stuck with the alsw decoder implementation.
Given that the charset is:
I have the decode function as:
But it is unclear to me how the check_hash function is implemented. Would you please add some documentation to the implementation and the verification of such functions?
Thanks!
The text was updated successfully, but these errors were encountered: