Correct Base64 regular expression #4040

0xThiebaut · 2025-01-29T17:48:12Z

The current Base64 regular expression is incorrect:

- It fails to recognize Base64 strings longer than 1 character without padding (e.g., TlZJU08h)
- It recognizes invalid Base64 (e.g., TlZJU0=)

This PR corrects the Base64 regular expression, subsequently ensuring the UI correctly handles Base64 strings.
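Both examples can be checked against a strict decoder. The following is a minimal sketch in Python, not the project's code; `is_strict_base64` is a hypothetical helper name, and it assumes the standard alphabet with mandatory padding:

```python
import base64
import binascii


def is_strict_base64(s: str) -> bool:
    """Return True if s decodes as standard, padded Base64 (hypothetical helper)."""
    try:
        # validate=True rejects characters outside the Base64 alphabet
        # instead of silently discarding them.
        base64.b64decode(s, validate=True)
        return True
    except (binascii.Error, ValueError):
        # binascii.Error covers bad padding or length; ValueError covers
        # non-ASCII input strings.
        return False


print(is_strict_base64("TlZJU08h"))  # 8 characters, so no padding is needed
print(is_strict_base64("TlZJU0="))   # 7 characters, not a multiple of 4
```

The first string decodes to `NVISO!`; the second is rejected because its length is not a multiple of four.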

scudette · 2025-01-30T01:48:28Z

This is actually a complicated problem: Base64 is not distinctive enough to detect reliably.

We can apply one of two tests:

1. Try to decode the string itself as Base64. This accepts every validly encoded Base64 string.
2. Use a rough regex to guess whether a string is likely to be a Base64 encoding.

Both options can produce false positives: they claim a string is Base64-encoded when, although it technically decodes, the result is not meaningful because the string only accidentally has a valid Base64 decoding. A false negative is a valid Base64 string that we don't recognize as such.

Option 1 has no false negatives, because any string that decodes without raising an exception is accepted. But option 1 actually increases the number of false positives.

For example, the regex you have matches things like "name", "aced" and basically any four-letter string. This will mess up the UI even worse for non-Base64 strings.
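The decode-only test (option 1) has the same weakness, since any four characters from the Base64 alphabet decode without error. A quick Python check, offered as an illustration rather than as the UI's actual logic:

```python
import base64

# Ordinary four-letter words are syntactically valid Base64 quads.
decoded = {
    word: base64.b64decode(word, validate=True)
    for word in ("name", "aced")
}

for word, raw in decoded.items():
    print(word, "->", raw)
# name -> b'\x9d\xa9\x9e'
# aced -> b'i\xc7\x9d'
```

Neither word raises an exception, yet the decoded bytes are not meaningful text.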

There is probably no optimal solution, though, because we really don't know the type of the string. Maybe we leave it up to the user (e.g. via a context menu) to decode on a case-by-case basis?

0xThiebaut · 2025-01-30T02:04:01Z

Good point, I indeed hadn't considered the false-positive cases.

Given that, for the UI, Base64 decoding mostly makes sense when the output is printable, could we consider combining option 1 with an additional check that the decoded output is a printable string? This would avoid many of the false positives, such as "name".
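A minimal sketch of that combined heuristic in Python follows. The function name `decode_if_printable` is hypothetical, and "printable" is assumed here to mean printable ASCII including whitespace; the real UI may well need a different definition.

```python
import base64
import binascii
import string
from typing import Optional

# Assumption: "printable" means printable ASCII, whitespace included.
PRINTABLE = frozenset(string.printable.encode("ascii"))


def decode_if_printable(s: str) -> Optional[str]:
    """Return the decoded text if s is valid Base64 whose output is
    printable ASCII, otherwise None (hypothetical helper)."""
    if not s:
        return None  # the empty string decodes trivially, so skip it
    try:
        raw = base64.b64decode(s, validate=True)
    except (binascii.Error, ValueError):
        return None  # not valid Base64 at all
    if not all(byte in PRINTABLE for byte in raw):
        return None  # decodes, but to non-printable bytes
    return raw.decode("ascii")


print(decode_if_printable("TlZJU08h"))  # NVISO!
print(decode_if_printable("name"))      # None
print(decode_if_printable("TlZJU0="))   # None
```

This keeps the no-false-negative property of option 1 only for printable payloads; binary payloads are deliberately left undecoded.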

scudette · 2025-01-30T02:08:41Z

Yes, that seems like a reasonable compromise. It might not add much in practice, because Base64 is usually applied to binary byte sequences in JSON (if they were plain strings, they would already be printable). Specifically, in Golang a []byte value is always encoded as Base64 when emitted into JSON, regardless of whether the data is printable.

Maybe leave the current regex, which is pretty good at picking out things that definitely look like Base64, and also combine a try/catch-based test with a printable regex test.

Nothing will be perfect but maybe this is better.

0xThiebaut · 2025-01-30T02:24:31Z

Closing this one, I'll consider reopening one with the combined base64 & printable heuristic checks.

"It might not add much in practice because usually base64 is applied to json binary byte sequences."

The false-negative issue is one we've repeatedly encountered in our registry-based hunts, because REG_BINARY data gets Base64-encoded; although the blame lies with software not using REG_MULTI_SZ or REG_SZ ;)

Correct Base64 regular expression

00f5409

0xThiebaut closed this Jan 30, 2025
