Correct Base64 regular expression #4040

Closed
wants to merge 1 commit into from


0xThiebaut (Contributor) commented Jan 29, 2025

The current Base64 regular expression is incorrect:

  • It fails to recognize Base64 strings longer than 1 character without padding (e.g., TlZJU08h)
  • It recognizes invalid Base64 (e.g., TlZJU0=)

This PR corrects the Base64 regular expression, thereby ensuring the UI correctly handles Base64 strings.
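For reference, a stricter pattern meeting both requirements above might look like the following. This is a minimal Python sketch, not the regex from this PR (which isn't shown here), and the names `STRICT_BASE64` and `matches_strict_base64` are hypothetical. It accepts unpadded input whose final group has 2 or 3 characters, and only accepts `=` padding when that padding completes a 4-character group.

```python
import re

# Hypothetical strict Base64 pattern (standard alphabet, padding optional).
# Whole 4-character groups, then an optional final group of 2 or 3 characters
# that may carry the matching "==" or "=" padding. A single leftover character
# can never be valid Base64, so the pattern does not allow one.
STRICT_BASE64 = re.compile(
    r"(?:[A-Za-z0-9+/]{4})*"
    r"(?:[A-Za-z0-9+/]{2}(?:==)?|[A-Za-z0-9+/]{3}=?)?"
)


def matches_strict_base64(s: str) -> bool:
    # fullmatch anchors both ends; the empty string is rejected explicitly
    # because the pattern on its own would match it.
    return bool(s) and STRICT_BASE64.fullmatch(s) is not None


print(matches_strict_base64("TlZJU08h"))  # True: unpadded, length 8
print(matches_strict_base64("TlZJU0="))   # False: "=" cannot follow a 2-character group
print(matches_strict_base64("name"))      # True: any 4 alphabet characters match
```

As the last line shows, a pattern check alone cannot tell real Base64 from ordinary words of the right length.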

scudette (Contributor) commented:

This is actually a complicated problem: Base64 is not distinctive enough to be detected reliably.

There are a couple of tests we can apply:

  1. Try to decode the string itself as Base64. This will accept every valid Base64-encoded string.
  2. Use a rough regex to guess which strings are more likely to be Base64 encodings.

Both choices above can produce false positives, i.e., they claim a string is Base64-encoded when, although it technically is, the decoding is not meaningful because the string just happens to also have a valid Base64 decoding. A false negative is a string that is valid Base64 but that we don't recognize as such.

Option 1 above won't produce any false negatives (if the string can be decoded without raising an exception, it is accepted). But option 1 also produces more false positives.
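Option 1 can be sketched as a decode attempt that treats any error as "not Base64". This is a minimal Python sketch assuming the standard alphabet with optional padding; the helper name `try_decode_base64` is hypothetical. It also illustrates the false-positive problem: an ordinary word such as "name" decodes without error.

```python
import base64
from typing import Optional


def try_decode_base64(s: str) -> Optional[bytes]:
    """Return the decoded bytes if s is valid (optionally unpadded) Base64, else None."""
    # A length of 4n+1 can never be valid Base64.
    if not s or len(s) % 4 == 1:
        return None
    if "=" in s:
        # Padded input must already be a whole number of 4-character groups.
        if len(s) % 4 != 0:
            return None
        padded = s
    else:
        # Restore the padding that unpadded encoders omit.
        padded = s + "=" * (-len(s) % 4)
    try:
        # validate=True rejects characters outside the Base64 alphabet
        # instead of silently discarding them.
        return base64.b64decode(padded, validate=True)
    except ValueError:  # binascii.Error and non-ASCII input both raise ValueError
        return None


print(try_decode_base64("TlZJU08h"))  # b'NVISO!'
print(try_decode_base64("TlZJU0="))   # None
print(try_decode_base64("name"))      # b'\x9d\xa9\x9e': decodes, but is meaningless
```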

For example, the regex you have matches strings like "name", "aced", and basically any four-letter string. This would make the UI even worse for non-Base64 strings.

There is probably no optimal solution, though, because we really don't know the type of the string. Maybe we should leave it up to the user (e.g., via a context menu) to enable decoding on a case-by-case basis?

0xThiebaut (Contributor, Author) commented:

Good point, I indeed hadn't considered the false-positive cases.

Given that, for the UI, Base64 decoding mostly makes sense when the output is printable, could we consider combining option 1 with an additional check that the decoded output is a printable string? This would avoid many of the false positives, such as "name".
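The proposed combination (option 1 plus a printable check) could look roughly like this Python sketch. The helper name `decodes_to_printable` is hypothetical, and "printable" is assumed here to mean valid UTF-8 whose characters all satisfy `str.isprintable()`, with tabs and line breaks also allowed; a real UI may want a different definition.

```python
import base64
from typing import Optional


def decodes_to_printable(s: str) -> Optional[str]:
    """Return the decoded text if s is Base64 that decodes to printable UTF-8, else None."""
    # Same structural checks as a plain decode attempt (optional padding).
    if not s or len(s) % 4 == 1:
        return None
    if "=" in s:
        if len(s) % 4 != 0:
            return None
        padded = s
    else:
        padded = s + "=" * (-len(s) % 4)
    try:
        raw = base64.b64decode(padded, validate=True)
        text = raw.decode("utf-8")  # UnicodeDecodeError is a ValueError subclass
    except ValueError:
        return None
    # Allow ordinary whitespace; reject control characters and other non-printables.
    if all(ch.isprintable() or ch in "\t\r\n" for ch in text):
        return text
    return None


print(decodes_to_printable("TlZJU08h"))  # NVISO!
print(decodes_to_printable("name"))      # None: decodes to bytes that are not UTF-8
print(decodes_to_printable("AAEC"))      # None: decodes to b'\x00\x01\x02'
```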

scudette (Contributor) commented:

Yes, that seems like a reasonable compromise. It might not add much in practice, though, because Base64 is usually applied to binary byte sequences in JSON (if they were plain strings, they would already be printable). In particular, in Go a []byte value is always encoded as Base64 when marshaled into JSON, regardless of whether the data is printable.

Maybe we could keep the current regex, which is pretty good at picking out strings that definitely look like Base64, and additionally combine a try/catch-based decode test with a printable-regex test.

Nothing will be perfect but maybe this is better.

0xThiebaut (Contributor, Author) commented:

Closing this one; I'll consider opening a new PR with the combined Base64-decode and printable heuristic checks.

> It might not add much in practice because usually base64 is applied to json binary byte sequences.

The false-negative issue is one we've repeatedly encountered in our registry-based hunts, due to REG_BINARY data getting Base64-encoded, although the blame lies with software that doesn't use REG_MULTI_SZ or REG_SZ ;)
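One caveat for the registry case: Windows registry string data (REG_SZ) is stored as UTF-16LE, so if a REG_BINARY value holds text in that form, its decoded bytes are interleaved with NULs and a UTF-8/ASCII printable check would reject it. A minimal sketch, assuming the binary value may hold UTF-16LE text, is to try that encoding as a fallback; the helper name `printable_text` is hypothetical.

```python
import base64
from typing import Optional


def printable_text(raw: bytes) -> Optional[str]:
    """Decode raw bytes as UTF-8, then UTF-16LE; return the text only if printable."""
    for encoding in ("utf-8", "utf-16-le"):
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        # String data copied from REG_SZ usually keeps its NUL terminator; drop it.
        text = text.rstrip("\x00")
        if text and all(ch.isprintable() for ch in text):
            return text
    return None


# Hypothetical REG_BINARY value holding the UTF-16LE string "NVISO!" plus a NUL terminator.
raw = "NVISO!\x00".encode("utf-16-le")
encoded = base64.b64encode(raw).decode("ascii")
print(printable_text(base64.b64decode(encoded)))  # NVISO!
```

Note that decoding arbitrary even-length binary as UTF-16LE often yields printable (for example CJK) characters, so this fallback widens false positives; restricting the UTF-16LE result to a narrower character range is one possible tightening.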

0xThiebaut closed this Jan 30, 2025