Possible fix for extended ASCII issue (#490) #666

kirk-sayre-work · 2021-03-09T20:33:58Z

This is a potential fix for #490. Based on the behavior of oledump and of some VBA payload decoders that decode extended ASCII strings, it looks like VBA code bytes should be left unmodified after they have been decompressed, i.e. no unicode string conversion. The unicode conversion attempts to do something smart and useful with the "special" Office extended VBA characters, but as a result these single-byte extended ASCII characters are converted to multi-byte unicode characters, which breaks the VBA payload decoders.

Additionally, if you loop through a string with extended ASCII characters in VBA, pull out each character with Mid(), and print the characters with Debug.Print, the original single-byte extended ASCII character is printed (i.e. what you get from the initial decompressed VBA stream prior to unicode conversion).
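
To illustrate the single-byte vs. multi-byte point, here is a minimal Python 3 sketch. It does not use olevba at all; the three byte values are made up for illustration and code page 1252 is assumed. It only shows how character codes and byte lengths change once raw bytes go through a unicode conversion.

```python
# Three illustrative bytes as they might appear in decompressed VBA source
# (assumed code page: cp1252). 0x41 = 'A', 0xE9 = e-acute, 0x93 = left curly quote.
raw = bytes([0x41, 0xE9, 0x93])

# Unicode conversion: each byte becomes one code point, but not always
# the code point with the same numeric value as the byte.
text = raw.decode("cp1252")
code_points = [ord(ch) for ch in text]

# Re-encoding to UTF-8 (the Python 2 path discussed in this thread)
# turns the non-ASCII characters into multi-byte sequences.
utf8 = text.encode("utf-8")

print([hex(b) for b in raw])          # original byte values
print([hex(cp) for cp in code_points])  # values after unicode conversion
print(len(raw), len(utf8))            # byte length before and after
```

A decoder that expects the value 0x93 for the third character would instead see 0x201C after conversion, or three separate bytes in the UTF-8 form.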

decalage2 · 2021-03-10T20:40:02Z

The conversion is done on purpose: VBA source code is converted to unicode for Python 3, and to UTF-8 for Python 2, so that we always get the VBA code as native str whatever the Python version. Also, for Python 2 I chose to convert to UTF-8 so that any application calling olevba through its API always gets a byte string with a known encoding. If we keep raw bytes instead, then they can be encoded with any code page, and it's hard to handle them properly from calling applications.
So, do you have specific samples I could try, to understand what the issue is with the unicode conversion? (Maybe it's due to a code page that is not properly handled by Python, or my code has a bug.) I see there is one sample mentioned in #490; do you have others?
Or, if you have a specific need to get the raw bytes instead of the Unicode/UTF-8 version, I can look at the API to make it easier to get.

kirk-sayre-work · 2021-03-10T23:27:59Z

Here are some example in-the-wild (ITW) maldocs from the last 7 days or so:

https://bazaar.abuse.ch/sample/b7153cc8f00e1f39c16da557acb8a43f57eed55a371674e995a8ac808e047ab4/
https://bazaar.abuse.ch/sample/c84b1478fdf53dd00791f5ceb8e7744493964c363a8c02ea4f24600dab28fb83/
https://bazaar.abuse.ch/sample/f451591b470a934d1ef08937d9009f19e1d426651d87603d3cded34b54c53b6c/

The VBA macros build an encoded payload string with a series of single character string concatenations and then decode the payload shell command. Several of the characters in the encoded payload string are "special" VBA extended ASCII characters. The decode loop works in ViperMonkey if the raw byte values from the decompressed VBA are provided by olevba. I tried to implement a mapping in ViperMonkey from the unicode translation of these extended ASCII characters back to the original extended ASCII value, but Python does not map these raw byte values to unicode in a nice predictable way.
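
As a rough sketch of why the decode loop depends on raw byte values, the Python 3 snippet below uses a made-up decoder (subtract a fixed shift from each character code). The shift value, the encoded bytes and the `decode` helper are all hypothetical and are not taken from the samples above or from ViperMonkey; cp1252 is assumed as the code page.

```python
# Hypothetical decoder: each encoded character code minus a fixed shift.
SHIFT = 0x22  # illustrative value only, not taken from any real sample


def decode(char_codes):
    """Rebuild the payload string from a sequence of character codes."""
    return "".join(chr(code - SHIFT) for code in char_codes)


# Encoded payload as raw single-byte extended ASCII values (all in 0x80-0x9F).
raw = bytes([0x85, 0x83, 0x8E, 0x85])

# Case 1: the decoder sees the raw byte values, as VBA's Mid()/Asc() would.
from_raw = decode(list(raw))

# Case 2: the decoder sees the code points produced by a cp1252 -> unicode
# conversion, which differ from the byte values for this range.
converted = raw.decode("cp1252")
from_unicode = decode([ord(ch) for ch in converted])

print(repr(from_raw))
print(repr(from_unicode))
```

With the raw values the loop recovers the intended string; with the converted code points the arithmetic lands on unrelated characters.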

A flag to the VBA_Parser() constructor telling olevba to return raw code strings would work just fine for ViperMonkey usage.

decalage2 · 2021-03-11T07:31:41Z

Indeed it's a tricky issue, because MS Office and Python do not handle some extended ASCII codes the same way for code page 1252 (and potentially other code pages): Python's cp1252 codec treats them as undefined and cannot convert them to unicode, while MS Office (actually Windows) treats them as special control codes. I am still investigating the issue to find a proper solution; all my findings are documented in issue #490.
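
The Python side of this mismatch can be checked with a short Python 3 sketch that uses only the standard `cp1252` codec. It probes the 0x80-0x9F range and collects the byte values the codec refuses to decode; how Windows treats those same bytes is as described above and is not verified by this snippet.

```python
# Probe Python's cp1252 codec for byte values it cannot decode.
undefined = []
for value in range(0x80, 0xA0):
    try:
        bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        # Python has no mapping for this byte in cp1252.
        undefined.append(value)

print([hex(v) for v in undefined])
```

Any VBA string containing one of these byte values cannot go through a strict cp1252 decode in Python without an error handler, which is one source of the unpredictable mapping mentioned earlier in the thread.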

I will see how to improve the API so that you can get the raw bytes directly. This should work fine for western code pages such as 1252, but you may have other issues with more exotic code pages... This is why I converted everything to unicode in the first place. :-)
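
A small Python 3 sketch of why raw bytes are ambiguous without knowing the code page: the same byte decodes to different characters under two different single-byte code pages. The byte value and the choice of cp1251 as the second code page are only illustrative.

```python
# One raw byte, as it could appear in decompressed VBA source.
raw = bytes([0xE9])

# The same byte under two different Windows code pages.
as_western = raw.decode("cp1252")   # Latin small letter e with acute
as_cyrillic = raw.decode("cp1251")  # Cyrillic small letter short i

print(hex(ord(as_western)), hex(ord(as_cyrillic)))
```

A caller that receives raw bytes therefore also needs the document's code page to interpret them as text, whereas a caller that only does byte arithmetic on them does not.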

kirk-sayre-work added 3 commits March 8, 2021 16:38

Correctly report extended ASCII characters in strings.

620c0d8

Target raw string usage to just decompressed VBA code.

15ea944

Removed debug print statement.

6bd82b2

decalage2 self-requested a review March 10, 2021 20:40

decalage2 self-assigned this Mar 10, 2021

decalage2 added 🐛 bug olevba labels Mar 10, 2021

Fix some bad characters in decoded VBA.

49473da

decalage2 added this to the Next Release milestone Jun 10, 2024
