PowerShell 5.1 UTF-16LE output not recognized by rdump as a valid input for RecordStreamReader #138

Open
DevJoost opened this issue Sep 5, 2024 · 1 comment

Comments

@DevJoost

DevJoost commented Sep 5, 2024

Whilst I understand most of us don't use PowerShell when executing (advanced) Dissect commands in combination with rdump, it is currently not possible (by default) to use records originating from a PowerShell 5.1 process, whether piped via stdin or stored as a file, as input for rdump.

Unfortunately, PowerShell 5.1 (installed by default on all Windows machines) outputs data as UTF-16LE. This is not visible in the console, but it causes problems when piping records to rdump or when reading back records that were just redirected to a file. The RecordStreamReader class does not recognize UTF-16LE input and therefore cannot handle it.

As a side note for anyone experiencing the same issue: cmd.exe and PowerShell 6 and 7 output UTF-8 by default and are therefore not affected.

[Screenshot: UTF16-LE bug]

@yunzheng
Member

I've tested this a bit on Windows 10 (PowerShell 5.1), and it looks like the output is UTF-16-LE, but some bytes are still mangled. So decoding it as UTF-16-LE will not give back the original raw bytes.

The examples below were dumped using examples/records.json.

I'm using -w - to force writing a RecordStream and > to simulate the pipe:

# Using windows cmd.exe:
C:\Users\user>rdump records.json -w - > cmd-redirect.records

# Using powershell 5.1:
PS C:\Users\user> rdump records.json -w - > ps-redirect.records

HexDumps:

$ xxd cmd-redirect.records | head -n 3
00000000: 0000 000f c40d 5245 434f 5244 5354 5245  ......RECORDSTRE
00000010: 414d 0a00 0000 81c7 7e0e 9202 92aa 7465  AM......~.....te
00000020: 7874 2f70 6173 7465 9792 a673 7472 696e  xt/paste...strin
$ xxd ps-redirect.records | head -n 3
00000000: fffe 0000 0000 0000 0f00 0025 0d00 0a00  ...........%....
00000010: 5200 4500 4300 4f00 5200 4400 5300 5400  R.E.C.O.R.D.S.T.
00000020: 5200 4500 4100 4d00 0d00 0a00 0000 0000  R.E.A.M.........
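To make clear which bytes are at stake, here is a minimal sketch that picks apart the first 19 bytes of the cmd.exe dump above. It reads them as a 4-byte big-endian length followed by what looks like a msgpack bin 8 value (marker 0xc4, one length byte, then the payload). That layout is inferred from the dumped bytes only, not taken from the flow.record source, and `describe_header` is a hypothetical helper.

```python
# First 19 bytes of cmd-redirect.records, copied from the hex dump above.
HEADER = bytes.fromhex("0000000fc40d5245434f524453545245414d0a")


def describe_header(data: bytes) -> dict:
    """Split the header into its apparent fields (layout is an assumption)."""
    size = int.from_bytes(data[:4], "big")  # 4-byte big-endian length prefix
    marker = data[4]                        # 0xc4 is the msgpack bin 8 marker
    length = data[5]                        # bin 8 carries a 1-byte length
    payload = data[6:6 + length]            # the magic string itself
    return {"size": size, "marker": marker, "length": length, "payload": payload}


info = describe_header(HEADER)
print(info)
```

Read this way, the 0xc4 and the \r (0x0d, decimal 13) are binary framing rather than text, which would explain why a shell that treats the stream as text damages them.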

The ps-redirect.records file starts with a BOM, but even after stripping that and decoding the data as utf-16-le, you do not get back the original raw header:

In [14]: open("cmd-redirect.records", "rb").read()[:20]
Out[14]: b'\x00\x00\x00\x0f\xc4\rRECORDSTREAM\n\x00'

In [15]: open("ps-redirect.records", "rb").read()[2:50].decode("utf-16-le")
Out[15]: '\x00\x00\x00\x0f─\r\nRECORDSTREAM\r\n\x00\x00\x00'

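You can see that PowerShell has inserted extra line-ending characters (the original \r became \r\n, and so did the original \n) and that the 0xc4 byte has been re-encoded as a different character (─). I don't see a way to get back the original raw bytes.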
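One way to reason about the mangling is to try to reproduce it. The sketch below assumes PowerShell 5.1 decodes the program's output with an OEM code page (cp437 is assumed here), splits it into lines on \r\n, \r or \n, and writes the lines back as UTF-16LE with a BOM and \r\n endings. That model is a guess checked only against the bytes shown above, and `simulate_ps51_redirect` is a hypothetical name.

```python
import re

# Raw header as written via cmd.exe (Out[14] above).
RAW = b"\x00\x00\x00\x0f\xc4\rRECORDSTREAM\n\x00"

# First 48 bytes of ps-redirect.records, copied from the hex dump above.
OBSERVED = bytes.fromhex(
    "fffe0000000000000f0000250d000a00"
    "5200450043004f005200440053005400"
    "5200450041004d000d000a0000000000"
)


def simulate_ps51_redirect(raw: bytes, oem: str = "cp437") -> bytes:
    """Assumed model: decode as OEM text, normalise line endings, emit UTF-16LE."""
    text = raw.decode(oem)                     # 0xc4 becomes U+2500 under cp437
    lines = re.split(r"\r\n|\r|\n", text)      # any line ending counts as a break
    return b"\xff\xfe" + "\r\n".join(lines).encode("utf-16-le")


simulated = simulate_ps51_redirect(RAW)
print(OBSERVED.startswith(simulated))          # does the model match the dump?

# Naive reversal: strip the BOM, decode, and map back through the same code page.
recovered = OBSERVED[2:].decode("utf-16-le").encode("cp437")
print(recovered == RAW[: len(recovered)])

# Different inputs collapse to the same output, so the step is not invertible.
print(simulate_ps51_redirect(b"a\rb") == simulate_ps51_redirect(b"a\nb"))
```

Under this model the 0xc4 byte can be mapped back, but the line endings cannot: a lone \r, a lone \n and \r\n all end up as \r\n, so any binary payload containing those bytes is ambiguous after the redirect.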

We can add some basic detection and raise a warning.
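As a rough illustration of what such detection could look like, the sketch below checks the first bytes of the input for a UTF-16LE BOM followed by the magic string encoded as UTF-16LE, and emits a warning if both are found. The function names, the 64-byte window and the warning text are all made up for this example; none of this is existing flow.record code.

```python
import warnings

UTF16LE_BOM = b"\xff\xfe"
# "RECORDSTREAM" as it appears after a UTF-16LE re-encode (R.E.C.O.R.D...).
MAGIC_UTF16LE = "RECORDSTREAM".encode("utf-16-le")


def looks_like_utf16le_recordstream(head: bytes) -> bool:
    """Return True if the first bytes look like a re-encoded record stream."""
    return head.startswith(UTF16LE_BOM) and MAGIC_UTF16LE in head[:64]


def warn_if_mangled(head: bytes) -> bool:
    """Warn (hypothetical message) when the input appears to be UTF-16LE."""
    if looks_like_utf16le_recordstream(head):
        warnings.warn(
            "Input looks like a UTF-16LE encoded record stream (PowerShell 5.1 "
            "redirect?). The original bytes cannot be recovered reliably.",
            RuntimeWarning,
        )
        return True
    return False


# Sample headers taken from the two dumps in this thread.
ps_head = bytes.fromhex(
    "fffe0000000000000f0000250d000a00"
    "5200450043004f005200440053005400"
    "5200450041004d000d000a0000000000"
)
cmd_head = b"\x00\x00\x00\x0f\xc4\rRECORDSTREAM\n\x00"

print(warn_if_mangled(ps_head), warn_if_mangled(cmd_head))
```

A check like this only needs the first few dozen bytes, so it could run before any attempt to parse the header and would leave valid streams untouched.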
