Implement readUser on Windows #3248

gabriel-samfira · 2022-11-01T13:25:48Z

This change allows us to parse the SAM registry hive to fetch the information associated with the user we set in the USER stanza. The Security Accounts Manager (SAM) file is a registry hive that holds Windows users, password hashes, information about the user (last login, full name, etc). In essence, it is a combination of /etc/passwd, /etc/shadow and last log.

Parsing this file allows us to properly set permissions on files and folders we create when we use ADD, COPY, WORKDIR, etc when USER contains a user that does not have a Well Known SID like ContainerAdministrator or ContainerUser. This also removes Windows as a special case in a number of code paths.

This means we can create new users in containers using:

RUN net user example /ADD /ACTIVE:yes

Then use that user instead of the two built in ones.

Sadly, the format of the SAM registry hive is quite obscure. I was unable to find proper documentation on how to parse it, aside from an old article that is no longer available outside of the web archive project.

https://web.archive.org/web/20190423074618/http://www.beginningtoseethelight.org:80/ntsecurity/index.htm

Depends on: tonistiigi/fsutil#138

Signed-off-by: Gabriel Adrian Samfira [email protected]

This change allows us to parse the SAM registry hive to fetch the user information associated with the user we set in the USER stanza. This is the Windows equivalent of parsing the /etc/passwd file. Unfortunately the format of that hive is obscure. Signed-off-by: Gabriel Adrian Samfira <[email protected]>

tonistiigi · 2022-11-01T23:18:54Z

ping @TBBle

TBBle · 2022-11-05T16:37:41Z

I'm not really in a position to review this properly, but the principle seems sound. The only other way I can see of making this work would be to start the container and execute a command to dump the relevant user SID.

Having had a flick through, I see at least one spot that still has a debugging path (GetUserInfoFromOfflineSAMHive doesn't use samPath), but this is a draft pending tonistiigi/fsutil#138, so that's not important right now.

Having also had a flick through the linked reference doc, I was wondering why the decision was made to walk the SAM for all users and then look for a match, rather than attempting to fetch the desired name from the Names subkey, and then the relevant numbered key. Do we have a use-case for "list all known users in the image"? The Linux version works this way because /etc/passwd etc is not structured, but the Windows SAM is.

My only concern (and I assume this flow has been tested) is whether the WCIFS local-mounting actually does the right thing with the registry hives from a multi-layer image. I would have assumed the differencing-registry setup was handled by the container execution environment, but if this code works for the described use-case, then my assumption is wrong and WCIFS is setting it up as well, which is nice. This source suggests that the layers are separately stored in the layer images, so maybe the Offline Registry Library itself understanders layered hives?

gabriel-samfira · 2022-11-05T20:42:49Z

The only other way I can see of making this work would be to start the container and execute a command to dump the relevant user SID.

That was my first option, but it meant building a helper binary which works in every Windows container (the Current() function in os/user doesn't work on nanoserver - missing netapi32.dll) which we needed to add into every container to fetch the SID, then remove it. Felt brittle and hacky. I cannot argue that parsing an undocumented hive is any better, but the info is there, and we can fetch it without much of a fuss.

Having also had a flick through the linked reference doc, I was wondering why the decision was made to walk the SAM for all users and then look for a match, rather than attempting to fetch the desired name from the Names subkey, and then the relevant numbered key.

The Names subkey is just a list of user names. While it is an ordered list of names that you could probably match to the proper user one level up HKLM:\SAM\Domains\Account\Users, You'd still have to loop through that list to look for the name, then go back and parse the V value associated with the proper user.

This is what the user listing looks like in the SAM hive:

The hex value you see under Name is in fact the RID of the user. The V value holds all the information about that user, including the username, SID (well, for everyone except the Guest account it seems), password hash, last login attempt, etc. We only care about the username and the SID, though.

I considered parsing Names first, then matching it to the proper key and only parsing the V value just for that one user, but I didn't feel it added much value. I also thought that if at some point we may need to do that for more than one user, we'd have to open the SAM hive and parse it for every user we care about. This way we only have to read the SAM hive once.

There are not many local users inside a typical container, nor on any single instance for that matter. Fetching them all didn't seem too much of a waste of cycles.

My only concern (and I assume this flow has been tested) is whether the WCIFS local-mounting actually does the right thing with the registry hives from a multi-layer image. I would have assumed the differencing-registry setup was handled by the container execution environment, but if this code works for the described use-case, then my assumption is wrong and WCIFS is setting it up as well, which is nice. This source suggests that the layers are separately stored in the layer images, so maybe the Offline Registry Library itself understanders layered hives?

That is the best part. The layered keys seem to be activated when the container layer is activated. I locally rebased your two PRs: containerd/containerd#4419 and microsoft/hcsshim#901, applied the changes I proposed for both: TBBle/containerd#1 and TBBle/hcsshim#1.

With the changes in this PR, plus some quick and nasty hacks in the containerd executor of buildkit, I managed to build an image on Windows using the following Dockerfile:

FROM mcr.microsoft.com/windows/nanoserver:ltsc2022

USER ContainerAdministrator

RUN net user gabriel /ADD /ACTIVE:yes

USER gabriel
WORKDIR C:\\workdir

And the result is as follows:

The WORKDIR stanza prompted buildkit to create the folder and change the owner. The SID corresponds to the SID for the gabriel user inside the image, which was created above.

There is still a lot of work to do in various parts of buildkit, but most things seem to be minor.

TBBle · 2022-11-06T11:54:01Z

The Names subkey is just a list of user names. While it is an ordered list of names that you could probably match to the proper user one level up HKLM:\SAM\Domains\Account\Users, You'd still have to loop through that list to look for the name, then go back and parse the V value associated with the proper user.

My understanding from the linked doc was that the Names subkey is a list of keys, named for the user with the @ value then being the hex key name from the level up for that name, hence trading any loop-through in favour of an extra indirection.

No particularly strong feelings either way about it, I agree that container images are unlikely to have so many users that O(N)/2 will be observably worse that 2*O(1).

gabriel-samfira · 2022-11-06T12:02:35Z

This is what the contents of the Names subkey looks like:

That article is old. Really old. Things may have changed since then. Luckily what we need has stayed the same since the Windows XP era 😄 .

Edit: To clarify, there does not seem to be an @ value:

PS C:\Users\Administrator\work\playground> Get-ItemProperty HKLM:\external\SAM\Domains\Account\Users\Names\gabriel -Name "@"
Get-ItemProperty : Property @ does not exist at path
HKEY_LOCAL_MACHINE\external\SAM\Domains\Account\Users\Names\gabriel.
At line:1 char:1
+ Get-ItemProperty HKLM:\external\SAM\Domains\Account\Users\Names\gabri ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidArgument: (@:String) [Get-ItemProperty], PSArgumentException
    + FullyQualifiedErrorId : System.Management.Automation.PSArgumentException,Microsoft.PowerShell.Commands.GetItemPr
   opertyCommand

PS C:\Users\Administrator\work\playground> Get-ItemPropertyValue HKLM:\external\SAM\Domains\Account\Users\Names\gabriel -Name "@"
Get-ItemPropertyValue : Property @ does not exist at path
HKEY_LOCAL_MACHINE\external\SAM\Domains\Account\Users\Names\gabriel.
At line:1 char:1
+ Get-ItemPropertyValue HKLM:\external\SAM\Domains\Account\Users\Names\ ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidArgument: (:) [Get-ItemPropertyValue], PSArgumentException
    + FullyQualifiedErrorId : Argument,Microsoft.PowerShell.Commands.GetItemPropertyValueCommand

PS C:\Users\Administrator\work\playground> ls HKLM:\external\SAM\Domains\Account\Users\Names\gabriel\
PS C:\Users\Administrator\work\playground>

gabriel-samfira · 2022-11-21T15:28:50Z

I've been thinking about this PR and I think we'll end up defining a reexec utility in buildkit, and using that to fetch the SID from a container. It's not as clean as just reading a file, but parsing undocumented data structures is generally frowned upon.

Will leave this as a draft for now and focus on getting the containerd worker executor code working. We need to do that anyway, and we can use it to get the SID using documented APIs, even if it means we create a temporary container.

As a side note, it seems that HCS does not carry ownership information from layer to layer. Only DACLs are preserved. The owner of the file is reset to the built-in Administrators group. We still need the SID to create proper ACLs, and at some point in the future when HCS will preserve ownership info, we'll already have that as well.

gabriel-samfira · 2023-02-06T17:00:57Z

Closing this as it was replaced by #3518

gabriel-samfira marked this pull request as draft November 1, 2022 13:26

gabriel-samfira force-pushed the implement-read-user-on-windows branch from 46ec44e to 5a932cc Compare November 1, 2022 13:31

TBBle mentioned this pull request Nov 5, 2022

Use offline registry library to generate minimum hive TBBle/hcsshim#1

Closed

This was referenced Jan 18, 2023

[Windows] Add get-user-info utility #3516

Merged

[Windows] Implement executor on Windows #3517

Merged

gabriel-samfira closed this Feb 6, 2023

crazy-max added the area/windows label Feb 10, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement readUser on Windows #3248

Implement readUser on Windows #3248

gabriel-samfira commented Nov 1, 2022

tonistiigi commented Nov 1, 2022

TBBle commented Nov 5, 2022

gabriel-samfira commented Nov 5, 2022

TBBle commented Nov 6, 2022 •

edited

Loading

gabriel-samfira commented Nov 6, 2022 •

edited

Loading

gabriel-samfira commented Nov 21, 2022

gabriel-samfira commented Feb 6, 2023

Implement readUser on Windows #3248

Implement readUser on Windows #3248

Conversation

gabriel-samfira commented Nov 1, 2022

tonistiigi commented Nov 1, 2022

TBBle commented Nov 5, 2022

gabriel-samfira commented Nov 5, 2022

TBBle commented Nov 6, 2022 • edited Loading

gabriel-samfira commented Nov 6, 2022 • edited Loading

gabriel-samfira commented Nov 21, 2022

gabriel-samfira commented Feb 6, 2023

TBBle commented Nov 6, 2022 •

edited

Loading

gabriel-samfira commented Nov 6, 2022 •

edited

Loading