Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ldms streams fails to implement authorization #1527

Open
morrone opened this issue Nov 19, 2024 · 28 comments
Open

ldms streams fails to implement authorization #1527

morrone opened this issue Nov 19, 2024 · 28 comments
Labels
bug V4 Version 4 v5-topic Topic for design of version 5

Comments

@morrone
Copy link
Collaborator

morrone commented Nov 19, 2024

Could someone please explain how ldms streams authorization works? I'm having trouble finding anything about it in documentation.

@morrone
Copy link
Collaborator Author

morrone commented Nov 21, 2024

Here is what I understand:

We need to keep in mind that authentication only tells us who a user is. It does not tell us what they are authorized to do.

Any user that can authenticate with an ldmsd has free access to publish and/or subscribe to anything in streams that they wish. So in common deployments where munge is used to authenticate users on a cluster and in ldmsd, all cluster users will have free reign to publish and access any streams they wish. A user's stream data will not be isolated from other users.

Many ldmsd running in deployments are running as root, because samplers require root access to gather their data.

Right now those same root ldmsd can have streams data published to them from any authenticating user (typically any user on the system). This is poor practice, because we should limit to our greatest possible extent the amount of data that can be injected into the daemon by users. The more we allow rusers to push data into the root daemon, the more likely we are to have the ldmsd exploited in the future. Best practice in most cases would be to avoid enabling streams when the ldmsd is running as a privileged user. Separate non-privileged ldmsd should be used for handling user data streams.

Next, there is no isolation of data between the users using the streams service. Anyone on the cluster can access anything. We need to protect a user's data, and only expose it to the other users explicitly permitted to do so.

So in the longer term, we need to design and implement a proper authorization system for ldms streams.

But in the near term we must:

  • Add an ldmsd configuration option to enable/disable ldms streams.
  • Make the default disabled.
  • Clearly document the implications of enabling streams. Something like "WARNING: All authenticated users are permitted to publish anything to any stream, and may also subscribe to any stream. There is no data protection after authentication. In most cases, it is also best practice to avoid enabling streams when the ldmsd runs as a privileged user."

@morrone morrone changed the title How does ldms steams authorization work? ldms streams fails to implement authorization Nov 21, 2024
@morrone morrone added bug V4 Version 4 v5-topic Topic for design of version 5 labels Nov 21, 2024
@morrone morrone modified the milestones: v4.4.5, v4.5.1 Nov 21, 2024
@tom95858
Copy link
Collaborator

@morrone, all of these issues are resolved in 4.5.1 (top of tree). Just an FYI. And I think you statements above are a bit too strong, but let's talk about it and decide what to do with 4.4.5. I like the idea of the stream disable command.

@morrone
Copy link
Collaborator Author

morrone commented Nov 22, 2024

If I misunderstand something about the implementation I am happy to make corrections. But if my understanding is correct, I think it is important that we are clear and direct about the security implications.

Can you give pointer to where the issues are resolved on main?

@tom95858
Copy link
Collaborator

tom95858 commented Nov 22, 2024 via email

@morrone
Copy link
Collaborator Author

morrone commented Nov 22, 2024

Ok, it looks like only main has ldms_stream.[ch], while both have ldmsd_stream.[ch]. On first skim of the code, it looks main broke out some of the ldmsd_stream.[ch] content into ldms_stream.[ch] and then expanded the functioality.

There are virtually no code comments, and I haven't found much in the way of other documentation yet. It looks like ldms_stream.h is missing function declarations that should probably be there too. Tomorrow I'll see if I can figure out what what is going on.

@tom95858
Copy link
Collaborator

Ok, it looks like only main has ldms_stream.[ch], while both have ldmsd_stream.[ch]. On first skim of the code, it looks main broke out some of the ldmsd_stream.[ch] content into ldms_stream.[ch] and then expanded the functioality.

It broke out all of the capability and added more. The only reason that ldmsd_stream still exist is to make the configuration compatible with 4.4.5. If we don't want that, we can completely remove ldmsd_xxxx version of streams. And I'm beginning to think that is a good idea

There are virtually no code comments,

They are written in C. ;- Point taken.

... and I haven't found much in the way of other documentation yet.

I think we should review the documentation for both main and b4.4.

... It looks like ldms_stream.h is missing function declarations that should probably be there too. Tomorrow I'll see if I can figure out what what is going on.

@narategithub could you take a look at this?

@narategithub
Copy link
Collaborator

ldmsd_strea.[ch] are there so that top-of-tree ldmsd can still process old ldmsd_stream messages from old ldmsd. Going forward, all application should use ldms_stream.

If you are an application, just include ldms.h will provide all functions you need to interact with ldms_stream.

To publish to the peer, after connected, simply call ldms_stream_publish() (see https://github.com/ovis-hpc/ldms/blob/main/ldms/src/core/ldms.h#L1376) to publish the data.

To subscribe for data, there are two steps:

I agree that we should have a man page for this, with examples (both Python and C).

Example of simple stream client w/ Python: https://github.com/ovis-hpc/ldms-test/blob/master/python/ldms_stream_client.py

Simple example of stream publisher:

from ovis_ldms import ldms 
x = ldms.Xprt(name = "sock", auth = "munge")
x.connect("somehost", 411)
x.stream_publish("mystreamname", "some_stream_data") 

@tom95858
Copy link
Collaborator

@narategithub I think @morrone is talking about configuration documentation. Do we have that?

@narategithub
Copy link
Collaborator

We have prdcr_subscribe config command. It has its help string, e.g.

(not connected)> ?prdcr_subscribe

        Subscribe for stream data from all matching producers
        Parameters:
        regex=     A regular expression matching PRODUCER names
        stream=    The stream name or regular expression
        [rx_rate=] The recv rate (bytes/sec) limit for the matching streams. The
                   default is -1 (unlimited).

and another description in ldmsd_controller.man:

   Subscribe to a stream on matching producers
       prdcr_subscribe attr=<value>

              regex PRDCR_REGEX
                     A regular expression matching PRODUCER names

              stream STREAM_NAME_OR_REGEX
                     The stream name or regular expression

              [rx_rate BYTES_PER_SECOND]
                     The recv rate (bytes/sec) limit for the matching streams. The default is -1 (unlimited).

@morrone
Copy link
Collaborator Author

morrone commented Nov 23, 2024

@narategithub I think @morrone is talking about configuration documentation. Do we have that?

Its really everything. A basic explanation of what streams is and how it functions, an explanation security model,

ldmsd_strea.[ch] are there so that top-of-tree ldmsd can still process old ldmsd_stream messages from old ldmsd. Going forward, all application should use ldms_stream.

If I remember correctly, on main the ldmsd_stream.c code was calling into the ldms_stream.c code, and the ldms_stream didn't function on its own. But maybe I'm missing something.

But this gets back to documentation. What the heck are "old streams" and "new streams"? How would a user know which they are using? Is there versioning somewhere? What changes in the model when we go from one to the other? What are the implications from mixing both in a running system?

If you are an application, just include ldms.h will provide all functions you need to interact with ldms_stream.

I have a lot of questions about the API that I'm not sure where to start. I'm not looking for you to answer them here, I am just giving examples of what I expect to be able to understand when I'm reading documentation:

* If \c x is \c NULL, publish to all subscribers. Otherwise, publish directly
 * to the peer of \c x.

I have no idea what "publish directly to the peer of x" means.

* \param x NULL for loopback, or a valid rail handle.

I have no idea what it would mean to publish to "loopback".

* \param stream_name The name of the stream.

What exactly is a stream? We seem to be able to publish messages without creating the stream first, correct? So Is "stream_name" really just a label applied to individual messages?

 * \param cred         The credential of the publisher. This can be \c NULL, and
 *                       the \c euid and \c egid are used.

Why are we able to set the credential of the publisher? Are we trusting the client to tell us who they are?
What is the "cred" used for? What effect does it have?

* \param perm The permission on the stream data.

So I assume this means that individual messages are permissioned rather than streams?

@morrone
Copy link
Collaborator Author

morrone commented Nov 23, 2024

Also, the streams stuff in ldms.h on main seems to maybe (if I'm understanding) be a mix of functions some of which are about running a streams service, and some that are about using a streams service, but there doesn't seem to be clear differentiation between the two in the form of the function names or the documentation. But I may just not understand how things work.

@tom95858
Copy link
Collaborator

tom95858 commented Nov 23, 2024 via email

@narategithub
Copy link
Collaborator

Summary of the current stream

I'll use bob and alice as users. I'm assuming ldmsd is running as root:root.

on b4.4:

  • bob can connect to ldmsd and publish anything on any stream. There is no UID/GID associated to the stream delivery event.
  • alice can connect to ldmsd but cannot subscribe to any stream. Only root can subscribe.

on current main

  • bob can connect to ldmsd and publish anything on any stream. Be nice, bob!
    • on main, bob can set permission bits (e.g. 0400 for self), if there are subscriber on ldmsd, only the subscriber clients that are allowed will get the data.
  • alice can connect to ldmsd and can subscribe to any stream, but alice will only get the allowed data (controlled by the publisher). For example, even though alice subscribed bob_stream, if bob published bob_stream with 0400, alice won't get the data.
  • root gets all the data, even though bob specified 0400.

@tom95858
Copy link
Collaborator

tom95858 commented Dec 5, 2024

On b4.4:

  • If we change the default to disabled
    • Streams data will stop flowing for all existing streams users including at least the following:
      • Sandia, Argonne, LBNL
    • The configuration command stream_enable would need to be added to their existing configuration scripts in order to allow streams data to be injected into the system
  • If we leave the default at enabled
    • Streams users apps will continue to work as expected without change to their configuration file
    • LLNL will have to add the command stream_disabled to all of their existing configuration files
    • This may be considered inadequate if these configuration files can somehow be changed by a user, but I believe that only @morrone modifies these files
    • There is concern that a non-root user can still inject data into the monitoring infrastructure even though a non-root user cannot subscribe to it
      • Note that a non-root user can write a script that uses ldmsd_controller to submit an unlimited number of configuration requests whether or not those requests are honored by the system. If this is a real issue, then we may want to consider rate-control guard rails for non-root users.
      • Note also that configuration requests are not forwarded up the monitoring chain, while streams data is.

@tom95858 proposed adding a configuration.ac option to set the default one way or the other. After some thought I believe this to be a bad idea, because then we would have versions named 4.4.5 that have fundamentally different streams behavior.

@tom95858 would like the group to consider removing the ldmsd_stream_xxx versions of the API altoghter from main. This would require existing streams users (that have not already done so) to modify their plugins to use the new API.

@morrone
Copy link
Collaborator Author

morrone commented Dec 6, 2024

Summary of the current stream

Thanks, this is a good start. I would like to see this expanded though, because it leaves me with lots of questions.

on current main

  • bob can connect to ldmsd and publish anything on any stream. Be nice, bob!

    • on main, bob can set permission bits (e.g. 0400 for self), if there are subscriber on ldmsd, only the subscriber clients that are allowed will get the data.
  • alice can connect to ldmsd and can subscribe to any stream, but alice will only get the allowed data (controlled by the publisher). For example, even though alice subscribed bob_stream, if bob published bob_stream with 0400, alice won't get the data.

As far as I can tell, each individual message has an attached "uid" "gid" and "perms". So it is not quite correct to say that a stream is published with a particular permission, right? Every individual message has its own authorization and permissions. Unless I am misunderstanding?

Which raises a bigger question in my mind: What exactly is a "stream". It sort of seems like we could have 20 different "streams" all with the same stream name, but differentiated by the permissions of individual messages.

And along the lines of "what is a stream?" How do streams even work? Is there any sort of resliliency to streams? When I subscribe to a stream, can I see the messages that I missed while I was off line? Or are messages in streams completely ephemeral, only being to delivered to the current set of subscribers and being discarded?

  • root gets all the data, even though bob specified 0400.

This line doesn't make much sense unless you also tell us the uid and gid fields associated with those perms.

@morrone
Copy link
Collaborator Author

morrone commented Dec 7, 2024

@morrone, all of these issues are resolved in 4.5.1 (top of tree).

As far as I can tell, these issues are not resolved on main (top of tree). The most basic thing I want as part of authorization in ldmsd streams is to control who can publish to which stream. A system administrator for ldmsd should be able to configure that in ldmsd. But that still appears not to be a thing that can be done in the code on main.

@tom95858
Copy link
Collaborator

tom95858 commented Dec 7, 2024

@morrone, please check __publish_cred_check on line ldms_stream.c:737. But I really don't think this is the most efficient way to go about this. We should schedule a meeting and go through your concerns, not listen to each one of them voiced here and then pointing you at a line of code in the source.

@tom95858
Copy link
Collaborator

tom95858 commented Dec 7, 2024

@morrone, as I have posted earlier, there should be no such thing as ldmsd streams in 4.5 (main), there are only ldms streams that perform many more additional checks.

@morrone
Copy link
Collaborator Author

morrone commented Dec 7, 2024

@morrone, as I have posted earlier, there should be no such thing as ldmsd streams in 4.5 (main), there are only ldms streams that perform many more additional checks.

I'm talking about whatever streams happen to be in main, whether they are called "ldmsd streams" or "ldms streams". I don't know why the "d" is something that you're insistent upon dropping from discussion. But I'll try to say "ldms streams in ldmsd" from now on when talking about main (even though, in point of fact, both still exist on main at my last check).

@tom95858
Copy link
Collaborator

tom95858 commented Dec 9, 2024

@morrone, Just to be clear, there are two different streams implementations. The original one is built on top of LDMSD configuration commands. LDMSD streams are just configuration requests that are forwarded through the infrastructure. It was built for expediency for solving a research problem. Its weaknesses were always known and therefore the LDMS Streams protocol and API were created.

LDMS streams, by contrast, are not built on configuration requests, they are built on new Lightweight Distributed Metric Service (LDMS) protocol messages. So in main, right now, we have two implementations, the one built on configuration commands, and the one in the low-level protocol. LDMSD is an application, LDMS is an interface and a protocol.

When I said "this is resolved on main", what I really meant is that these issues are resolved by LDMS Streams (which is only in main) and why I am proposing getting rid of the LDMSD Streams implementation and API.

All that said, this discussion has resulted in OGC finding some issues with LDMS streams if a Byzantine program were to build it's own messages and submit them without using the LDMS API.

My only point in "obsessing" over the "D" is to ensure that we are talking about the same API and protocol.

@morrone
Copy link
Collaborator Author

morrone commented Dec 9, 2024

these issues are resolved by LDMS Streams (which is only in main)

I kind of feel like we are going in circles here. If these issues are resolved by LDMS Streams on main, then it should be a fairly easy task to answer the two questions "a system administrator using ldmsd would likely have" in discussion #1548. If they are resolved, please share the way that they are resolved.

@tom95858
Copy link
Collaborator

tom95858 commented Dec 9, 2024

@morrone In the current main, in the LDMS API, there are checks on the publish side to ensure that the client is authorized to publish, and on the receive side, the credentials are checked. The receive side checks were added based on your concerns. We did a review of the code and realized that a Byzantine app that didn't use the LDMS API to connect and authenticate could bypass the publish check. @narategithub has added the receive side checks. At this point I would encourage us to review the code and test that we have in fact resolved this issue. I strongly urge that we remove the LDMSD version of the API from main. This will require most existing APPs to have to refactor their code, but I think it is necessary that they do so.

So unfortunately, I think we need to do one more loop :-)

@morrone
Copy link
Collaborator Author

morrone commented Dec 10, 2024

@morrone In the current main, in the LDMS API, there are checks on the publish side to ensure that the client is authorized to publish, and on the receive side, the credentials are checked. The receive side checks were added based on your concerns. We did a review of the code and realized that a Byzantine app that didn't use the LDMS API to connect and authenticate could bypass the publish check. @narategithub has added the receive side checks.

You say that it is checking to "ensure that the client is authorized to publish", but in point of fact it does not do that. The only thing it does is verify that the uid/gid in the message matches the authorization. That is just making sure that the authentication is not spoofed.

It is great that you're adding server side checks for authentication spoofing! That was definitely another concern that I had when skimming the code.

But that is really just part of the authentication; it does not provide publish-time authorization. We are still in the same boat as we were before: any authenticated user can publish to any stream.

Authorization means giving the ldmsd sysadmin a way to control who is allowed to publish to what. (And at subscribe time, control over who is allowed to subscribe to what).

So unfortunately, I think we need to do one more loop :-)

Maybe more than one. :) I don't think we're really on the same page yet about what authorization is.

@tom95858
Copy link
Collaborator

@narategithub could you please explain the publish side checks. I think there is still some confusion? No, anyone cannot publish to any stream using the LDMS API, but they can if they bypass the API and write their own messages.

@tom95858
Copy link
Collaborator

tom95858 commented Dec 10, 2024 via email

@narategithub
Copy link
Collaborator

@tom95858, I think what @morrone meant by "any" was that bob can publish any data to any stream as bob, and does not mean that bob can forge anything.

@morrone, correct me if I'm wrong, I think you want to restrict bob from publishing to some streams, or allow bob to publish to only certain streams. Our use cases so far do not need this capability. The receiver side decides what to do with the data from uid, gid, addr. Maybe ldms_stream.rst (a part of PR #1550) may help?

@morrone
Copy link
Collaborator Author

morrone commented Dec 10, 2024

@tom95858, I think what @morrone meant by "any" was that bob can publish any data to any stream as bob, and does not mean that bob can forge anything.

Right.

@morrone, correct me if I'm wrong, I think you want to restrict bob from publishing to some streams, or allow bob to publish to only certain streams.

Exactly.

The receiver side decides what to do with the data from uid, gid, addr.

Leaving it to subscribers to all individually implement authorization is just kicking the can down the road. It is going to require authorization to be reimplemented multiple times, and configured in multiple ways. It means that streams is not itself implementing authorization. It is just passing along authentication information so the subscribers can all, maybe, implement authorization themselves. And if every subscriber needs to reinvent the wheel, the odds that they'll all do it in different and inconsistent ways greatly increases.

Our use cases so far do not need this capability.

I think our use cases do already need this capability. Lets just consider job scheduling systems (slurm, flux, etc) reporting the current job_ids on a node. Lets say the scheduler is "flux" and the username of the flux process is "fluxuser". I would like a standard way in streams to say that only "fluxuser" can publish to stream "flux_jobid".

If streams doesn't implement authorization, then every consumer of the flux_jobid stream needs to individually implement those checks, and needs its own configuration. That seems like an unreasonable burden on bother software developers and system administrators.

And, just for completeness: if neither streams nor the streams subscriber implement configurable authorization on stream "flux_jobid", then any authenticated user could spoof flux jobid information in ldmsd. Which is the situation that we currently find ourselves in.

Maybe ldms_stream.rst (a part of PR #1550) may help?

The documentation is great, thank you! And I read it before posting my message. I believe that I understand the design for the most part. I'm just pretty well convinced that the design needs improvement. We need publish-time authorization.

@tom95858
Copy link
Collaborator

@morrone, I agree that finer grained authorization would improve the security.

@morrone morrone removed this from the v4.4.5 milestone Dec 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug V4 Version 4 v5-topic Topic for design of version 5
Projects
None yet
Development

No branches or pull requests

3 participants