config-linux: add schemata field to IntelRdt #1230

marquiz · 2023-09-28T12:04:52Z

Add a new "schemata" field to the Linux IntelRdt configuration. This addresses the complexity of separate schema fields and resolves the issue of supporting currently uncovered RDT features like L2 cache allocation and CDP (Code and Data Prioritization).

The new field is for specifying the complete schemata (all schemas) to be written to the schemata file in Linux resctrl fs. The aim is for simple usage and runtime implementation (by not requiring any parsing/filtering of data or otherwise re-implement parsing or validation of the Linux resctrl interface) and also to support all RDT features now and in the future (i.e. schemas like L2, L2CODE, L2DATA, L3CODE and L3DATA and who knows L4 or something else in the future).

Behavior of existing fields is not changed but it is required that the new schemata field is applied last.

marquiz · 2023-09-28T12:26:45Z

cc @kolyshkin @ipuustin @xiaochenshen @giuseppe

thaJeztah · 2023-09-28T12:49:17Z

(disclaimer: I only very briefly glanced over, so apologies for any mistake in interpreting); IIUC, the new field effectively is to replace the other field(s), but those already shipped in a release.

Should we consider deprecating the other fields, or add wording to discourage use of those going forward (only for backward compatibility)?

marquiz · 2023-09-28T12:58:02Z

Thanks @thaJeztah for the quick feedback

(disclaimer: I only very briefly glanced over, so apologies for any mistake in interpreting); IIUC, the new field effectively is to replace the other field(s), but those already shipped in a release.

Yes, I would say so. This is effectively to make those useless/unneeded, without breaking backwards compatibility. The current model of one field per "schema" is infeasible as we can already see that there are many features (L2 and CDP) not covered by the OSI spec.

Should we consider deprecating the other fields, or add wording to discourage use of those going forward (only for backward compatibility)?

I think we could consider that, at least suggest people use the new field. I dunno how deprecation is reacted to in OCI spec.

marquiz · 2023-09-29T04:36:36Z

@thaJeztah et al, one more consideration is that I don't believe any runtime does any parsing/filtering of the existing l3Schema or mbSchema fields. So in practice the existing fields work just like the new schema field I suggested. An alternative approach would be to make a backwards-incompatible change in the spec and just state the "you can put anything you like in the {l3,mb}Schema fields and the runtime will write those (sequentially, two separate writes) into resctrl schemata file". However, I didn't do that because the backwards-incompatibility and the somewhat misleading naming.

thaJeztah · 2023-09-29T10:24:07Z

config-linux.md

@@ -654,15 +655,17 @@ The following rules on parameters MUST be applied:

 * If either `l3CacheSchema` or `memBwSchema` is set, runtimes MUST write the value to the `schemata` file in the that sub-directory discussed in `closID`.

-* If neither `l3CacheSchema` nor `memBwSchema` is set, runtimes MUST NOT write to `schemata` files in any `resctrl` pseudo-filesystems.
+* If `schemata` field is set, runtimes MUST write the value to the `schemata` file in the that sub-directory discussed in `closID`. If also l3CacheSchema` or `memBwSchema` is set the value of `schemata` field must be written last, after the values from `l3CacheSchema` and `memBwSchema` has been written.


I'll write a longer comment, but I'm considering that we should document that if schemata is set, runtimes MUST ignore both the l3CacheSchema and memBwSchema fields, and instead to use the schemata field instead (and write its content "as-is" (that should probably also be documented on the l3CacheSchema and memBwSchema fields)

This field also needs to be added to the JSON-schema;

runtime-spec/schema/config-linux.json

Lines 259 to 278 in c0e9043

"intelRdt": {

"type": "object",

"properties": {

"closID": {

"type": "string"

},

"l3CacheSchema": {

"type": "string"

},

"memBwSchema": {

"type": "string",

"pattern": "^MB:[^\\n]*$"

},

"enableCMT": {

"type": "boolean"

},

"enableMBM": {

"type": "boolean"

}

}

As well as the go implementation;

runtime-spec/specs-go/config.go

Line 795 in c0e9043

type LinuxIntelRdt struct {

Woops 🙄 Added

thaJeztah · 2023-09-29T10:26:50Z

config-linux.md

        "closID": "guaranteed_group",
-        "l3CacheSchema": "L3:0=7f0;1=1f",
-        "memBwSchema": "MB:0=20;1=70"
+        "schemata": "L3:0=7f0;1=1f\nL2:0=f;1=f;2=f;3=f\nMB:0=20;1=70"


Just for my understanding, none of the other fields (ClosID, enableCMT, enableMBM) are to be included in the schemata file?

runtime-spec/specs-go/config.go

Lines 796 to 814 in c0e9043

// The identity for RDT Class of Service

ClosID string `json:"closID,omitempty"`

// The schema for L3 cache id and capacity bitmask (CBM)

// Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..."

L3CacheSchema string `json:"l3CacheSchema,omitempty"`

// The schema of memory bandwidth per L3 cache id

// Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."

// The unit of memory bandwidth is specified in "percentages" by

// default, and in "MBps" if MBA Software Controller is enabled.

MemBwSchema string `json:"memBwSchema,omitempty"`

// EnableCMT is the flag to indicate if the Intel RDT CMT is enabled. CMT (Cache Monitoring Technology) supports monitoring of

// the last-level cache (LLC) occupancy for the container.

EnableCMT bool `json:"enableCMT,omitempty"`

// EnableMBM is the flag to indicate if the Intel RDT MBM is enabled. MBM (Memory Bandwidth Monitoring) supports monitoring of

// total and local memory bandwidth for the container.

EnableMBM bool `json:"enableMBM,omitempty"`

Correct 👍

thaJeztah · 2023-09-29T10:29:30Z

An alternative approach would be to make a backwards-incompatible change in the spec and just state the "you can put anything you like in the {l3,mb}Schema fields and the runtime will write those

Yeah, not sure that's great, as that would effectively make both an "anything goes" field, but they would give the impression they're only to be used for a specific purpose.

I think the existing fields are a bit of a leaky abstraction already; they give the impression that it's an abstraction, but in the end (but I'm not deeply familiar with this matter) it looks like it's just artificially splitting

So I could see two possible directions;

either go "all the way in", and every possible RDT feature must have a counterpart in the specification
- 👍 runtimes can parse, and validate options
- 👍 runtimes (and the spec) can advertise exactly what RDT options it supports (Add features.md to formalize the runc features JSON #1130)
- ☝️ this can improve portability
- 👎 runtimes (and the runtime spec) must be updated for changes in the RDT feature-set (new features, feature deprecation)
make it clear that there's no abstraction at all;
- the schemata field is a 1:1 mapping to the schemata file
- runtimes SHOULD NOT perform any validation of the content (perhaps some restrictions w.r.t. accepted characters and/or length)
- runtimes MUST write the content of the schemata field to the schemata file as-is

For the schemata field, and deprecation of the existing fields (I'd have to look up the correct wording for "deprecated");

mention that l3Schema and memBwSchema SHOULD NOT be used, for reasons other than backward compatibility with older runtimes.
mention that l3Schema and memBwSchema MUST NOT be added to if schemata if set (i.e., if schemata is used, it replaces any use of l3Schema and memBwSchema)
❓ TBD: what would be the best approach to provide a "hybrid" spec for backward compatibility (i.e., an OCI spec that both older (using the l3Schema and memBwSchema) and newer (through schemata) runtimes would accept?
- older runtimes (taking the existing spec into account) would only be able to use L3 (not L2)
- which means that either the spec will result in a different outcome (older runtimes only L3, newer runtimes (from schemata) potentially using both L3 and L2 features)
- so if we want a "hybrid" spec to produce the SAME result, then runtimes MUST validate that concatenating l3Schema and memBwSchema MUST be identical to the content of the schemata field
- if, on the other hand, we are ok with the result being different from older runtimes, we could either just "document" that they may differ, or document that l3Schema and memBwSchema MUST be empty if schemata is non-empty (i.e. can either use new features, or no RDT at all).

For the schemata field itself

We could refer to Intel's documentation as canonical reference for the accepted format (i.e., what's supported being outside of the OCI runtime specification).
However; if the format itself is well-documented and stable (i.e., new features may be added, but all follow the same format), we COULD document the accepted format (accepted characters, overall format).
- I don't think the runtime-spec's go-implementation generally provides utilities for this ("runtime-tools" might)
- Maybe we could provide / document a (suggested) regular expression that captures the format (which runtimes and runtime-tools can implement).
❓ does RDT provide any form of feature detection? And if so, is that something which could be included in the features.md (or would that be beyond scope?)
- This information could be used by "higher level" runtimes (runtimes that generate the OCI spec for a container)

marquiz · 2023-09-29T12:26:51Z

@thaJeztah appreciate your thoughtful review. My comments in-lined below

Yeah, not sure that's great, as that would effectively make both an "anything goes" field, but they would give the impression they're only to be used for a specific purpose.

I agree

I think the existing fields are a bit of a leaky abstraction already; they give the impression that it's an abstraction, but in the end (but I'm not deeply familiar with this matter) it looks like it's just artificially splitting

Exactly my thinking/feeling. We could possibly use something like

type LinuxIntelRdt struct {
...
	Schemata []LinuxIntelRdtSchema
...
}

type LinuxIntelRdtSchema struct {
	Type   String  // "L2", "L3", "MB" etc
	Schema String  // Depends on the Type, e.g. "0=f;1=f"
}

But I don't think that makes anyone's life any easier.

So I could see two possible directions;

either go "all the way in", and every possible RDT feature must have a counterpart in the specification

👍 runtimes can parse, and validate options

👍 runtimes (and the spec) can advertise exactly what RDT options it supports (Add features.md to formalize the runc features JSON #1130)

☝️ this can improve portability

👎 runtimes (and the runtime spec) must be updated for changes in the RDT feature-set (new features, feature deprecation)

I think this would be a maintenance nightmare. Even with the current very limited validation/filtering specified I think this hasn't been implemented. Plus for a full validation the runtime will need virtually to replicate the validation code in linux kernel. Who knows what the actual format for future "FOOBAR" technology will look like. And e.g. in the case of RDT the min/max width of the bitmask varies between HW etc.

make it clear that there's no abstraction at all;

the schemata field is a 1:1 mapping to the schemata file

runtimes SHOULD NOT perform any validation of the content (perhaps some restrictions w.r.t. accepted characters and/or length)

runtimes MUST write the content of the schemata field to the schemata file as-is

I'm a strong proponent of this approach. Let kernel do the validation because it can do it correctly 😇

For the schemata field, and deprecation of the existing fields (I'd have to look up the correct wording for "deprecated");

mention that l3Schema and memBwSchema SHOULD NOT be used, for reasons other than backward compatibility with older runtimes.

mention that l3Schema and memBwSchema MUST NOT be added to if schemata if set (i.e., if schemata is used, it replaces any use of l3Schema and memBwSchema)

❓ TBD: what would be the best approach to provide a "hybrid" spec for backward compatibility (i.e., an OCI spec that both older (using the l3Schema and memBwSchema) and newer (through schemata) runtimes would accept?

older runtimes (taking the existing spec into account) would only be able to use L3 (not L2)

which means that either the spec will result in a different outcome (older runtimes only L3, newer runtimes (from schemata) potentially using both L3 and L2 features)

so if we want a "hybrid" spec to produce the SAME result, then runtimes MUST validate that concatenating l3Schema and memBwSchema MUST be identical to the content of the schemata field

if, on the other hand, we are ok with the result being different from older runtimes, we could either just "document" that they may differ, or document that l3Schema and memBwSchema MUST be empty if schemata is non-empty (i.e. can either use new features, or no RDT at all).

I'd strongly suggest to keep it simple. Any schema validation will be a tricky business to do right with all the corner cases like L3:0=f;1=f0 and L3:1=f0;0=f will have the same end result. So either
-l3Schema and memBwSchema MUST NOT be set to if schemata if set

or then just accept that the end result may be different

In any case the "portability" of the schemata is very poor even between reboots/remount of the resctrlfs as the format and accepted fields in the schemata file depends e.g. on the mount options.

For the schemata field itself

We could refer to Intel's documentation as canonical reference for the accepted format (i.e., what's supported being outside of the OCI runtime specification).

However; if the format itself is well-documented and stable (i.e., new features may be added, but all follow the same format), we COULD document the accepted format (accepted characters, overall format).

I don't think the runtime-spec's go-implementation generally provides utilities for this ("runtime-tools" might)

Maybe we could provide / document a (suggested) regular expression that captures the format (which runtimes and runtime-tools can implement).

❓ does RDT provide any form of feature detection? And if so, is that something which could be included in the features.md (or would that be beyond scope?)

This information could be used by "higher level" runtimes (runtimes that generate the OCI spec for a container)

The documentation is not Intel's but in Linux kernel (https://docs.kernel.org/arch/x86/resctrl.html). But 👍 for adding a reference. I would refer to that directly instead of describing it here. With new kernel versions there may be (backwards-compatible) extensions and new technologies enabled that the OCI spec would then be missing. I think that also applies to any regular expression we tried to add here.

RDT does provide information about the currently active setup (what features are enabled and in which configuration) under /sys/fs/resctrl/info/. That might be something useful to add into features.md but I think that would be a subject of a separate PR – to think about thoroughly.

thaJeztah · 2023-09-29T13:00:14Z

I think this would be a maintenance nightmare. Even with the current very limited validation/filtering specified I think this hasn't been implemented

💯 I fully agree. My main intent here was to mention options we considered (for future visitors, including "future self"), and rejected (for reasons mentioned). It's good to try to provide an abstraction, but in some case, such as this one, I don't think that's possible, and we just need to acknowledge that some parts of the OCI just are very platform-specific and tightly coupled to kernel/platform features.

I'd strongly suggest to keep it simple. Any schema validation will be a tricky business

Yes, that's what I somewhat expected the case to be; I guess the recommendation would be "use the new option", and only in exceptional cases consider using "both" (if you like schrödingers cat).

The documentation is not Intel's but in Linux kernel (https://docs.kernel.org/arch/x86/resctrl.html). But 👍 for adding a reference.

Heh, yes, let myself go on the "canonical reference" side; definitely "+1" on keeping docs in this spec as minimal as possible, and refer to the kernel reference as source of truth.

I think that also applies to any regular expression we tried to add here.

If it's well-defined and a regex is possible (either in docs and/or the JSON-schema), I guess that would be good to have.

RDT does provide information about the currently active setup (what features are enabled and in which configuration) under /sys/fs/resctrl/info/. That might be something useful to add into features.md but I think that would be a subject of a separate PR – to think about thoroughly.

Yes, that's for sure not a blocker (from my point of view) for this PR.

My main motivation for some of these comments is to explore what amount of validation can reasonably be performed by (higher-level) runtimes before "the spec hits the kernel".

My general rule of thumb is to keep that as minimal as possible for most cases (unless we can absolutely be sure), but a "well-formedness" check (where possible) can help. Having some amount of early error-handling can reduce burden on maintainers (as all OCI / kernel / runtime errors result in a "OCI runtime" error, and those tend to be piled on by users "I have this same issue!" (no you don't 😂))

marquiz · 2023-09-29T13:23:29Z

If it's well-defined and a regex is possible (either in docs and/or the JSON-schema), I guess that would be good to have.

My feeling is that it's not well-defined enough. The currently supported format for currently supported technologies is described (https://docs.kernel.org/arch/x86/resctrl.html#l3-schemata-file-details-code-and-data-prioritization-disabled) but I wouldn't count on that. Say next kernel version will start supporting id ranges like L2:0-4=ff;5-8=ff00 or the new FOOBAR technology supporting some new format FOOBAR:xyzy;1234 we're screwed

marquiz · 2023-09-29T13:30:08Z

@thaJeztah I think I'll sleep over this and push an update after the weekend, trying to capture all the review comments

thaJeztah · 2023-09-29T15:09:58Z

For sure! Mostly exploring here if it's possible (within reason). Having a quick look at examples on that link;

# echo "L3:0=3;1=3\nMB:0=1024;1=500" > /sys/fs/resctrl/p1/schemata
# echo -e "L3:0=f8000;1=fffff\nMB:0=20;1=100" > p0/schemata

Which would translate to;

L3:0=3;1=3
MB:0=1024;1=500

and;

L3:0=f8000;1=fffff
MB:0=20;1=100

Based on that the format "roughly" is a newline-delimited set of features;

<feature>:<[key=value;]...>
<feature>:<[key=value;]...>

first column is the name of the feature, followed by a colon (:)
followed by (one or more / zero or more?) semi-colon (;) separated key=value pairs.

So I guess the blanks to fill in is what are accepted characters for feature, key, and value

Whether <key>=<value> pairs are compulsory (i.e. is <feature>: without any options valid), might be useful and to some extent if =<value> is compulsory (or "key-only" are allowed), but I guess any of that may be going too much into details for the spec itself (we can treat [key=value;] mostly as a opaque element, other than "valid characters")

marquiz · 2023-10-02T12:22:41Z

Based on that the format "roughly" is a newline-delimited set of features;
<feature>:<[key=value;]...>

Yeah, the currently availabe technologies/features follow this format. My fear is that we could get bitten by future extensions (like L3:1,3,5-10=f) or new stuff, like you speculated <feature>:<key1>[;<key2>...].

One option would be to validate the currently known features and just not validate unknown stuff. Another is to follow the currently ubiquitous format <feature>:<[key=value;]...>. A third variant as you suggested would be to validate <feature>:<blob>[;<blob>] (where blob could be <key>=<value> or smth else)

In any case, we could make the schemata field an array of schemas

Schemata []string

giuseppe · 2023-10-10T19:10:34Z

config-linux.md

  * if `closID` directory in a mounted `resctrl` pseudo-filesystem doesn't exist, the runtimes MUST create it.
-  * if `closID` directory in a mounted `resctrl` pseudo-filesystem exists, runtimes MUST compare `l3CacheSchema` and/or `memBwSchema` value with `schemata` file, and [generate an error](runtime.md#errors) if doesn't match.
+  * if `closID` directory in a mounted `resctrl` pseudo-filesystem exists, runtimes MUST compare `l3CacheSchema` and/or `memBwSchema` and/or `schemata` value with `schemata` file, and [generate an error](runtime.md#errors) if doesn't match.


too late too relax it to "SHOULD"? It is not so obvious how the comparison should be done from a runtime

@giuseppe good point. For the old fields (l3CacheSchema and memBwSchema) let's just keep it as is. But for the new schemata field we could just drop this requirement. As you said it's not at all clear how this comparison should be implemented (I'd consider that as internal implementation details of the Linux kernel).

This might actually even be desirable. If you think that the higher-level runtime would be managing the CLOS configuration. With the old behavior you cannot actually change the configuration of existing groups, you'd need to delete it and re-create it (and if there were some running containers there you'd need to freeze them, list the pids, delete CLOS, re-create CLOS with new config, assign the pids to the new group, unfreeze...).

Add a new "schemata" field to the Linux IntelRdt configuration. This addresses the complexity of separate schema fields and resolves the issue of supporting currently uncovered RDT features like L2 cache allocation and CDP (Code and Data Prioritization). The new field is for specifying the complete schemata (all schemas) to be written to the schemata file in Linux resctrl fs. The aim is for simple usage and runtime implementation (by not requiring any parsing/filtering of data or otherwise re-implement parsing or validation of the Linux resctrl interface) and also to support all RDT features now and in the future (i.e. schemas like L2, L2CODE, L2DATA, L3CODE and L3DATA and who knows L4 or something else in the future). Behavior of existing fields is not changed but it is required that the new schemata field is applied last. Signed-off-by: Markus Lehtonen <[email protected]>

marquiz · 2023-10-11T07:21:40Z

Small update: based on @giuseppe's comment I removed the requirement for comparing the new schemata field to the content of the schemata file. The behavior of old l3CacheSchema and memBwSchema fields is unchanged.

marquiz requested review from AkihiroSuda, crosbymichael, cyphar, dqminh, giuseppe, hqhq, kolyshkin, mrunalp, thaJeztah, tianon, vbatts and utam0k as code owners September 28, 2023 12:04

marquiz force-pushed the devel/rdt branch from 704daee to ba46f4e Compare September 28, 2023 12:05

marquiz mentioned this pull request Sep 28, 2023

Clarify Intel RDT configuration #1196

Open

thaJeztah reviewed Sep 29, 2023

View reviewed changes

marquiz force-pushed the devel/rdt branch from ba46f4e to 8062b07 Compare September 29, 2023 11:38

marquiz mentioned this pull request Oct 10, 2023

config-linux: clarify the handling of ClosID RDT parameter #1104

Merged

giuseppe reviewed Oct 10, 2023

View reviewed changes

marquiz force-pushed the devel/rdt branch from 8062b07 to 063c8d3 Compare October 11, 2023 07:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

config-linux: add schemata field to IntelRdt #1230

config-linux: add schemata field to IntelRdt #1230

marquiz commented Sep 28, 2023

marquiz commented Sep 28, 2023

thaJeztah commented Sep 28, 2023

marquiz commented Sep 28, 2023

marquiz commented Sep 29, 2023

thaJeztah Sep 29, 2023

thaJeztah Sep 29, 2023

marquiz Sep 29, 2023

thaJeztah Sep 29, 2023

marquiz Sep 29, 2023

thaJeztah commented Sep 29, 2023

marquiz commented Sep 29, 2023

thaJeztah commented Sep 29, 2023

marquiz commented Sep 29, 2023

marquiz commented Sep 29, 2023

thaJeztah commented Sep 29, 2023

marquiz commented Oct 2, 2023

giuseppe Oct 10, 2023

marquiz Oct 11, 2023

marquiz commented Oct 11, 2023

	"intelRdt": {
	"type": "object",
	"properties": {
	"closID": {
	"type": "string"
	},
	"l3CacheSchema": {
	"type": "string"
	},
	"memBwSchema": {
	"type": "string",
	"pattern": "^MB:[^\\n]*$"
	},
	"enableCMT": {
	"type": "boolean"
	},
	"enableMBM": {
	"type": "boolean"
	}
	}

	// The identity for RDT Class of Service
	ClosID string `json:"closID,omitempty"`
	// The schema for L3 cache id and capacity bitmask (CBM)
	// Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..."
	L3CacheSchema string `json:"l3CacheSchema,omitempty"`

	// The schema of memory bandwidth per L3 cache id
	// Format: "MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;..."
	// The unit of memory bandwidth is specified in "percentages" by
	// default, and in "MBps" if MBA Software Controller is enabled.
	MemBwSchema string `json:"memBwSchema,omitempty"`

	// EnableCMT is the flag to indicate if the Intel RDT CMT is enabled. CMT (Cache Monitoring Technology) supports monitoring of
	// the last-level cache (LLC) occupancy for the container.
	EnableCMT bool `json:"enableCMT,omitempty"`

	// EnableMBM is the flag to indicate if the Intel RDT MBM is enabled. MBM (Memory Bandwidth Monitoring) supports monitoring of
	// total and local memory bandwidth for the container.
	EnableMBM bool `json:"enableMBM,omitempty"`

config-linux: add schemata field to IntelRdt #1230

Are you sure you want to change the base?

config-linux: add schemata field to IntelRdt #1230

Conversation

marquiz commented Sep 28, 2023

marquiz commented Sep 28, 2023

thaJeztah commented Sep 28, 2023

marquiz commented Sep 28, 2023

marquiz commented Sep 29, 2023

thaJeztah Sep 29, 2023

Choose a reason for hiding this comment

thaJeztah Sep 29, 2023

Choose a reason for hiding this comment

marquiz Sep 29, 2023

Choose a reason for hiding this comment

thaJeztah Sep 29, 2023

Choose a reason for hiding this comment

marquiz Sep 29, 2023

Choose a reason for hiding this comment

thaJeztah commented Sep 29, 2023

marquiz commented Sep 29, 2023

thaJeztah commented Sep 29, 2023

marquiz commented Sep 29, 2023

marquiz commented Sep 29, 2023

thaJeztah commented Sep 29, 2023

marquiz commented Oct 2, 2023

giuseppe Oct 10, 2023

Choose a reason for hiding this comment

marquiz Oct 11, 2023

Choose a reason for hiding this comment

marquiz commented Oct 11, 2023