Claiming i915 resources in a single container has 2 cards accessible for Intel GPU Flex 140 #1377
Comments
Thanks for reaching out! In short, the scenario is by design. The best option for you is to modify the deployment to use balanced mode. As for Flex 140 vs. 170, the GPU plugin treats them the same. For now we are not planning to change the default allocation policy. Longer story: our GPU Aware Scheduler provides better granularity for reserving GPUs, but it's a bigger effort to set up.
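As a rough sketch of what switching to balanced mode can look like, the relevant part is the GPU plugin's command-line options; the DaemonSet container name and image below are placeholders, only the -allocation-policy and -shared-dev-num arguments are the point:

```yaml
# Hypothetical excerpt of the GPU plugin DaemonSet container spec;
# everything except the args is illustrative.
containers:
  - name: intel-gpu-plugin            # placeholder name
    image: intel/intel-gpu-plugin     # placeholder image reference
    args:
      - "-shared-dev-num=10"          # upper limit on containers sharing one GPU
      - "-allocation-policy=balanced" # none (default) | balanced | packed
```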
This tells more about the features supported by the GPU Aware Scheduler (GAS): https://github.com/intel/platform-aware-scheduling/blob/master/gpu-aware-scheduling/docs/usage.md Note that there are a few gotchas in installing GAS, see: intel/platform-aware-scheduling#126
Thanks for your quick response, we will have a look at how GAS handles it. However, the current design might cause an inconsistency issue. For example: the first container scheduled to access the i915 resource may only get access to a single card (card0). So with the same configuration, different containers might end up with different access to the two cards, which makes system results and behavior inconsistent. I think a possible solution is to limit the i915 resource to 1 for each container; that would resolve the potential issues. Thanks!
Thank you! So the none policy allocates a number of cards (in this case 1 or 2) to each pod arbitrarily, irrespective of how many resources are requested, and it can differ each time; is my understanding right? In this example with balanced mode, containers requesting <= 2 resources see one card, and containers requesting > 2 resources see 2 cards. Is this expected as well?
Yes, the "i915" resource is the number of GPU devices that should be visible [1] within the given container. The GPU plugin share count (CLI option) is just an upper limit on how many containers a given GPU device may be assigned to at the same time.

When using just the GPU plugin, you can limit each container to its own GPU device only by using a share count of 1. Whereas with GAS, you specify the sharing using GPU memory and millicore resource requests (and by having a large enough share count for the GPU plugin to support that sharing), like you would do with k8s CPU resources. If e.g. at most 2 containers should be given the same GPU, containers should request half a core, i.e. 500 millicores (see the linked GAS docs for pod spec examples).

Note that these GPU resource requests are used only for pod scheduling and device assignment decisions; there is no (cgroup based) enforcement for them, like there is for CPU resources [2]. To avoid the GPU being underutilized (when it does not have enough work to do), or container performance suffering (when the GPU has too much work), those requests should roughly correspond to how much GPU your workload actually uses. You can check that e.g. with the GPU metrics exporters [3].

[1] Whether those devices can actually be used when the container uses some other user ID than root depends on the user ID, device file access rights on the node hosts, and container runtime settings. For more info, see: https://kubernetes.io/blog/2021/11/09/non-root-containers-and-devices/

[2] Upstream kernel DRM infrastructure does not support cgroups for GPUs yet. Currently, enforcing things for GPU sharing would require e.g. a GPU that supports SR-IOV (as Flex does) and configuring the SR-IOV partitions beforehand. Each SR-IOV partition will then show up as a separate GPU.

[3] Most GPU metrics are available for the SR-IOV PF device, not for the VFs. The GPU plugin handles that by providing only VFs for "i915" resource requests, and an "i915_monitoring" resource (requested by GPU exporters) which includes the PF device in addition to all VFs. Note: the monitoring resource needs to be explicitly enabled with a GPU plugin option, it's not enabled by default.
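As a sketch of the GAS-style requests described above (the millicores and memory.max resource names follow the ones mentioned later in this thread; the values are examples only), a container that should share a GPU with at most one other container could request half a core:

```yaml
# Illustrative container resources for use with GAS; values are examples only.
resources:
  limits:
    gpu.intel.com/i915: 1          # one GPU device visible in the container
    gpu.intel.com/millicores: 500  # half a GPU "core" => at most 2 such containers per GPU
    gpu.intel.com/memory.max: 4G   # rough upper bound for the workload's GPU memory use
```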
None policy can result in 1 or 2 actual GPUs for the container if i915 == 2. Balanced policy should result in 2 actual GPUs for the container if i915 == 2. If this is not the case, can you supply GPU-plugin logs for me to check? You'll have to add "-v=5" to the gpu-plugin arguments, as the default doesn't print much of anything.
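For reference, the verbosity option goes into the plugin's argument list alongside the other options; the surrounding values here are just placeholders:

```yaml
# Hypothetical GPU plugin args with verbose logging enabled.
args:
  - "-shared-dev-num=10"
  - "-allocation-policy=none"
  - "-v=5"   # verbose logging, needed to see allocation decisions in the logs
```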
Yes, that's correct. Packed mode "packs" each GPU before moving to the next one.
That's a way to work around the limitations, yes.
Do we have a way to limit the i915 resource claim to 1 from the GPU device plugin side? Or is there some other way to achieve it? Currently we cannot use GAS yet.
I get the feeling we might be thinking about this differently. Do you have a workload that would require both GPUs from a single Flex 140 physical card? We don't currently have such an allocation policy, though I believe it could be implemented. The next best thing is to set shared-dev-num to > 1, set the allocation policy to balanced, and use i915 == 2 in containers. That should allocate GPUs so that each pod gets access to one physical GPU (with two logical devices). But in general, one should consider the Flex 140 logical cards as normal GPUs. They share power and bus, but other than that they work as independent entities.
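A minimal sketch of the container resources for that recommendation (only the limits matter; with shared-dev-num > 1 and the balanced policy, the two i915 resources should map to the two logical GPUs of one Flex 140 physical card):

```yaml
# Illustrative limits for a container that should get one whole Flex 140 card.
resources:
  limits:
    gpu.intel.com/i915: 2
```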
I think since we have several different combinations of these options, we also need to let users know the behavior and potential issues of each combination, and which combination we recommend.
IMO shared dev num is not really intended to give more than one GPU to a container. You could request 10 i915 resources on a node with only two physical GPUs and the container would receive those 10 resources, but the container would get only one or two actual GPU devices.
Yes, I agree. I created a ticket for improving the documentation: #1381
In general, sharing is more for things that are batched, rather than things that have specific latency requirements. If you do have latency requirements, then in addition to either using a small enough share count that more workloads cannot be scheduled to the same GPU, or using GAS + suitable Pod GPU resource requests to take care of that, you need to vet the workloads to make sure they fit on the same GPU without causing latency issues for the workloads running in parallel on it. (Reminder: the kernel does not provide enforcement for how much GPU time each client can use, except for the "reset the offending GPU context" hang check, which you may want to disable when running "random" HPC use-cases because their compute kernels may run long enough to trigger it.)
Thanks @vbedida79 for updating the "Possible solutions"
Thanks for your support!
Using shared-dev-num = 1 (=no sharing) is the simplest case. I'd say that's the best basic setup. You can then have as many containers using a GPU as you have GPUs (as long as each container requests one GPU).
Currently with only GPU-plugin, the only way to achieve that is to use shared-dev-num == 1. Though, I guess we could change the default functionality so that it tries to pick different GPUs, if possible. But even that would start returning duplicate GPUs when the available GPUs are scarce.
What would be the use cases for having shared-dev-num > 1?
Sure.
It's required to support GPU sharing with GAS resource management, as it's the upper limit on the sharing. It can also be used when you have only a single GPU per node, know all the workloads, and manually make sure that everything you schedule fits on the GPUs (heavier jobs have affinity rules to avoid them being scheduled to the same node, and if you have multiple heavy deployments, you either use labels to put them on different sets of nodes, or do not schedule them at the same time). It can also be useful for testing / development if you do not have that many GPUs in your cluster.
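One way to express the "heavier jobs avoid each other" part above is a pod anti-affinity rule; the label key and value below are made up for illustration:

```yaml
# Hypothetical anti-affinity keeping pods labeled workload=gpu-heavy
# off nodes that already run another such pod.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            workload: gpu-heavy          # made-up label
        topologyKey: kubernetes.io/hostname
```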
Please allow me to describe the "i915 resource set > 1" case more clearly:

```yaml
resources:
  limits:
    gpu.intel.com/i915: 1
    gpu.intel.com/tiles: 2
```

But we found that users can actually set gpu.intel.com/i915 to any number they want, like:

```yaml
resources:
  limits:
    gpu.intel.com/i915: 3
    gpu.intel.com/tiles: 2
```

That will cause the inconsistency issue we mentioned. We hope that users can only set gpu.intel.com/i915 to 1; all other numbers should not be allowed.
It's by design. While a single workload using multiple physical GPUs is uncommon, it's a valid "scale-up" use-case, for example AI training back propagation with GPU-to-GPU communication (e.g. using PVC XeLink). The PR linked by Tuomas makes a best effort to provide the workload with separate GPU devices when it requests multiple ones, even when the "none" policy is used and GAS is not in use.
Share count = 1 on nodes that have at most a single GPU would result in pods whose container(s) request more than one i915 resource remaining in the Pending state. If you'd like such pods to be kept in the Pending state in other situations too, or to be rejected outright, that's probably better done on the k8s control plane side, as generic resource management functionality, rather than something GPU specific.

There's already LimitRange for normal k8s resources (memory/CPU): https://kubernetes.io/docs/concepts/policy/limit-range/ Maybe that could be extended to support extended resource counts as well. It would be an upstream k8s feature, but you might want to file a separate ticket here as a reminder, if you really need such a thing. Getting it upstream would be a much longer process though.

PS. If you want a one-liner for finding out how many GPUs the containers currently running in your cluster have requested, you can read the requested gpu.intel.com/i915 counts from the pod specs with kubectl; including the pod names in the output shows which pods those resources belong to.
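For reference, this is roughly what the existing LimitRange mechanism looks like for CPU and memory; whether and how it could also cover extended resource counts such as gpu.intel.com/i915 is the open upstream question mentioned above:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: per-container-limits   # example name
spec:
  limits:
    - type: Container
      max:
        cpu: "2"       # no container may request/limit more than 2 CPUs
        memory: 4Gi    # ...or more than 4Gi of memory
```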
@eero-t
From https://github.com/intel/platform-aware-scheduling/tree/master/gpu-aware-scheduling
With shared-dev-num == 1 => no sharing

The i915 resource (gpu.intel.com/i915) represents a GPU device on a node. It is similar to k8s' traditional resources like cpu or memory, with the exception that i915 only supports full values (not 0.5 or 500m). In practice, when a Pod requests one i915 resource in its pod spec, the resulting container will see one GPU device. If the pod spec indicates 2 i915 resources, the resulting container will see two GPU devices. No other containers will have access to the GPUs that were allocated to the Pod.

With shared-dev-num > 1 => sharing

The difference here is that a GPU device on a node is not dedicated to any particular Pod. There can be multiple Pods using a GPU on a node (up to the shared-dev-num count). Let's say shared-dev-num is set to 2. A Pod requesting a single i915 resource can then end up using a GPU that one other Pod is also using.

Shared-dev-num selection

Shared-dev-num should be selected depending on the use case of the cluster. If you are doing heavy computing, shared-dev-num should probably be set to 1: you are using the whole GPU and there's no room for other users. Note: as you are not using GAS, you shouldn't mix millicores or memory.max resources into the equation. They don't apply when you are only using the GPU plugin.
FYI: When Tuomas says "Pod" above, he really means "container in a Pod". Resources are container attributes...
(Tuomas/Ukri, please correct me in case I've misunderstood / remember something wrong.)

Technically, it's a per-node list of resource IDs [1] and matching device file paths. The k8s scheduler knows just the size of that list and how many of the items are already allocated on each node; when scheduling a pod that requests the i915 resource, it simply picks a node with enough unallocated items. When the pod is deployed on that node, kubelet (the k8s node agent) asks the device plugin for the device paths matching the specified resource IDs. At that point the Intel GPU plugin can still affect them, based e.g. on pod annotations added by GAS. (When GAS is in use, it tells the GPU plugin which device should be mapped to which container using pod annotations. Only when GAS is not used does the GPU plugin make (some) independent decisions about that.) Kubelet adds the provided device paths to the pod container's OCI spec before giving it to the container runtime, which uses them to make the devices available inside the container [2].

[1] Different GPU plugins (Intel, AMD, Nvidia) use different resource IDs: BDF (PCI bus address), device UUID, or in the case of the Intel GPU plugin, the GPU device control file name (as that makes some things simpler with GAS).

[2] For non-root containers, see: https://kubernetes.io/blog/2021/11/09/non-root-containers-and-devices/

Btw, there's also a use-case where different containers want the same GPU device. For example, a cloud gaming pod where one of its containers is for the actual game, and another one for the media streaming functionality. GAS has functionality to handle these different kinds of use-cases. (IMHO using share count > 1 without GAS is a bit of a corner-case.)

PS. In the future, GPU workloads could also use DRA (dynamic resource allocation): https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/3063-dynamic-resource-allocation#summary With DRA, GPU device requests would work more like current k8s volume handling: you have separate device allocation objects (similar to PVs, Persistent Volumes) which pods can then claim (similar to PVCs, Persistent Volume Claims).
Discussed with Ukri, and there's a clear use-case for this: a dedicated cluster with a specific GPU workload pod, where it has been validated how many instances of that pod can work in parallel on the same GPU (without interfering with each other too much). That does not need GAS, just the GPU plugin with a suitable share count.
@mythi Could you help clarify the i915 resource definition in the GPU device plugin? We have to clearly understand this basic resource definition. The i915 resource definition @eero-t mentioned makes sense to me, but it is not aligned with what we observed in the current GPU device plugin.
I'm not the right person for this. The GPU maintainers in this thread can follow up on this if necessary.
Your equation is correct.
I've created two issues for the continuation: #1407 for the i915 documentation improvements and #1408 for the i915 resource limitation per Pod. I'd like the discussion to continue in those tickets. I hope our understanding of i915 and shared-dev-num is now aligned.
It is aligned with what I described.
Summary
With the default allocation policy "none", shared-dev-num=10 and Intel® Data Center GPU Flex 140, if a container is allocated 3 or more gpu.intel.com/i915 resources, 2 GPU cards are accessible in the pod.

Detail

On an OpenShift 4.12 cluster, created a GPU device plugin CR with allocation-policy none and shared-dev-num 10 on a node with an Intel GPU Flex 140 card. So the node has 2 cards and 20 gpu.intel.com/i915 resources. For a clinfo pod, if 1 or 2 gpu.intel.com/i915 resources are allocated, one card is accessible from the pod. If 3 or more resources are allocated, both cards are accessible inside the pod.
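A minimal sketch of the container resources that trigger the two-cards-visible behavior described above (pod metadata and the clinfo image are omitted; only the resource limit is taken from the report):

```yaml
# Illustrative limits for the clinfo test pod; with allocation policy "none"
# and shared-dev-num 10, requesting 3 i915 resources made both cards visible.
resources:
  limits:
    gpu.intel.com/i915: 3
```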
Is this expected by design?
Possible solutions
Card sharing: Card sharing from the drivers/hardware perspective is limited/not supported. Though 2 containers can share a card, there is no card isolation between containers, which can raise security concerns. As we cannot use GAS yet, can we assume there is no way to support card sharing among containers? Is it possible to disable the sharing of a card, i.e. change the resource-allocation-policy options, so that the user can only input shared-dev-num and one container uses the card exclusively?

Inconsistency between claimed resources and access to GPU devices: Do containers need multiple i915 resources? In the case of the packed policy with shared-dev-num: 3 and each container requesting 2 resources, one container has access to 1 card while another container has access to 2 cards. A container should claim 1 i915 resource, but containers can claim more than one.
To resolve it, can containers be limited to request only 1 i915 resource each?
Thanks!