hived is not aware of GPU topology #35
Note that Topology-Aware Intra-VC Scheduling is only best-effort, see https://github.com/microsoft/hivedscheduler/blob/master/example/feature/README.md#topology-aware-intra-vc-scheduling. The other GPUs may already be allocated to other pods.
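For example, based on the cell hierarchy in the config later in this thread, one way to make co-location deterministic rather than best-effort should be to request both GPUs from a single pod, since a 2-GPU P4 request maps onto a whole P4-CPU cell. A minimal sketch reusing the annotation format from this thread (vc1 and the priority value are taken from the job spec below):

# Sketch: one pod asking for 2 P4 leaf cells, so the allocation
# should be a full P4-CPU cell (both GPUs under the same CPU socket).
metadata:
  annotations:
    hivedscheduler.microsoft.com/pod-scheduling-spec: |-
      virtualCluster: vc1
      priority: 1
      leafCellType: P4
      leafCellNumber: 2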
We are working on an update to provide an option to enforce that a job honors the GPU topology (if the job chooses to).
When you submitted these current 2 pods, did you have any other previous pods running on other GPUs (they may have completed by now)? BTW, could you kill all the pods on the machine and try again, submitting just the 2 pods?
The env is added by the PAI REST server, not by hived, see https://github.com/microsoft/pai/blob/b8fa58782addfc835ba813ad4dc261fff400ee4a/src/rest-server/src/models/v2/job/k8s.js#L653. Hived only generates the annotations. BTW, NVIDIA_VISIBLE_DEVICES should generally match the GPU indices shown by nvidia-smi.
I don't use PAI, so I need to add the NVIDIA_VISIBLE_DEVICES env in the pod templates myself. Yeah, the value of hivedscheduler.microsoft.com/pod-leaf-cell-isolation in the annotations corresponds to the GPU indices shown by nvidia-smi.
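For non-PAI setups, a minimal sketch of forwarding that annotation into the container with the Kubernetes Downward API (assuming hived has written the annotation by the time the container starts):

# Sketch: expose hived's isolation annotation as NVIDIA_VISIBLE_DEVICES.
# The annotation value is a comma-separated GPU index list, e.g. "0" or "0,1".
env:
- name: NVIDIA_VISIBLE_DEVICES
  valueFrom:
    fieldRef:
      fieldPath: metadata.annotations['hivedscheduler.microsoft.com/pod-leaf-cell-isolation']

This mirrors what the PAI REST server does on the user's behalf.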
@fanyangCS maybe we should also add a hived user doc for users who do not use PAI, e.g. telling them to set the NVIDIA_VISIBLE_DEVICES env from the annotation.
Sure. Can you update the document?
May I know which solution you use?
My mistake.
I run an MPIJob on a P4 node in Kubernetes 1.11, one GPU per pod.
The P4 GPU topology is as follows: (nvidia-smi topology screenshot omitted)
The worker-0 pod sees the GPU as follows: (screenshot omitted)
The worker-1 pod sees the GPU as follows: (screenshot omitted)
The hived config is as follows:
apiVersion: v1
kind: ConfigMap
metadata:
  name: hivedscheduler-config
  namespace: kube-system
data:
  policy.cfg: |
    {
      "kind": "Policy",
      "apiVersion": "v1",
      "extenders": [
        {
          "urlPrefix": "http://10.220.187.143:30096/v1/extender",
          "filterVerb": "filter",
          "preemptVerb": "preempt",
          "bindVerb": "bind",
          "enableHttps": false,
          "httpTimeout": 5000000000,
          "nodeCacheCapable": true,
          "ignorable": false,
          "managedResources": [
            {
              "name": "hivedscheduler.microsoft.com/pod-scheduling-enable",
              "ignoredByScheduler": true
            }
          ]
        }
      ]
    }
  hivedscheduler.yaml: |
    webServerAddress: ":30096"
    waitingPodSchedulingBlockMilliSec: 50
    physicalCluster:
      skuTypes:
        V100:
          gpu: 1
          cpu: 6
          memory: 6Gi
        P4:
          gpu: 1
          cpu: 1
          memory: 2Gi
      cellTypes:
        V100-PCIE:
          childCellType: V100
          childCellNumber: 4
        P4-CPU:
          childCellType: P4
          childCellNumber: 2
        V100-NODE:
          childCellType: V100-PCIE
          childCellNumber: 2
          isNodeLevel: true
        P4-NODE:
          childCellType: P4-CPU
          childCellNumber: 2
          isNodeLevel: true
        V100-NODE-POOL:
          childCellType: V100-NODE
          childCellNumber: 1
        P4-NODE-POOL:
          childCellType: P4-NODE
          childCellNumber: 2
      physicalCells:
      - cellType: V100-NODE-POOL
        cellChildren:
        - cellAddress: tx-220-189-58.h.chinabank.com.cn
      - cellType: P4-NODE-POOL
        cellChildren:
        - cellAddress: tx-220-189-26.h.chinabank.com.cn
        - cellAddress: tx-220-189-33.h.chinabank.com.cn
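Note that this config does not show a virtualClusters section, while the MPIJob below requests virtualCluster: vc1. A minimal sketch of what that section could look like for this physical cluster (the vc1 name comes from the job spec; the cell count is an assumption):

# Sketch (assumed, not from the original post): grant vc1 both P4 nodes.
virtualClusters:
  vc1:
    virtualCells:
    - cellType: P4-NODE-POOL.P4-NODE
      cellNumber: 2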
The MPIJob yaml is as follows:
apiVersion: kubeflow.org/v1alpha2
kind: MPIJob
metadata:
  name: mpi-hived-cpu
  namespace: kubeflow
spec:
  cleanPodPolicy: Running
  mpiReplicaSpecs:
    Launcher:
      replicas: 1
      template:
        metadata:
          annotations:
            hivedscheduler.microsoft.com/pod-scheduling-spec: |-
              virtualCluster: vc1
              priority: 1
              leafCellType: P4
              leafCellNumber: 1
              affinityGroup:
                name: mpi-hived-cpu
                members:
                - podNumber: 1
                  leafCellNumber: 1
                - podNumber: 2
                  leafCellNumber: 1
        spec:
          containers:
          - command:
            - /bin/bash
            - -c
            - horovodrun -np 2 python /benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py
              --model resnet50 --batch_size 32 --variable_update horovod --num_epochs=1
            image: idockerhub.jd.com/zhouzijiang/horovod-training:v1.2
            imagePullPolicy: Always
            name: mpi-hived
            resources:
              limits:
                cpu: "1"
                memory: 2Gi
          nodeSelector:
            nvidia.com/accelerator: nvidia-tesla-p4
          schedulerName: hivedscheduler
          tolerations:
          - effect: NoSchedule
            key: dedicated
            value: lambda-training
          - effect: NoSchedule
            key: nvidia.com/gpu
    Worker:
      replicas: 2
      template:
        metadata:
          annotations:
            hivedscheduler.microsoft.com/pod-scheduling-spec: |-
              virtualCluster: vc1
              priority: 1
              leafCellType: P4
              leafCellNumber: 1
              affinityGroup:
                name: mpi-hived-cpu
                members:
                - podNumber: 2
                  leafCellNumber: 1
        spec:
          containers:
          - image: idockerhub.jd.com/zhouzijiang/horovod-training:v1.2
            imagePullPolicy: Always
            name: mpi-hived
            resources:
              limits:
                cpu: "1"
                memory: 2Gi
                nvidia.com/gpu: "1"
                hivedscheduler.microsoft.com/pod-scheduling-enable: 1
            securityContext:
              capabilities:
                add:
                - IPC_LOCK
          nodeSelector:
            nvidia.com/accelerator: nvidia-tesla-p4
          schedulerName: hivedscheduler
          serviceAccountName: mpi-operator
          tolerations:
          - effect: NoSchedule
            key: dedicated
            value: lambda-training
          - effect: NoSchedule
            key: nvidia.com/gpu
The pods are not allocated GPU0 and GPU1 on the P4 node.
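For reference, once hived binds each pod, the assigned GPU index can be read back from the isolation annotation discussed above; an illustrative (assumed) snippet from a bound Worker pod:

# Illustrative values only: hived records the allocated leaf cell (GPU) indices,
# which should match the indices reported by nvidia-smi on that node.
metadata:
  annotations:
    hivedscheduler.microsoft.com/pod-leaf-cell-isolation: "0"

Without PAI, nothing forwards this value into the container automatically, so NVIDIA_VISIBLE_DEVICES must be set from it, as sketched earlier in the thread.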