E2E Test TODO #788

Open
3 tasks
Rei1010 opened this issue Jan 8, 2025 · 2 comments

@Rei1010
Collaborator

Rei1010 commented Jan 8, 2025

TODO List:

  • Abstract E2E code to facilitate adaptation for different platform devices.
  • Add documentation for integrating new runners/devices.
  • Reduce redundant E2E executions by running tests based on PR changes.
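The third item, running E2E tests based on PR changes, could be sketched roughly as below. This is only an illustration of the idea, not the project's workflow: the suite names, the `suiteTriggers` map, and the `pkg/device/` and `charts/` prefixes are assumptions made up for the example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// suiteTriggers maps a hypothetical E2E suite name to the path prefixes
// that should trigger it. Names and prefixes are illustrative assumptions.
var suiteTriggers = map[string][]string{
	"scheduler": {"pkg/scheduler/", "cmd/scheduler/"},
	"device":    {"pkg/device/"},
	"chart":     {"charts/"},
}

// selectSuites returns the sorted names of the suites whose trigger
// prefixes match at least one changed file.
func selectSuites(changed []string) []string {
	selected := map[string]bool{}
	for _, file := range changed {
		for suite, prefixes := range suiteTriggers {
			for _, prefix := range prefixes {
				if strings.HasPrefix(file, prefix) {
					selected[suite] = true
				}
			}
		}
	}
	names := make([]string, 0, len(selected))
	for name := range selected {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	// A docs-only change selects nothing; a scheduler change selects one suite.
	changed := []string{"pkg/scheduler/scheduler.go", "docs/README.md"}
	fmt.Println(selectSuites(changed))
}
```

A PR that touches only documentation would yield an empty list here, so the E2E job could be skipped for it.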
@Rei1010 Rei1010 added the kind/feature new function label Jan 8, 2025
@Rei1010 Rei1010 self-assigned this Jan 8, 2025
@haitwang-cloud
Contributor

@Rei1010
Collaborator Author

Rei1010 commented Jan 10, 2025

@Rei1010 https://github.com/Project-HAMi/HAMi/actions/runs/12705629008/job/35417191171 PTAL

Hi @haitwang-cloud, I just checked and found that the vgpu-scheduler-extender container in hami-scheduler is not working properly. The detailed error is below:

I0110 09:21:08.551571       1 devices.go:189] Loading device configuration from file: /device-config.yaml
I0110 09:21:08.551578       1 devices.go:358] Reading config file from path: /device-config.yaml
I0110 09:21:08.552048       1 devices.go:368] Successfully read and parsed config file
I0110 09:21:08.552065       1 devices.go:194] Loaded config: &{{nvidia.com/gpu nvidia.com/gpumem nvidia.com/gpucores nvidia.com/gpumem-percentage nvidia.com/priority false 0 0 1 10 1 1 false [{[A30] [[{1g.6gb 6144 4}] [{2g.12gb 12288
I0110 09:21:08.552143       1 devices.go:92] Initializing devices with configuration
I0110 09:21:08.552150       1 devices.go:100] Initializing GPU device
I0110 09:21:08.552170       1 device.go:128] "initializing nvidia device" resourceName="nvidia.com/gpu" resourceMem="nvidia.com/gpumem" DefaultGPUNum=1
I0110 09:21:08.552184       1 devices.go:109] GPU device initialized successfully
I0110 09:21:08.552188       1 devices.go:100] Initializing MLU device
I0110 09:21:08.552192       1 devices.go:109] MLU device initialized successfully
I0110 09:21:08.552194       1 devices.go:100] Initializing DCU device
I0110 09:21:08.552197       1 devices.go:109] DCU device initialized successfully
I0110 09:21:08.552200       1 devices.go:100] Initializing Iluvatar device
I0110 09:21:08.552203       1 devices.go:109] Iluvatar device initialized successfully
I0110 09:21:08.552205       1 devices.go:100] Initializing Mthreads device
I0110 09:21:08.552209       1 devices.go:109] Mthreads device initialized successfully
I0110 09:21:08.552211       1 devices.go:100] Initializing Metax device
I0110 09:21:08.552214       1 devices.go:109] Metax device initialized successfully
I0110 09:21:08.552218       1 devices.go:180] All devices initialized successfully
I0110 09:21:08.552232       1 scheduler.go:63] "Initializing HAMi scheduler"
I0110 09:21:08.552239       1 scheduler.go:71] "Scheduler initialized successfully"
I0110 09:21:08.552243       1 scheduler.go:130] "Starting HAMi scheduler components"
I0110 09:21:08.552693       1 reflector.go:289] Starting reflector *v1.Node (1h0m0s) from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229
I0110 09:21:08.552703       1 reflector.go:325] Listing and watching *v1.Node from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229
I0110 09:21:08.552916       1 reflector.go:289] Starting reflector *v1.Pod (1h0m0s) from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229
I0110 09:21:08.552957       1 reflector.go:325] Listing and watching *v1.Pod from pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:229
I0110 09:21:08.652834       1 shared_informer.go:341] caches populated
I0110 09:21:08.652899       1 shared_informer.go:341] caches populated
I0110 09:21:08.660970       1 route.go:42] Initializing Predicate Route
I0110 09:21:08.661283       1 metrics.go:252] Initializing metrics for scheduler
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0xc8 pc=0x14e8da6]

goroutine 149 [running]:
github.com/Project-HAMi/HAMi/pkg/util.GetNode({0xc0002983d0, 0xa})
    /k8s-vgpu/pkg/util/util.go:59 +0x26
github.com/Project-HAMi/HAMi/pkg/scheduler.(*Scheduler).RegisterFromNodeAnnotations(0xc0002ad540)
    /k8s-vgpu/pkg/scheduler/scheduler.go:206 +0xbca
created by main.start in goroutine 1
    /k8s-vgpu/cmd/scheduler/main.go:78 +0xec

You can get the image from https://github.com/Project-HAMi/HAMi/actions/runs/12704412472/artifacts/2411324893 and helm chart from https://github.com/Project-HAMi/HAMi/actions/runs/12704412472/artifacts/2411318827

Feel free to contact me if you have any issues.
