-
Notifications
You must be signed in to change notification settings - Fork 39
Cell as SKU in intra-vc scheduler #34
base: master
Are you sure you want to change the base?
Conversation
Add new interface for pod group, including scheduling request, interal status, and scheduled placement.
Add conversion, defaulting, and validation for v2 pod scheduling spec, update ExtractPodSchedulingSpec accordingly.
Convert AlgoAffinityGroup to PodGroupSchedulingStatus before intra VC scheduler.
db3d1ec
to
e56fe3b
Compare
Add cell type to level mapping for intra-vc.
Fix pod group placement type in scheduling status.
* adjust virtual and physical placements for scheduling results * update pod binding info for tree structure * expose pod group in api through webserver
* pick best affinity cells * convert cells to leaf cells in placement
Fix legacy unit tests.
Codecov Report
@@ Coverage Diff @@
## master #34 +/- ##
==========================================
- Coverage 88.46% 80.68% -7.79%
==========================================
Files 8 10 +2
Lines 1890 2205 +315
==========================================
+ Hits 1672 1779 +107
- Misses 166 334 +168
- Partials 52 92 +40
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
Fix bugs in new v2 test cases, including: * calculate leaf cell numbers per pod when creating new PodGroupSchedulingStatus * get cell chain through within cell type (no higher than node) instead of leaf cell type * sort cells by virtual address in cluster view
9fd0560
to
ae26632
Compare
Fix comments.
pkg/api/v2/types.go
Outdated
|
||
// PodGroupMemberCellSpec represents cell spec for each pod in pod group. | ||
type PodGroupMemberCellSpec struct { | ||
CellType api.CellType `yaml:"cellType"` // cell type to be used in pod, differnt group can have differnt cell typs in the same chain, TODO: not support multiple chains yet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
pkg/api/v2/types.go
Outdated
Name string `yaml:"name"` // pod group name | ||
WithinOneCell api.CellType `yaml:"withinOneCell"` // within cell for all cells in current group, e.g., two GPU-cell within one numa-cell, two node-cell within one rack-cell | ||
Pods []PodGroupMemberSpec `yaml:"pods"` // pod list for current group, any two pods are gang with each other in the groupS | ||
ChildGroups []*PodGroupSpec `yaml:"childGroups"` // child group in the hierarchical structure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pods and ChildGroups are order sensitive? #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
Only backtrace to parent pod group may not enough to find the whole pod root group placement if it exist in theory. We can add more pod group sorting criterias, such as if it has So given pod group trees are generally small trees with few nodes, backtrace to previous bound pod group (i.e. parent or left sibling node) should not too time consuming, which is enough to find the whole pod root group placement if it exist in theory. In reply to: 918971150 |
pkg/algorithm/sku_scheduler.go
Outdated
// virtual cluster view | ||
type skuClusterView []*skuCell | ||
|
||
// skuScheduler can schedule a pod group of pods with arbitrary cell types on a cluster view. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
pkg/algorithm/sku_scheduler.go
Outdated
// skuScheduler can schedule a pod group of pods with arbitrary cell types on a cluster view. | ||
// It first tries to place pod group without preemption, then enable preemption if schedule failed. | ||
// For each try, it will find within cells for each pod group, then find cells for each pods with better affinity. | ||
type skuScheduler struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed
pkg/algorithm/hived_algorithm.go
Outdated
cellChains map[string][]CellChain | ||
// map each level in a chain to the specific cell type name | ||
cellTypes map[CellChain]map[CellLevel]api.CellType | ||
// map each type name in a chain to the specific cell level |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
pkg/algorithm/sku_scheduler.go
Outdated
placement, failedReason = PodGroupPlacement{}, "no matched cells in vc" | ||
|
||
cv := skuClusterView{nil} | ||
if level, ok := s.cellLevels[podGroup.WithinOneCell]; ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
pkg/algorithm/sku_scheduler.go
Outdated
|
||
func (s *skuScheduler) findCellsForPods( | ||
pods []apiv2.PodGroupMemberSpec, | ||
priority CellPriority, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
early return if pods nil or empty? #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
pkg/algorithm/sku_scheduler.go
Outdated
placement, failedReason = PodGroupPlacement{}, "no matched cells in vc" | ||
|
||
cv := skuClusterView{nil} | ||
if level, ok := s.cellLevels[podGroup.WithinOneCell]; ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pkg/algorithm/sku_scheduler.go
Outdated
} | ||
|
||
func (s *skuScheduler) createSkuClusterView( | ||
within *skuCell, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment the func return and its param relation #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
pkg/algorithm/sku_scheduler.go
Outdated
} | ||
|
||
func (s *skuScheduler) createSkuClusterView( | ||
within *skuCell, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
pkg/algorithm/sku_scheduler.go
Outdated
|
||
// skuCell type for selected level cell in virtual cluster view | ||
type skuCell struct { | ||
cell Cell // within cell level, maybe higher or lower than node level |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
pkg/algorithm/sku_scheduler.go
Outdated
return false | ||
} | ||
|
||
func ancestorNoHigherThanLevel(withinLevel CellLevel, cell Cell) Cell { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
pkg/algorithm/sku_scheduler.go
Outdated
for _, c := range s.chainCellList[l] { | ||
if (within != nil && !isAncestor(within.cell, c)) || | ||
cv.contains(ancestorNoHigherThanLevel(withinLevel, c)) { | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will "cv.contains() continue" cause some cells with more free gpus are excluded in the final cv? #Closed
pkg/algorithm/sku_scheduler.go
Outdated
"have to use at least one bad cell %v", withinCell.address) | ||
} | ||
podPlacement := s.findCellsForSinglePod(pods[podIndex], priority, withinCell, allocatedCells) | ||
if podPlacement == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
schedule pods also need backtrace to previous bound pod? otherwise, we need to prove it will not miss solution #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we have changed podGroup.Pods
to podGroup.Pod
, all pods
here with different podIndex
are the same now
pkg/algorithm/sku_scheduler.go
Outdated
func (s *skuScheduler) findCellsForSinglePod( | ||
pod apiv2.PodGroupMemberSpec, | ||
priority CellPriority, | ||
within *skuCell, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
pkg/algorithm/sku_scheduler.go
Outdated
} | ||
podPlacement := s.findCellsForSinglePod(pods[podIndex], priority, withinCell, allocatedCells) | ||
if podPlacement == nil { | ||
withinCellIndex++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we have changed podGroup.Pods
to podGroup.Pod
, all pods
here with different podIndex
are the same, no need to reset withinCellIndex
anymore
pkg/algorithm/sku_scheduler.go
Outdated
freeCells, preemptibleCells = getFreeCellsAtLevel( | ||
within.cell, currLevel, priority, allocatedCells, freeCells, preemptibleCells) | ||
// free leaf cells will be used first (before preemptible leaf cells) | ||
freeCells = append(freeCells, preemptibleCells...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
pkg/algorithm/sku_scheduler.go
Outdated
} | ||
|
||
func (s *skuScheduler) findCellsForSinglePod( | ||
pod apiv2.PodGroupMemberSpec, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment for single pod we still best effort find nearest cells for it (old behaiour) #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
pkg/algorithm/sku_scheduler.go
Outdated
currLevel := s.cellLevels[pod.CellsPerPod.CellType] | ||
freeCells, preemptibleCells := CellList{}, CellList{} | ||
freeCells, preemptibleCells = getFreeCellsAtLevel( | ||
within.cell, currLevel, priority, allocatedCells, freeCells, preemptibleCells) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First use cell that does not need to be got from higher level cell split? #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sort cells when splitting in getFreeCellsAtLevel
so that cells need higher level split can be used later
pkg/algorithm/hived_algorithm.go
Outdated
// map each leaf cell type to all chains that contain this type | ||
leafCellChains map[string][]CellChain | ||
// map each within cell type to all chains that contain this type | ||
cellChains map[string][]CellChain | ||
// map each level in a chain to the specific cell type name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
pkg/algorithm/sku_scheduler.go
Outdated
// 1. cell health (prefer healthy) | ||
// 2. cell level (prefer lower) | ||
// 3. usedLeafCellNumAtPriority (prefer higher) | ||
// 4. usedLeafCellNumHigherPriority (prefer lower) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if greedy search, may need use same order to sort here? #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment usedLeafCellNumAtPriority different meaning in different crossPriorityPack case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
pkg/api/v2/types.go
Outdated
Name string `yaml:"name"` // pod group name | ||
WithinOneCell api.CellType `yaml:"withinOneCell"` // within cell for all cells in current group, e.g., two GPU-cell within one numa-cell, two node-cell within one rack-cell | ||
Pods []PodGroupMemberSpec `yaml:"pods"` // pod list for current group, any two pods are gang with each other in the groupS | ||
ChildGroups []*PodGroupSpec `yaml:"childGroups"` // child group in the hierarchical structure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems no need Pod list, and then pods search no need back trace #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to PodGroup.Pod
, reserve Pods field to flatten Pod for internal structure
Update according to comments.
Fix GitHub Action config.
Change `PodGroup.Pods` to `PodGroup.Pod` to resolve backtracing search issue.
* early return check for pods and withinOneCell * sortcells when splitting in getFreeCellsAtLevel * add comments for sorting usedLeafCellNumAtPriority
Update config in examples accordingly.
Skip build for legacy topology aware scheduler through build tag.
New cell as SKU feature in intra-vc scheduler, work items: