Fix data race when migrating slots #324
Conversation
Codecov Report
❌ Patch coverage is 
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #324      +/-   ##
============================================
+ Coverage     43.38%   47.08%   +3.69%     
============================================
  Files            37       45       +8     
  Lines          2971     4420    +1449     
============================================
+ Hits           1289     2081     +792     
- Misses         1544     2130     +586     
- Partials        138      209      +71     
Pull Request Overview
This PR aims to resolve migration inconsistencies by resetting shard migration state per-shard, ensuring slot-only migrations clear previous migration info, and serializing the MigrateSlot API to avoid concurrent conflicts.
- Clear migration state for source shard on slot-only migrations
- Serialize MigrateSlot API calls with a handler-level mutex and manual cluster lookup
- Adjust routing middleware and change loop error handling to continue on per-shard failures
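The serialization change can be pictured with a short sketch, assuming a Gin-style handler like the one registered in server/route.go. Only the handler-level `mu sync.Mutex` around MigrateSlot reflects what the diff shows; the handler type, lookup, and response below are illustrative assumptions, not the PR's exact code.

```go
package api

import (
	"net/http"
	"sync"

	"github.com/gin-gonic/gin"
)

// ClusterHandler stands in for the real handler in server/api/cluster.go.
type ClusterHandler struct {
	mu sync.Mutex // serializes MigrateSlot calls on this handler
}

// MigrateSlot holds the lock for the whole request so two concurrent
// migrations cannot read and rewrite the same cluster topology at once.
func (h *ClusterHandler) MigrateSlot(c *gin.Context) {
	h.mu.Lock()
	defer h.mu.Unlock()

	// The manual cluster lookup and the slot move would happen here while
	// the lock is held; the concrete store calls are omitted in this sketch.
	c.JSON(http.StatusOK, gin.H{"ok": true})
}
```

Holding one lock on the handler is the simplest way to rule out the data race, at the cost of serializing migrations across all clusters, which is the trade-off the last suppressed comment below calls out.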
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description | 
|---|---|
| store/cluster.go | Clear migration state before moving slots when `slotOnly` is true |
| server/route.go | Removed `RequiredCluster` middleware from the migrate endpoint |
| server/api/cluster.go | Added `sync.Mutex`, switched to manual cluster retrieval, and serialized `MigrateSlot` calls |
| controller/cluster.go | Replaced early `return` with `continue` in shard status update loop |
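The controller/cluster.go change is simple enough to sketch in isolation. Everything below (the Shard type, probeShard, the logging) is a placeholder under assumed names; the point is only the switch from an early return to continue, so one failing shard no longer stops status updates for the rest.

```go
package main

import "log"

// Shard is a placeholder for the controller's shard representation.
type Shard struct{ Name string }

// probeShard stands in for whatever per-shard status check the controller runs.
func probeShard(s Shard) error { return nil }

// updateShardStatuses walks every shard; a failure on one shard is logged and
// skipped instead of aborting the whole loop (previously an early return).
func updateShardStatuses(shards []Shard) {
	for _, shard := range shards {
		if err := probeShard(shard); err != nil {
			log.Printf("update shard %s: %v", shard.Name, err)
			continue
		}
		// ... apply the refreshed migration status for this shard ...
	}
}

func main() {
	updateShardStatuses([]Shard{{Name: "shard-0"}, {Name: "shard-1"}})
}
```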
Comments suppressed due to low confidence (4)
server/route.go:72
- The migrate route no longer includes RequiredNamespace and RequiredCluster middleware, which may bypass essential validation and context injection. Consider re-adding the appropriate middleware to enforce namespace and cluster context.
			clusters.POST("/:cluster/migrate", handler.Cluster.MigrateSlot)
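For illustration, re-adding the middleware could look roughly like the sketch below; RequiredNamespace and RequiredCluster come from the comment above, while the registration helper and its parameter wiring are assumptions rather than code from this repository.

```go
package server

import "github.com/gin-gonic/gin"

// registerMigrateRoute is an illustrative helper, not code from this PR.
// Re-attaching the namespace/cluster middleware keeps the validation and
// context injection the comment above is concerned about.
func registerMigrateRoute(clusters *gin.RouterGroup,
	requiredNamespace, requiredCluster, migrateSlot gin.HandlerFunc) {
	clusters.POST("/:cluster/migrate", requiredNamespace, requiredCluster, migrateSlot)
}
```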
server/api/cluster.go:50
- [nitpick] The field name `mu` is generic; consider renaming it to something more descriptive like `migrateMu` or `clusterMu` to clarify its purpose.
	mu sync.Mutex
store/cluster.go:210
- [nitpick] It may be helpful to add a comment explaining why `ClearMigrateState` is only called for the source shard when `slotOnly` is true, to improve maintainability and clarity of intent.
		cluster.Shards[sourceShardIdx].ClearMigrateState()
server/api/cluster.go:123
- Using a single mutex on the handler serializes all MigrateSlot calls across all clusters, potentially creating a bottleneck. Consider using a more granular lock mechanism (e.g., per-cluster) to allow concurrent migrations on different clusters.
	handler.mu.Lock()
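A rough sketch of the per-cluster locking this comment suggests, assuming a lazily populated map of mutexes; the ClusterLocks type and lockFor helper are hypothetical, not code from this PR.

```go
package api

import "sync"

// ClusterLocks keeps one mutex per cluster so migrations on different
// clusters can run concurrently while migrations on the same cluster
// remain serialized.
type ClusterLocks struct {
	locks sync.Map // cluster name -> *sync.Mutex
}

// lockFor returns the mutex guarding a single cluster, creating it lazily.
func (l *ClusterLocks) lockFor(cluster string) *sync.Mutex {
	mu, _ := l.locks.LoadOrStore(cluster, &sync.Mutex{})
	return mu.(*sync.Mutex)
}

// Hypothetical usage inside MigrateSlot:
//
//	mu := handler.locks.lockFor(clusterName)
//	mu.Lock()
//	defer mu.Unlock()
```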