Fix data race when migrating slots #324
Conversation
Codecov Report
❌ Patch coverage is 
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #324      +/-   ##
============================================
+ Coverage     43.38%   47.08%   +3.69%     
============================================
  Files            37       45       +8     
  Lines          2971     4420    +1449     
============================================
+ Hits           1289     2081     +792     
- Misses         1544     2130     +586     
- Partials        138      209      +71     
Pull Request Overview
This PR aims to resolve migration inconsistencies by resetting shard migration state per-shard, ensuring slot-only migrations clear previous migration info, and serializing the MigrateSlot API to avoid concurrent conflicts.
- Clear migration state for source shard on slot-only migrations
- Serialize MigrateSlot API calls with a handler-level mutex and manual cluster lookup
- Adjust routing middleware and change loop error handling to continue on per-shard failures
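The serialization change can be pictured with a short sketch, assuming a Gin-style handler like the one registered in server/route.go. Only the handler-level `mu sync.Mutex` around MigrateSlot reflects what the diff shows; the handler type, lookup, and response below are illustrative assumptions, not the PR's exact code.

```go
package api

import (
	"net/http"
	"sync"

	"github.com/gin-gonic/gin"
)

// ClusterHandler stands in for the real handler in server/api/cluster.go.
type ClusterHandler struct {
	mu sync.Mutex // serializes MigrateSlot calls on this handler
}

// MigrateSlot holds the lock for the whole request so two concurrent
// migrations cannot read and rewrite the same cluster topology at once.
func (h *ClusterHandler) MigrateSlot(c *gin.Context) {
	h.mu.Lock()
	defer h.mu.Unlock()

	// The manual cluster lookup and the slot move would happen here while
	// the lock is held; the concrete store calls are omitted in this sketch.
	c.JSON(http.StatusOK, gin.H{"ok": true})
}
```

Holding one lock on the handler is the simplest way to rule out the data race, at the cost of serializing migrations across all clusters, which is the trade-off the last suppressed comment below calls out.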
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description | 
|---|---|
| store/cluster.go | Clear migration state before moving slots when `slotOnly` is true |
| server/route.go | Removed `RequiredCluster` middleware from the migrate endpoint |
| server/api/cluster.go | Added `sync.Mutex`, switched to manual cluster retrieval, and serialized `MigrateSlot` calls |
| controller/cluster.go | Replaced early `return` with `continue` in shard status update loop |
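The controller/cluster.go change is simple enough to sketch in isolation. Everything below (the Shard type, probeShard, the logging) is a placeholder under assumed names; the point is only the switch from an early return to continue, so one failing shard no longer stops status updates for the rest.

```go
package main

import "log"

// Shard is a placeholder for the controller's shard representation.
type Shard struct{ Name string }

// probeShard stands in for whatever per-shard status check the controller runs.
func probeShard(s Shard) error { return nil }

// updateShardStatuses walks every shard; a failure on one shard is logged and
// skipped instead of aborting the whole loop (previously an early return).
func updateShardStatuses(shards []Shard) {
	for _, shard := range shards {
		if err := probeShard(shard); err != nil {
			log.Printf("update shard %s: %v", shard.Name, err)
			continue
		}
		// ... apply the refreshed migration status for this shard ...
	}
}

func main() {
	updateShardStatuses([]Shard{{Name: "shard-0"}, {Name: "shard-1"}})
}
```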
Comments suppressed due to low confidence (4)
server/route.go:72
- The migrate route no longer includes RequiredNamespace and RequiredCluster middleware, which may bypass essential validation and context injection. Consider re-adding the appropriate middleware to enforce namespace and cluster context.
			clusters.POST("/:cluster/migrate", handler.Cluster.MigrateSlot)
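For illustration, re-adding the middleware could look roughly like the sketch below; RequiredNamespace and RequiredCluster come from the comment above, while the registration helper and its parameter wiring are assumptions rather than code from this repository.

```go
package server

import "github.com/gin-gonic/gin"

// registerMigrateRoute is an illustrative helper, not code from this PR.
// Re-attaching the namespace/cluster middleware keeps the validation and
// context injection the comment above is concerned about.
func registerMigrateRoute(clusters *gin.RouterGroup,
	requiredNamespace, requiredCluster, migrateSlot gin.HandlerFunc) {
	clusters.POST("/:cluster/migrate", requiredNamespace, requiredCluster, migrateSlot)
}
```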
server/api/cluster.go:50
- [nitpick] The field name `mu` is generic; consider renaming it to something more descriptive like `migrateMu` or `clusterMu` to clarify its purpose.
	mu sync.Mutex
store/cluster.go:210
- [nitpick] It may be helpful to add a comment explaining why `ClearMigrateState` is only called for the source shard when `slotOnly` is true, to improve maintainability and clarity of intent.
		cluster.Shards[sourceShardIdx].ClearMigrateState()
server/api/cluster.go:123
- Using a single mutex on the handler serializes all MigrateSlot calls across all clusters, potentially creating a bottleneck. Consider using a more granular lock mechanism (e.g., per-cluster) to allow concurrent migrations on different clusters.
	handler.mu.Lock()
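A rough sketch of the per-cluster locking this comment suggests, assuming a lazily populated map of mutexes; the ClusterLocks type and lockFor helper are hypothetical, not code from this PR.

```go
package api

import "sync"

// ClusterLocks keeps one mutex per cluster so migrations on different
// clusters can run concurrently while migrations on the same cluster
// remain serialized.
type ClusterLocks struct {
	locks sync.Map // cluster name -> *sync.Mutex
}

// lockFor returns the mutex guarding a single cluster, creating it lazily.
func (l *ClusterLocks) lockFor(cluster string) *sync.Mutex {
	mu, _ := l.locks.LoadOrStore(cluster, &sync.Mutex{})
	return mu.(*sync.Mutex)
}

// Hypothetical usage inside MigrateSlot:
//
//	mu := handler.locks.lockFor(clusterName)
//	mu.Lock()
//	defer mu.Unlock()
```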