Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(migration): improve job progress logging in MigrateStateTree #329

Merged
merged 7 commits into from
Dec 3, 2024

Conversation

virajbhartiya
Copy link
Member

reference to filecoin-project/lotus#12732
This pull request includes an update to the logging format in the MigrateStateTree function to improve clarity during the migration process.

Logging improvements:

  • builtin/v9/migration/top.go: Changed the log message to include the percentage of jobs completed and rephrased the message for better readability.

@codecov-commenter
Copy link

codecov-commenter commented Nov 28, 2024

Codecov Report

Attention: Patch coverage is 0% with 9 lines in your changes missing coverage. Please review.

Project coverage is 3.39%. Comparing base (111b58c) to head (c3f3a34).

Files with missing lines Patch % Lines
migration/runner.go 0.00% 9 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #329   +/-   ##
=======================================
  Coverage    3.39%    3.39%           
=======================================
  Files         662      662           
  Lines      177818   177816    -2     
=======================================
  Hits         6041     6041           
+ Misses     169928   169926    -2     
  Partials     1849     1849           
Files with missing lines Coverage Δ
migration/runner.go 0.00% <0.00%> (ø)

@rvagg
Copy link
Member

rvagg commented Nov 29, 2024

@virajbhartiya do this to builtin/v16/migration/top.go instead of v9. Actors v14 is for network version 25, the next one (which hasn't been scheduled), so it'll actually be used then. v9 on mainnet was in 2022 for nv17.

After each network upgrade, one of the tasks performed is that someone (usually @rjan90) opens a PR here that copies the previous actors to a new version and then copies the most basic migration, without any of the custom stuff that was needed for that last network upgrade. You can see that for v16 here: #325

@rvagg
Copy link
Member

rvagg commented Nov 29, 2024

Actually, the later migrations run through migration/runner.go, builtin/v16/migration/top.go will be responsible for calling it but the runner is in one place and that's where the logging happens so that's the file you should edit.

Comment on lines 402 to 405
percentComplete := float64(doneNow) / float64(jobsNow) * 100

log.Log(rt.INFO, "Performing migration: %d of %d jobs complete (%.1f%%, %.0f/s)",
doneNow, jobsNow, percentComplete, rate)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you watch a migration happen using lotus-shed you might find that percentComplete is not helpful and we need to remove it. I hoped we could do this and my original issue I think said we should, but watching it progress myself we are adding jobs at the same time as completing them, so we never know what "complete" is. I don't think we have an easy way to pre-calculate the number of jobs. So, unfortunately, the main benefit of this logging is simply letting people know that it's happening and that their node isn't stuck.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will something like this work

log.Log(rt.INFO, "Performing migration: %s of %s jobs processed (%s/s) [%v elapsed]", doneStr, jobsStr, rateStr, elapsed.Round(time.Second))

even though the done and total jobs will be changing, it will still show that the node isn't stuck and working just fine.

Comment on lines 115 to 120
jobsStr := fmt.Sprintf("%d", jobsNow)
doneStr := fmt.Sprintf("%d", doneNow)
rateStr := fmt.Sprintf("%.0f", rate)

log.Log(rt.INFO, "Performing migration: %s of %s jobs processed (%s/s) [%v elapsed]",
doneStr, jobsStr, rateStr, elapsed.Round(time.Second))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR is looking good @virajbhartiya but why do we have the separate fmt.Sprintf's for the numbers and not just using the log.Log for that? If we're not pulling apart the Log call manually in lotus then they can be arbitrary values so numbers should be fine, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right that could be done directly, thanks!, updated it

Copy link
Member

@rvagg rvagg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, I'd like to see output before merging though

migration/runner.go Outdated Show resolved Hide resolved
Copy link
Member

@rvagg rvagg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice 👌

@rvagg rvagg merged commit 32f613e into filecoin-project:master Dec 3, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: ☑️ Done (Archive)
Development

Successfully merging this pull request may close these issues.

3 participants