Skip to content

validate checkpoint is consistent with meta_to_tune flag #2736

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

lijia19
Copy link

@lijia19 lijia19 commented May 15, 2025

Summary:
add a flag that do validation when load_checkpoint passed unexpected meta_to_tune flag.
i.e. if this flag is true but checkpoint is not in meta format, or
if this flag is flase but checkpoint is in meta format
Do validation early to avoid unexpected error later.

Differential Revision: D74784778

Copy link

pytorch-bot bot commented May 15, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/2736

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 6647e8c with merge base 0a08c2f (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 15, 2025
@facebook-github-bot
Copy link

This pull request was exported from Phabricator. Differential Revision: D74784778

lijia19 added a commit to lijia19/torchtune that referenced this pull request May 15, 2025
Summary:

add a flag that do validation when load_checkpoint passed unexpected meta_to_tune flag.
i.e. if this flag is true but checkpoint is not in meta format, or
if this flag is flase but checkpoint is in meta format
Do validation early to avoid unexpected error later.

Differential Revision: D74784778
@lijia19 lijia19 force-pushed the export-D74784778 branch from 49128a6 to 36930f5 Compare May 15, 2025 20:55
@facebook-github-bot
Copy link

This pull request was exported from Phabricator. Differential Revision: D74784778

lijia19 added a commit to lijia19/torchtune that referenced this pull request May 15, 2025
Summary:
Pull Request resolved: pytorch#2736

add a flag that do validation when load_checkpoint passed unexpected meta_to_tune flag.
i.e. if this flag is true but checkpoint is not in meta format, or
if this flag is flase but checkpoint is in meta format
Do validation early to avoid unexpected error later.

Differential Revision: D74784778
@lijia19 lijia19 force-pushed the export-D74784778 branch from 36930f5 to f96afd2 Compare May 15, 2025 20:58
Summary:

add a flag that do validation when load_checkpoint passed unexpected meta_to_tune flag.
i.e. if this flag is true but checkpoint is not in meta format, or
if this flag is flase but checkpoint is in meta format
Do validation early to avoid unexpected error later.

Differential Revision: D74784778
@lijia19 lijia19 force-pushed the export-D74784778 branch from f96afd2 to 053b938 Compare May 15, 2025 21:41
@facebook-github-bot
Copy link

This pull request was exported from Phabricator. Differential Revision: D74784778

@codecov-commenter
Copy link

Codecov Report

Attention: Patch coverage is 11.11111% with 8 lines in your changes missing coverage. Please review.

Project coverage is 59.93%. Comparing base (c8e670b) to head (053b938).
Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
torchtune/models/convert_weights.py 11.11% 8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2736      +/-   ##
==========================================
- Coverage   60.64%   59.93%   -0.71%     
==========================================
  Files         428      430       +2     
  Lines       26091    26435     +344     
==========================================
+ Hits        15823    15844      +21     
- Misses      10268    10591     +323     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@ebsmothers ebsmothers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should land this in OSS torchtune. If I understand correctly this is only needed for internal checkpointer implementations? In torchtune we have a specific class for Meta format checkpoints, so adding this functionality can cause confusion. Also it will just be zombie code. Personally I would recommend to implement this inside of the internal integration layer instead

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants