Download models with pooch #276

Conversation
…s should belong with the custom model download function)
Codecov Report

```
@@            Coverage Diff             @@
##              dev     #276      +/-   ##
==========================================
+ Coverage   39.72%   40.55%   +0.83%
==========================================
  Files          33       33
  Lines        4179     4167      -12
==========================================
+ Hits         1660     1690      +30
+ Misses       2519     2477      -42
==========================================
```

View full report in Codecov by Sentry.
If/when we go ahead with #276, we'll need to make sure
Overall this is great!
That's good to know, and we can address this in a follow up.
I think that makes sense. I need to think about this a bit more because
micro_sam/util.py (outdated)

```diff
@@ -177,7 +131,6 @@ def _available_devices():
 def get_sam_model(
     model_type: str = _DEFAULT_MODEL,
     device: Optional[str] = None,
-    checkpoint_path: Optional[Union[str, os.PathLike]] = None,
```
I think we should keep this. Merging this into `get_custom_sam_model` would be confusing and I don't see a big issue keeping it here. See my main comment for more details.
Ok. I have restored the `checkpoint_path` keyword argument.

I will need you to test this (because all the example scripts involving `checkpoint_path` rely on data you personally have). The `_get_checkpoint` function is not covered by any of the existing tests, so it's safe to say we're not testing anything involving `checkpoint_path` right now.
Could you please run:

- examples/annotator_with_custom_model.py
- examples/finetuning/use_finetuned_model.py
- examples/use_as_library/instance_segmentation.py
It would be great if we could upload the data and custom weights for `examples/annotator_with_custom_model.py` to zenodo or similar - or even make a similar example but with smaller 2d data and model weights. Ideally then anybody could run this example as a test.
Hi @GenevieveBuckley,

The models are all available via zenodo already, see https://github.com/computational-cell-analytics/micro-sam/blob/master/micro_sam/util.py#L46-L49. (I agree though that we should list this in the doc, I have added that to #249.) And I changed the examples in #280 so that all the examples use sample data.
> The `_get_checkpoint` function is not covered by any of the existing tests, so it's safe to say we're not testing anything involving `checkpoint_path` right now.
I don't think that's true either since merging #280, which now uses a custom checkpoint. But you would need to merge the current dev branch in here first to get those tests.
I will still go ahead and test the examples with this branch.
micro_sam/util.py (outdated)

```diff
@@ -203,7 +156,7 @@ def get_sam_model(
     Returns:
         The segment anything predictor.
     """
-    checkpoint = _get_checkpoint(model_type, checkpoint_path)
+    checkpoint = MODELS.fetch(model_type)
```
We can just add something like this to support `checkpoint_path`:

```python
if checkpoint_path is None:
    checkpoint = MODELS.fetch(model_type)
else:
    checkpoint = checkpoint_path
```
So I don't see any problem with mixing up hashes.
I'm confused - if the weights are different, the hash value for the object should be different too, right? So how can anybody possibly do some extra fine tuning on a particular model, save the new weights, and then use checkpoint_path and give it the hash value stored in the pooch registry?
I might misunderstand what is actually happening with `checkpoint_path`.
Sorry, I should have written a better description here. I will try to explain what happens if a `checkpoint_path` is passed or not:

- If `checkpoint_path` is given: this is a local path to SAM weights, e.g. if a user has fine-tuned on their own data. In this case we don't download any weights but initialize the model from these local weights. We do not check the weight file against any hashes.
- If `checkpoint_path` is not given: we download the SAM weights corresponding to `model_type`. (I agree that this name is confusing, see comment below.) We can check their hash.
> So how can anybody possibly do some extra fine tuning on a particular model, save the new weights, and then use checkpoint_path and give it the hash value stored in the pooch registry?

When finetuning (or passing any other weights via `checkpoint_path`) we don't check the weights against the hash values in the pooch registry. From what I understand the current code is doing exactly what I described, since we only use the model registry in the case where `checkpoint_path is None`, and only then do we actually check the hashes.
Hi @GenevieveBuckley, I think we should keep the (optional) `checkpoint_path` argument in `get_sam_model`.

So these two functions do something conceptually different, and it would also complicate the implementation if

We don't want to support downloading weights from arbitrary locations, but only for known models (either original SAM or our finetuned ones). So I don't really see any problems with hash validation here. I hope that makes sense to you @GenevieveBuckley.
micro_sam/util.py (outdated)

```diff
-os.makedirs(_CHECKPOINT_FOLDER, exist_ok=True)
-_download(checkpoint_url, checkpoint_path, model_type)
+os.makedirs(microsam_cachedir()/"models", exist_ok=True)
+pooch.retrieve(url=checkpoint_url, known_hash=models().registry.get(model_name))
```
Does `known_hash` need to be None here (i.e. unknown)? I don't quite understand how someone can do fine tuning of model weights, but have the hash stay unchanged (which is I think what would have to happen for what's written here to make any sense).
Ok, I think I don't fully understand what `pooch` is doing. I will check this out and follow up.
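To illustrate the point about hashes: pooch validates a download by comparing its digest (sha256 by default) against `known_hash`, and any change to the weight bytes changes the digest. A minimal illustration using only the standard library:

```python
import hashlib

# Any modification to the weight bytes changes the digest, so fine-tuned
# weights can never match the hash registered for the original model.
original = hashlib.sha256(b"original SAM weights").hexdigest()
finetuned = hashlib.sha256(b"fine-tuned SAM weights").hexdigest()

assert original != finetuned
# This is why known_hash validation only makes sense for the known models
# in the registry, not for arbitrary user-supplied checkpoint files.
```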
micro_sam/util.py (outdated)

```diff
 # Our custom model types have a suffix "_...". This suffix needs to be stripped
 # before calling sam_model_registry.
 model_type_ = model_type[:5]
 assert model_type_ in ("vit_h", "vit_b", "vit_l", "vit_t")
-if model_type == "vit_t" and not VIT_T_SUPPORT:
+if model_type_ == "vit_t" and not VIT_T_SUPPORT:
```
It seems possible that in the future people might fine tune models based off `vit_t`, so we'd have `model_type="vit_t_em"`, etc.

I find the `model_type`/`model_type_` distinction here confusing, I'd kinda like to rename `model_type` -> `model_name` instead. What do you think?
I fully agree, we should have a better distinction here and the current distinction is confusing. Let's do what you suggest and rename `model_type` -> `model_name` and `model_type_` -> `model_type`.
micro_sam/util.py (outdated)

```diff
@@ -219,7 +222,7 @@ def get_sam_model(
     sam = sam_model_registry[model_type_](checkpoint=checkpoint)
     sam.to(device=device)
     predictor = SamPredictor(sam)
-    predictor.model_type = model_type
+    predictor.model_type = model_type_
```
Is this correct?
Yes, that's correct.
The most important next step is to check that

Copying this comment from above, so it doesn't get lost/hidden if I make updates to that section of the code.
Yep, you're right. I think I misunderstood a few things about how
…l_type in get_sam_model function
Update pooch download, rename model_type->model_name in get_sam_model
Thanks @GenevieveBuckley! This cleans up things significantly and I hope now the usage of the
Merged c2a4e54 into computational-cell-analytics:dev
I've made a start on #36.

I think it makes more sense to have custom weights of any sort go through the `get_custom_sam_model` function (I haven't touched this function yet), and also if we download weights from different checkpoints then the hash validation is going to break (`examples/annotator_with_custom_model.py`).