Document if clearml is a full model store or only model management #480

Open
Make42 opened this issue Feb 16, 2023 · 3 comments

Comments

@Make42
Contributor

Make42 commented Feb 16, 2023

After more playing around, it seems that ClearML Server does not store models or artifacts itself. These are stored elsewhere (e.g., in an AWS S3 bucket) or on my local machine, and ClearML Server stores only configuration parameters and previews (e.g., when the artifact is a pandas DataFrame). Is that right? Is there a way to save the models completely on the ClearML Server?

@ainoam
Collaborator

ainoam commented Feb 20, 2023

@Make42 ClearML is built in a modular fashion (nicely illustrated here) for maximum flexibility: the backend databases log artifact/model references so that users can use whichever storage solution (or combination thereof) best fits their use case.

You can make use of ClearML's built-in file-server by setting the global default_output_uri parameter or specifically through the SDK interfaces for specifying output (e.g. output_uri in Task.init).
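As a minimal sketch of the second option, the call below passes output_uri=True to Task.init. The wrapper function and the project and task names are hypothetical placeholders; the wrapper only delays the call so that nothing runs until it is invoked against a configured ClearML server.

```python
def start_task_with_default_file_server(project_name="examples", task_name="output_uri demo"):
    """Hypothetical wrapper: start a task whose output models go to the default file server."""
    # Imported lazily so the sketch can be loaded without clearml installed.
    from clearml import Task

    # output_uri=True selects the default file server configured for the SDK.
    # Assumption: a storage URI string could be passed instead to target
    # another backend, as the reply above suggests.
    return Task.init(
        project_name=project_name,
        task_name=task_name,
        output_uri=True,
    )
```

Calling the function requires valid ClearML credentials; the returned task object is then used as usual.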

@Make42
Contributor Author

Make42 commented Feb 21, 2023

@ainoam: Thank you for your reply, but I am not sure this answers my question: I am using the ClearML Hosted Service.

The page https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server/#what-is-clearml-server_ says

The ClearML Hosted Service is essentially the ClearML Server maintained for you.

but the page does not state unambiguously whether only the metadata of the models is stored on the ClearML Server or whether the models themselves are stored there as well. You write

the backend databases log artifact/model references so that users can use whichever storage solution (or combination thereof) best fits their use case.

which sounds like only the metadata and references to models and artifacts are saved, not the artifacts and models themselves.

However, by experimenting, I found out that I am able to upload an artifact and retrieve the artifact itself, not just its metadata or reference. On the other hand, so far I have not been able to do the same for a model.

PS: I suspect you mean sdk.development.default_output_uri in https://clear.ml/docs/latest/docs/configs/clearml_conf/#sdkdevelopment, right?

@Make42
Contributor Author

Make42 commented Feb 21, 2023

OK, so basically I had to pass output_uri=True as an argument to Task.init. I am sure this is what you, @ainoam, were trying to tell me, but I did not understand what you meant until I found out myself. The thread https://app.slack.com/client/TT9ATQXJ5/CTK20V944/thread/CTK20V944-1676902373.840529 details my painful journey.

Ironically, John C. also said at the beginning of that Slack thread

configure Task.init(..., output_uri=True) and this will save the models to the clearml file server

but I did not understand this, because I would never have thought that output_uri=True means "the File Server of the ClearML Hosted Service". I think this is an unfortunate design of the SDK interface. In fact, I believe that using "None" and "True" as possible arguments and/or defaults here is questionable.
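To make the three cases discussed in this thread explicit, here is a small illustrative sketch. The function resolve_upload_destination and its parameters are hypothetical; this is a reading of the behaviour described above, not ClearML's actual implementation.

```python
from typing import Optional, Union


def resolve_upload_destination(
    output_uri: Union[None, bool, str],
    default_output_uri: Optional[str],
    files_server: str,
) -> Optional[str]:
    """Illustrative only: return where model files would be uploaded, or None for no upload."""
    if output_uri is None:
        # Nothing given on the call: fall back to the configured default, if any
        # (assumed to correspond to sdk.development.default_output_uri).
        output_uri = default_output_uri or None
    if output_uri is None:
        # No destination at all: only a reference to the local file is logged.
        return None
    if output_uri is True:
        # True is shorthand for the default file server.
        return files_server
    if isinstance(output_uri, str):
        # An explicit storage URI is used as given.
        return output_uri
    raise ValueError("this sketch only covers None, True, or a URI string")
```

For example, resolve_upload_destination(True, None, "https://files.example") returns the file server address, while passing None with no configured default returns None, which matches the "reference only" behaviour observed earlier in the thread.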


All of this should also be mentioned wherever the documentation discusses external versus non-external storage, and wherever it talks about models, artifacts, and the like. By mentioning I do not mean details, but one sentence and a link to the detailed documentation.


The documentation of output_uri=True in the docstrings https://github.com/allegroai/clearml/blob/4ebe714165cfdacdcc48b8cf6cc5bddb3c15a89f/clearml/task.py#L334 only says

Default file server: True

but I do not see any mention that this is the internal file storage of the ClearML Hosted Service when I use the ClearML Hosted Service.

Similarly, this is not mentioned at https://clear.ml/docs/latest/docs/getting_started/ds/ds_second_steps/#models.
Also, I would not look at the page "Next Steps" once I am knee-deep in development.
At that point, I would look for the topical text that fits what I am working on right now, not a beginner's tutorial.
Specifically, here, I would look in the "Model" section, not a section called "Next Steps".
In other words, anything that is mentioned in a section "Next Steps" should also be documented in the respective topical places.


2 participants