Minio support for ModelDB #870
Hi, @Atharex! Currently the artifacts go directly to S3 via signed URLs. To my knowledge, Minio supports such calls, so it should work out of the box, but we have never tested against it. Are you getting a specific error? Maybe we can help figure out what's going on.
Hi, @conradoverta! There probably aren't many changes needed. It could be that I'm missing something in the configuration, or that there is no way yet to specify a custom endpoint in the S3 configuration (such as a local Minio installation). I've configured the S3 artifact store type in my config.yaml like this:
And I get the following error:
So it seems that ModelDB tries to use those credentials to save the data to AWS instead of my local Minio installation.
Oh, that is a fair point. I don't think we have any configuration for a custom endpoint. It should be easy to add a configuration and pass it around, but we don't currently have a Minio setup to test with. Would you be willing to contribute a PR with that new configuration? We'd be happy to point you to useful information for this. Otherwise, I'll need to discuss it with the team and schedule it for one of our upcoming sprints.
OK, I guess I could give it a try :) Send me the information you have and I'll see what I can do.
@Atharex: I believe modifying https://github.com/VertaAI/modeldb/blob/master/backend/src/main/java/ai/verta/modeldb/artifactStore/storageservice/S3Service.java#L34-L51 should get you unblocked. If it doesn't, it would help me if you could share a few more lines from the stack trace.
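Here is a minimal sketch of what such a setting changes. It assumes a hypothetical optional endpoint value in the S3 artifact store configuration; the names `customEndpoint` and `objectUri` are illustrative and do not come from ModelDB. Without an endpoint, the client targets AWS with virtual-hosted-style URLs. With an endpoint, it targets that server, typically with path-style URLs, because a Minio bucket is not reachable as a subdomain unless Minio is configured for virtual-host-style requests. In the AWS SDK for Java v1, the corresponding builder options are `withEndpointConfiguration(...)` and `withPathStyleAccessEnabled(true)` on `AmazonS3ClientBuilder`. Whether and how `S3Service` uses them is not shown here.

```java
import java.net.URI;

/**
 * Hypothetical sketch: how an optional custom S3 endpoint changes object addressing.
 * Names are illustrative and not taken from ModelDB.
 * Assumes keys contain only URL-safe characters (no encoding is done here).
 */
public class S3EndpointSketch {

    static URI objectUri(String customEndpoint, String region, String bucket, String key) {
        if (customEndpoint == null || customEndpoint.isBlank()) {
            // No endpoint configured: AWS virtual-hosted-style URL (bucket as subdomain).
            return URI.create("https://" + bucket + ".s3." + region + ".amazonaws.com/" + key);
        }
        // Custom endpoint such as a local Minio: path-style URL (bucket in the path).
        String base = customEndpoint.endsWith("/")
                ? customEndpoint.substring(0, customEndpoint.length() - 1)
                : customEndpoint;
        return URI.create(base + "/" + bucket + "/" + key);
    }

    public static void main(String[] args) {
        // https://artifacts.s3.us-east-1.amazonaws.com/model.pkl
        System.out.println(objectUri(null, "us-east-1", "artifacts", "model.pkl"));
        // http://minio:9000/artifacts/model.pkl
        System.out.println(objectUri("http://minio:9000", "us-east-1", "artifacts", "model.pkl"));
    }
}
```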
I started from where you pointed me and got a working example running against my Minio installation. I was able to log datasets to Minio successfully with it. I've now also opened a pull request (#889) with my proposed changes. The changes also support setting the config:
Awesome! That was fast =) We'll take a look tomorrow.
Thanks, @Atharex, for the request and the fix. Could you close the ticket if things are working for you?
My pleasure, @ravishetye :) I'd rather keep this ticket open, since support isn't 100% yet (the DB artifact storage path still needs changes). You can show me where the changes should be made, but I can't guarantee I'll have time for another pull request in the near future :/
@ravishetye I found some time to take another look at this. Could someone on your side point me to the code that creates the frontend links?
@ravishetye I see you're doing a lot of refactoring on the codebase. I presume you're planning a new release in which someone on your side will complete the Minio support?
Hi, @Atharex! Could you clarify what you mean by links? I might be missing something here.
I might have been misled... I thought the DB stored direct links to the artifacts, which the frontend uses for downloads. I install ModelDB with this config:
Then I followed this example: https://github.com/VertaAI/modeldb/blob/master/client/workflows/demos/census-end-to-end-local-data-example.ipynb
This is my Postgres DB output when I tried your latest ModelDB version:
The request URL (seen in the browser's network inspector) when I click the download-artifact button in the ModelDB web UI looks correct:
When I check my local Minio instance, I see the artifacts stored there correctly, and I can download them directly:
Even "docker exec-ing" into the backend container and fetching the artifact links from there works. But when I try to download the same file from the web UI, I get an error message:
The webapp log seems fine...
The modeldb-backend logs don't look suspicious either.
But now I'm out of ideas on how to investigate further...
Is
My current suspicion is that DNS resolves differently for things running in the cluster than for your other machine. What happens is that the webapp tries to fetch the URL
Could you verify whether you can resolve that hostname? You can usually do
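The same resolution check can also be scripted in Java, the backend's language. This is a minimal sketch, not the command suggested in the thread, and it assumes a Java 11+ runtime is available wherever you run it. Pass only the host part of the artifact URL, without the scheme, port, or path.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;

/** Minimal sketch: report whether a hostname resolves from the machine it runs on. */
public class ResolveCheck {

    /** Returns the resolved addresses, or an empty array if the name does not resolve. */
    static InetAddress[] resolve(String host) {
        try {
            return InetAddress.getAllByName(host);
        } catch (UnknownHostException e) {
            return new InetAddress[0];
        }
    }

    public static void main(String[] args) {
        // Host part only, e.g. the Minio service name taken from the artifact link.
        String host = args.length > 0 ? args[0] : "localhost";
        InetAddress[] addresses = resolve(host);
        if (addresses.length == 0) {
            System.out.println(host + " does not resolve here");
        } else {
            System.out.println(host + " -> " + Arrays.toString(addresses));
        }
    }
}
```

Running `java ResolveCheck.java <host>` once inside the cluster (for example, in the backend pod) and once on the machine running the browser compares the two views. If the name resolves only in the first, the browser cannot follow links built with the in-cluster name.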
No
Though that external URL should not be used by ModelDB at all, since all of its traffic happens inside the Kubernetes cluster, where it has access to the
The problem here seems to be that ModelDB and your browser see different hostnames for the same system. So when ModelDB asks Minio for the link to the artifact, the link comes back built from ModelDB's view of the hostname. When the backend sends that link to the webapp, the webapp tries to make the request, and it fails because the name is different. Would you mind configuring ModelDB to use the same hostname you use internally?
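The exact mechanics behind the hostname mismatch depend on how the links are produced. Assuming they are AWS Signature Version 4 presigned URLs (the standard S3 mechanism, which Minio also implements), here is a minimal stdlib-only Java sketch of presigning a path-style GET. It shows two things relevant here. First, the link is built from whatever host the signer is configured with. Second, that host is a signed value, including a non-default port such as `:9000`. So the same object signed for an in-cluster name and for an external name yields two different, non-interchangeable URLs. Credentials, bucket, and host names are placeholders; production code should use the SDK's presigner instead.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Map;
import java.util.TreeMap;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Minimal SigV4 presigned-GET sketch (path-style); all names and credentials are placeholders. */
public class PresignSketch {

    static byte[] hmac(byte[] key, String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static String sha256Hex(String s) {
        try {
            return hex(MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8)));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b & 0xff));
        return sb.toString();
    }

    /** RFC 3986 percent-encoding as SigV4 expects; optionally leaves '/' intact (for paths). */
    static String uriEncode(String s, boolean keepSlash) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xff;
            boolean unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9') || c == '-' || c == '_' || c == '.' || c == '~';
            if (unreserved || (keepSlash && c == '/')) sb.append((char) c);
            else sb.append(String.format("%%%02X", c));
        }
        return sb.toString();
    }

    /** host must be exactly what the client will send, e.g. "minio:9000". */
    static String presignGet(String scheme, String host, String bucket, String key,
                             String accessKey, String secretKey, String region,
                             String amzDate, int expiresSeconds) {
        String date = amzDate.substring(0, 8); // yyyyMMdd part of yyyyMMdd'T'HHmmss'Z'
        String scope = date + "/" + region + "/s3/aws4_request";
        String path = uriEncode("/" + bucket + "/" + key, true);

        // Canonical query string: parameters sorted by name, names and values encoded.
        Map<String, String> params = new TreeMap<>();
        params.put("X-Amz-Algorithm", "AWS4-HMAC-SHA256");
        params.put("X-Amz-Credential", accessKey + "/" + scope);
        params.put("X-Amz-Date", amzDate);
        params.put("X-Amz-Expires", Integer.toString(expiresSeconds));
        params.put("X-Amz-SignedHeaders", "host");
        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (query.length() > 0) query.append('&');
            query.append(uriEncode(e.getKey(), false)).append('=').append(uriEncode(e.getValue(), false));
        }

        // The host header is the only signed header, so a different host changes the signature.
        String canonicalRequest = "GET\n" + path + "\n" + query + "\n"
                + "host:" + host + "\n\n" + "host\n" + "UNSIGNED-PAYLOAD";
        String stringToSign = "AWS4-HMAC-SHA256\n" + amzDate + "\n" + scope + "\n"
                + sha256Hex(canonicalRequest);

        // Derive the signing key: secret -> date -> region -> service -> "aws4_request".
        byte[] k = hmac(("AWS4" + secretKey).getBytes(StandardCharsets.UTF_8), date);
        k = hmac(k, region);
        k = hmac(k, "s3");
        k = hmac(k, "aws4_request");
        String signature = hex(hmac(k, stringToSign));

        return scheme + "://" + host + path + "?" + query + "&X-Amz-Signature=" + signature;
    }

    public static void main(String[] args) {
        String when = "20200101T000000Z";
        System.out.println(presignGet("http", "minio:9000", "artifacts", "model.pkl",
                "ACCESS_KEY", "SECRET_KEY", "us-east-1", when, 3600));
        System.out.println(presignGet("https", "minio.example.com", "artifacts", "model.pkl",
                "ACCESS_KEY", "SECRET_KEY", "us-east-1", when, 3600));
    }
}
```

With the AWS SDKs, presigning is typically a local computation from the credentials and the configured endpoint, without contacting the store. So which hostname ends up in the link is decided by the backend's configuration.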
Aha, I see your point! I thought the GET request I see in the traffic analyzer happened on the webapp side (the webapp transfers the file from the artifact storage and then lets me download that cached copy), but it actually gives me a direct link to the storage using its internally resolved DNS address, whereas on the user side I need the externally defined DNS address:
I got confused because deleting an artifact did not throw an error (I later realized that's because the webapp invokes its REST API to perform the step
With this, it deletes the entry from ModelDB but leaves the artifact in MinIO intact (I guess that's by design, also with other artifact stores? Or should the delete also happen in the store?).
I guess some URL rewriting would be needed for the web UI to handle addresses correctly in this particular use case: an external storage service that has both an internal (cluster) and an external (ingress) DNS name. Maybe an optional "AlternativeStoreURL" parameter in the ModelDB configuration file could be used to rewrite the generated links on the webapp side? Just a thought... I'm not sure how other projects handle similar situations.
Configuring ModelDB to use the external name might not be easy: there is a port in the internal service name, so even if I reconfigured my internal Kubernetes DNS resolver, I could not CNAME an external entry onto an internal address with a port.
We use the direct link because it's usually much faster (storage services are built for large uploads and downloads). Adding an alternative base URL to simplify the process makes sense to me. Usually we handle this by adding CNAME entries in the right place, but that can be a high barrier. If we pointed you to the right places for the change, would you be willing to contribute a PR with support for this feature? It would be greatly appreciated!
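Here is a minimal sketch of the "alternative base URL" idea. It assumes a hypothetical config value such as `alternativeStoreUrl`, which is not an existing ModelDB option. The rewrite keeps the path and query of the generated link exactly as issued and swaps only the scheme and authority (host and port); any path in the alternative base is ignored. One caveat is worth checking before relying on this. If the links are AWS Signature Version 4 presigned URLs, the host header is part of the signature. The store will then typically reject a rewritten link unless the proxy in front of it forwards the original host. Signing directly against the external endpoint avoids that problem.

```java
import java.net.URI;

/**
 * Hypothetical "alternative store URL" rewrite: replace the scheme and authority of an
 * artifact link while keeping its raw (still percent-encoded) path and query.
 */
public class StoreUrlRewriter {

    static String rewrite(String link, String alternativeBase) {
        URI original = URI.create(link);
        URI base = URI.create(alternativeBase);
        StringBuilder sb = new StringBuilder()
                .append(base.getScheme()).append("://").append(base.getRawAuthority());
        // Raw accessors avoid decoding escapes such as %2F inside signed query values.
        if (original.getRawPath() != null) {
            sb.append(original.getRawPath());
        }
        if (original.getRawQuery() != null) {
            sb.append('?').append(original.getRawQuery());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String internal = "http://minio.minio.svc.cluster.local:9000/artifacts/model.pkl"
                + "?X-Amz-Expires=3600&X-Amz-Signature=abc123";
        // https://minio.example.com/artifacts/model.pkl?X-Amz-Expires=3600&X-Amz-Signature=abc123
        System.out.println(rewrite(internal, "https://minio.example.com"));
    }
}
```

Because the rewrite happens in one place, the concern raised later in the thread also applies: every client, including one inside the cluster, would receive the external URL.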
Sure, I'd go for it! This feature would help me out nicely. |
Great! @ad-47 @ravishetye could you share some pointers on how we could add a config field
@Atharex Would setting the Minio endpoint to
Sadly, no. There is a port in my service name, and I can't get DNS to resolve
The challenge that Ravi correctly pointed out when I discussed this with him is that MDB would always use that alternative URL, even if the client were running inside the cluster. Would that be an issue for you?
It would be great if the ModelDB team created a Minio example that future users could refer to.
Can the S3 storage adapter support a Minio backend?