Need help using custom ckpt file from S3 #33

Open
DenimMazuki opened this issue Feb 14, 2023 · 7 comments

@DenimMazuki

Hello!

I'm trying to use a custom ckpt to deploy to banana. My file is in S3 and I tried setting the CHECKPOINT_URL ARG in the Dockerfile with no luck (looks like the default stability weight got loaded instead of my ckpt in the S3 bucket).

I tried setting MODEL_URL to the S3 location as well, with no luck either (it reports a tar error on the ckpt file from S3).

Am I approaching this the wrong way? I dug around and saw there's code to convert ckpt to diffusers format (and use it while building on banana). Would appreciate some guidance, thank you! 🙏🏻

@gadicc
Collaborator

gadicc commented Feb 15, 2023

Hey, welcome!

You're totally on the right track... this should be working with CHECKPOINT_URL. Are you able to provide the full build logs so I can see exactly where it's getting stuck?

Two most common reasons:

  1. It couldn't convert the checkpoint file. Sometimes the diffusers conversion script fails on certain checkpoints. Newer versions are more versatile. What version of diffusers-api are you using? There's a link in the readme on how to use the v1 prerelease, which will probably work better.

  2. Wrong S3 URL, but since it managed to download correctly with MODEL_URL, I don't think this is it. It's possible the checkpoint download code doesn't have full S3 support; I need to double-check this.

Just before a long international flight but will check in tomorrow. You can send the log in the meantime if you have it 🙏

@DenimMazuki
Author

Thank you for the response Gadi, hope the flight treated you well :).

Re: 1, I think I am using the v1 diffusers-api (the Dockerfile I'm using is from this repo). A stupid reason I could think of is a difference between the file extension being .pth vs. .ckpt, but I tried using both formats to no avail.

Re:2, I noticed that copying the URI from S3 gets a URI that looks like s3://<bucket-name>/model.pth while the Dockerfile expects something like s3:///<bucket-name>/model.pth (with the difference being the extra /). Is that the right observation? (I noticed the download taking place in MODEL_URL when I added the extra /).

This is the log when I set the CHECKPOINT_URL parameter, but left MODEL_URL flag unset (left it alone per the Dockerfile):

You've triggered a build + deploy on Banana. It may take ~1 hr to complete. Thanks for your patience. 
Waiting for logs...

SUCCESS: Git Authorization
SUCCESS: Build Started
SUCCESS: Build Finished... Running optimizations

SUCCESS: Model Registered

Your model was updated and is now deployed!

This is when I set the MODEL_URL to the same s3 path as CHECKPOINT_URL:

You've triggered a build + deploy on Banana. It may take ~1 hr to complete. Thanks for your patience. 
Waiting for logs...

SUCCESS: Git Authorization
SUCCESS: Build Started
ERROR: Build Failed - Logs follow:

{"stream":"Step 1/38 : ARG FROM_IMAGE=\"gadicc/diffusers-api\""}
{"stream":"\n"}
{"stream":"Step 2/38 : FROM ${FROM_IMAGE} as base"}
{"stream":"\n"}
{"status":"Pulling from gadicc/diffusers-api","id":"latest"}
{"stream":"Step 31/38 : RUN python3 download.py"}
{"stream":" ---\u003e Running in f48408c6849a\n"}
{"stream":"download_model {'model_url': 's3:///<bucket>/model.pth', 'model_id': 'stabilityai/stable-diffusion-2-1-base', 'model_revision': 'fp16', 'hf_model_id': ''}\n{'normalized_model_id': 'models--stabilityai--stable-diffusion-2-1-base--fp16'}\nself.endpoint_url None\nDownloading s3:///<bucket>/model.pth to /root/.cache/diffusers-api/model.pth...\n"}
{"stream":"\u001b[91m\rDownloading:   0%|          | 0.00/2.13G [00:00\u003c?, ?B/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:   0%|          | 262k/2.13G [00:00\u003c1:22:25, 431kB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:   0%|          | 1.31M/2.13G [00:00\u003c15:59, 2.22MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:   0%|          | 6.03M/2.13G [00:00\u003c03:07, 11.3MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:   1%|          | 16.0M/2.13G [00:00\u003c01:08, 30.9MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:   2%|▏         | 33.3M/2.13G [00:01\u003c00:32, 65.2MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:   2%|▏         | 51.1M/2.13G [00:01\u003c00:22, 91.7MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:   3%|▎         | 67.9M/2.13G [00:01\u003c00:19, 105MB/s] \u001b[0m"}
{"stream":"\u001b[91m\rDownloading:   4%|▍         | 80.2M/2.13G [00:01\u003c00:23, 88.9MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:   5%|▍         | 99.1M/2.13G [00:01\u003c00:18, 111MB/s] \u001b[0m"}
{"stream":"\u001b[91m\rDownloading:   6%|▌         | 117M/2.13G [00:01\u003c00:15, 128MB/s] \u001b[0m"}
{"stream":"\u001b[91m\rDownloading:   6%|▌         | 132M/2.13G [00:01\u003c00:15, 128MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:   7%|▋         | 146M/2.13G [00:01\u003c00:18, 108MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:   7%|▋         | 160M/2.13G [00:02\u003c00:17, 114MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:   8%|▊         | 176M/2.13G [00:02\u003c00:15, 125MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:   9%|▉         | 189M/2.13G [00:02\u003c00:15, 122MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:   9%|▉         | 202M/2.13G [00:02\u003c00:15, 123MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  10%|█         | 215M/2.13G [00:02\u003c00:16, 115MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  11%|█         | 228M/2.13G [00:02\u003c00:16, 117MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  11%|█▏        | 245M/2.13G [00:02\u003c00:14, 129MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  12%|█▏        | 263M/2.13G [00:02\u003c00:13, 142MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  13%|█▎        | 278M/2.13G [00:02\u003c00:14, 129MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  14%|█▎        | 291M/2.13G [00:03\u003c00:17, 107MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  14%|█▍        | 306M/2.13G [00:03\u003c00:15, 118MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  15%|█▌        | 323M/2.13G [00:03\u003c00:13, 130MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  16%|█▌        | 338M/2.13G [00:03\u003c00:13, 135MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  17%|█▋        | 352M/2.13G [00:03\u003c00:16, 107MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  17%|█▋        | 364M/2.13G [00:03\u003c00:17, 103MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  18%|█▊        | 376M/2.13G [00:03\u003c00:17, 103MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  19%|█▊        | 395M/2.13G [00:04\u003c00:13, 124MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  19%|█▉        | 413M/2.13G [00:04\u003c00:12, 133MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  20%|██        | 427M/2.13G [00:04\u003c00:14, 115MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  21%|██        | 439M/2.13G [00:04\u003c00:14, 115MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  21%|██▏       | 454M/2.13G [00:04\u003c00:13, 123MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  22%|██▏       | 472M/2.13G [00:04\u003c00:12, 138MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  23%|██▎       | 486M/2.13G [00:04\u003c00:12, 135MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  23%|██▎       | 500M/2.13G [00:04\u003c00:13, 124MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  24%|██▍       | 513M/2.13G [00:04\u003c00:13, 124MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  25%|██▍       | 533M/2.13G [00:05\u003c00:11, 143MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  26%|██▌       | 550M/2.13G [00:05\u003c00:10, 151MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  27%|██▋       | 566M/2.13G [00:05\u003c00:10, 150MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  27%|██▋       | 581M/2.13G [00:05\u003c00:10, 142MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  28%|██▊       | 599M/2.13G [00:05\u003c00:10, 151MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  29%|██▉       | 616M/2.13G [00:05\u003c00:09, 156MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  29%|██▉       | 616M/2.13G [00:05\u003c00:09, 161MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  30%|██▉       | 633M/2.13G [00:05\u003c00:09, 159MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  30%|███       | 650M/2.13G [00:05\u003c00:09, 160MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  31%|███       | 666M/2.13G [00:05\u003c00:10, 141MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  32%|███▏      | 683M/2.13G [00:06\u003c00:09, 148MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  33%|███▎      | 701M/2.13G [00:06\u003c00:09, 157MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  34%|███▍      | 720M/2.13G [00:06\u003c00:08, 166MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  35%|███▍      | 737M/2.13G [00:06\u003c00:08, 164MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  35%|███▌      | 754M/2.13G [00:06\u003c00:10, 137MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  35%|███▌      | 754M/2.13G [00:06\u003c00:11, 124MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  36%|███▌      | 771M/2.13G [00:06\u003c00:10, 132MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  37%|███▋      | 791M/2.13G [00:06\u003c00:09, 147MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  38%|███▊      | 810M/2.13G [00:06\u003c00:08, 159MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  39%|███▉      | 827M/2.13G [00:07\u003c00:09, 131MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  39%|███▉      | 841M/2.13G [00:07\u003c00:09, 135MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  40%|████      | 856M/2.13G [00:07\u003c00:09, 133MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  41%|████      | 873M/2.13G [00:07\u003c00:08, 141MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  42%|████▏     | 890M/2.13G [00:07\u003c00:08, 149MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  42%|████▏     | 905M/2.13G [00:07\u003c00:08, 148MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  43%|████▎     | 922M/2.13G [00:07\u003c00:07, 153MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  44%|████▍     | 937M/2.13G [00:07\u003c00:07, 150MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  45%|████▍     | 953M/2.13G [00:07\u003c00:07, 150MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  46%|████▌     | 972M/2.13G [00:07\u003c00:07, 161MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  46%|████▋     | 988M/2.13G [00:08\u003c00:07, 162MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  47%|████▋     | 1.00G/2.13G [00:08\u003c00:07, 154MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  48%|████▊     | 1.02G/2.13G [00:08\u003c00:07, 155MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  49%|████▉     | 1.04G/2.13G [00:08\u003c00:06, 166MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  50%|████▉     | 1.06G/2.13G [00:08\u003c00:06, 170MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  50%|█████     | 1.08G/2.13G [00:08\u003c00:06, 153MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  51%|█████     | 1.09G/2.13G [00:08\u003c00:06, 154MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  51%|█████     | 1.09G/2.13G [00:08\u003c00:06, 156MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  52%|█████▏    | 1.11G/2.13G [00:08\u003c00:06, 166MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  53%|█████▎    | 1.13G/2.13G [00:08\u003c00:05, 178MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  54%|█████▍    | 1.15G/2.13G [00:09\u003c00:05, 166MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  55%|█████▍    | 1.17G/2.13G [00:09\u003c00:06, 151MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  56%|█████▌    | 1.18G/2.13G [00:09\u003c00:06, 153MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  56%|█████▌    | 1.18G/2.13G [00:09\u003c00:06, 155MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  56%|█████▋    | 1.20G/2.13G [00:09\u003c00:06, 143MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  57%|█████▋    | 1.22G/2.13G [00:09\u003c00:06, 140MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  58%|█████▊    | 1.23G/2.13G [00:09\u003c00:06, 147MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  59%|█████▊    | 1.25G/2.13G [00:09\u003c00:05, 151MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  59%|█████▉    | 1.27G/2.13G [00:09\u003c00:05, 157MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  60%|██████    | 1.28G/2.13G [00:09\u003c00:05, 145MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  61%|██████    | 1.30G/2.13G [00:10\u003c00:06, 136MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  62%|██████▏   | 1.31G/2.13G [00:10\u003c00:05, 139MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  62%|██████▏   | 1.33G/2.13G [00:10\u003c00:05, 147MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  63%|██████▎   | 1.35G/2.13G [00:10\u003c00:04, 159MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  63%|██████▎   | 1.35G/2.13G [00:10\u003c00:04, 167MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  64%|██████▍   | 1.37G/2.13G [00:10\u003c00:04, 163MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  65%|██████▍   | 1.38G/2.13G [00:10\u003c00:05, 150MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  66%|██████▌   | 1.40G/2.13G [00:10\u003c00:04, 150MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  66%|██████▋   | 1.41G/2.13G [00:10\u003c00:04, 154MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  67%|██████▋   | 1.43G/2.13G [00:10\u003c00:04, 157MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  68%|██████▊   | 1.45G/2.13G [00:11\u003c00:04, 162MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  69%|██████▊   | 1.47G/2.13G [00:11\u003c00:04, 152MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  69%|██████▉   | 1.48G/2.13G [00:11\u003c00:04, 152MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  70%|███████   | 1.50G/2.13G [00:11\u003c00:03, 160MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  71%|███████   | 1.52G/2.13G [00:11\u003c00:03, 157MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  72%|███████▏  | 1.53G/2.13G [00:11\u003c00:03, 156MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  73%|███████▎  | 1.55G/2.13G [00:11\u003c00:03, 160MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  73%|███████▎  | 1.56G/2.13G [00:11\u003c00:03, 160MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  74%|███████▍  | 1.58G/2.13G [00:11\u003c00:03, 154MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  75%|███████▍  | 1.60G/2.13G [00:12\u003c00:03, 154MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  76%|███████▌  | 1.61G/2.13G [00:12\u003c00:03, 161MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  77%|███████▋  | 1.63G/2.13G [00:12\u003c00:03, 162MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  77%|███████▋  | 1.65G/2.13G [00:12\u003c00:02, 162MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  78%|███████▊  | 1.66G/2.13G [00:12\u003c00:03, 153MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  79%|███████▉  | 1.68G/2.13G [00:12\u003c00:03, 144MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  80%|███████▉  | 1.70G/2.13G [00:12\u003c00:02, 159MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  81%|████████  | 1.72G/2.13G [00:12\u003c00:02, 171MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  81%|████████▏ | 1.74G/2.13G [00:12\u003c00:02, 163MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  82%|████████▏ | 1.75G/2.13G [00:13\u003c00:02, 150MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  83%|████████▎ | 1.77G/2.13G [00:13\u003c00:02, 154MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  84%|████████▍ | 1.79G/2.13G [00:13\u003c00:02, 162MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  85%|████████▍ | 1.81G/2.13G [00:13\u003c00:01, 172MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  86%|████████▌ | 1.83G/2.13G [00:13\u003c00:02, 141MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  86%|████████▋ | 1.84G/2.13G [00:13\u003c00:02, 128MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  87%|████████▋ | 1.86G/2.13G [00:13\u003c00:01, 143MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  88%|████████▊ | 1.88G/2.13G [00:13\u003c00:01, 155MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  89%|████████▉ | 1.90G/2.13G [00:13\u003c00:01, 166MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  90%|████████▉ | 1.92G/2.13G [00:14\u003c00:01, 133MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  91%|█████████ | 1.93G/2.13G [00:14\u003c00:01, 134MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  92%|█████████▏| 1.95G/2.13G [00:14\u003c00:01, 146MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  92%|█████████▏| 1.97G/2.13G [00:14\u003c00:01, 156MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  93%|█████████▎| 1.99G/2.13G [00:14\u003c00:00, 148MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  94%|█████████▍| 2.00G/2.13G [00:14\u003c00:00, 138MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  95%|█████████▍| 2.02G/2.13G [00:14\u003c00:00, 148MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  96%|█████████▌| 2.04G/2.13G [00:14\u003c00:00, 154MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  96%|█████████▋| 2.05G/2.13G [00:15\u003c00:00, 156MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  97%|█████████▋| 2.07G/2.13G [00:15\u003c00:00, 148MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  98%|█████████▊| 2.09G/2.13G [00:15\u003c00:00, 160MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading:  99%|█████████▉| 2.11G/2.13G [00:15\u003c00:00, 166MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading: 100%|█████████▉| 2.12G/2.13G [00:15\u003c00:00, 163MB/s]\u001b[0m"}
{"stream":"\u001b[91m\rDownloading: 100%|██████████| 2.13G/2.13G [00:15\u003c00:00, 136MB/s]\n\u001b[0m"}
{"stream":"\u001b[91mzstd: /*stdin*\\: unsupported format \n\u001b[0m"}
{"stream":"\u001b[91mtar: Child returned status 1\ntar: Error is not recoverable: exiting now\n\u001b[0m"}
{"stream":"\u001b[91mTraceback (most recent call last):\n  File \"/api/download.py\", line 175, in \u003cmodule\u003e\n    download_model(\n  File \"/api/download.py\", line 75, in download_model\n\u001b[0m"}
{"stream":"\u001b[91m    subprocess.run(\n  File \"/opt/conda/envs/xformers/lib/python3.9/subprocess.py\", line 528, in run\n\u001b[0m"}
{"stream":"\u001b[91m    raise CalledProcessError(retcode, process.args,\n\u001b[0m"}
{"stream":"\u001b[91msubprocess.CalledProcessError: Command '['tar', '--use-compress-program=unzstd', '-C', '/root/.cache/diffusers-api/models--stabilityai--stable-diffusion-2-1-base--fp16', '-xvf', '/root/.cache/diffusers-api/model.pth']' returned non-zero exit status 2.\n\u001b[0m"}
{"stream":"\u001b[91mERROR conda.cli.main_run:execute(47): `conda run /bin/bash -c python3 download.py` failed. (See above for error)\n\u001b[0m"}
{"errorDetail":{"code":1,"message":"The command '/opt/conda/bin/conda run --no-capture-output -n xformers /bin/bash -c python3 download.py' returned a non-zero code: 1"},"error":"The command '/opt/conda/bin/conda run --no-capture-output -n xformers /bin/bash -c python3 download.py' returned a non-zero code: 1"}

Thank you again for your help 🙏🏻

@DenimMazuki
Author

I might be looking at the wrong place, but is the correct flow in the download.py for the ckpt to go to this line?

Seems like my build is getting stuck here.

I could be looking at an older version of the API (or misunderstanding something) but wanted to point it out just in case it's helpful.

Thank you again!

@gadicc
Collaborator

gadicc commented Feb 17, 2023

Hey, @DenimMazuki! Thanks for your patience and detailed reporting and troubleshooting steps.

Ok great, you're indeed on the new v1, the correct repo, and, just all round doing everything right actually :) And yes, in our S3 URLs we expect a triple slash ("///") in the beginning unless you want to explicitly set an ENDPOINT_URL (which is the much less common case) - there are some very brief docs about that here.
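To illustrate the triple-slash convention described above, here is a minimal sketch using Python's standard `urlparse`. It is not the code diffusers-api actually uses; the helper name `split_s3_url` and the idea that the host slot of the URL is read as an optional endpoint are assumptions made for illustration only.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split an s3:// URL into (endpoint, bucket, key).

    Assumption (hypothetical helper, not the project's real parser): whatever
    sits between "s3://" and the next "/" is an optional endpoint host, and
    the bucket is the first path segment after it.
    """
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3:// URL: {url!r}")
    endpoint = parsed.netloc or None  # empty host slot -> default endpoint
    bucket, _, key = parsed.path.lstrip("/").partition("/")
    return endpoint, bucket, key


# Triple slash: the host slot is empty, so the bucket is the first path segment.
print(split_s3_url("s3:///my-bucket/model.pth"))
# Double slash (what the S3 console's "copy URI" produces): under this
# assumption the bucket name lands in the endpoint slot instead.
print(split_s3_url("s3://my-bucket/model.pth"))
```

Under that assumption, the URI copied from the S3 console would be read with the bucket name in the endpoint position, which is why the extra slash matters.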

And indeed, you want CHECKPOINT_URL and not MODEL_URL. I've thought about combining the two, but, since conversion is not necessarily 1:1 if you don't get the configuration correct, I decided against it, even though it would be a little easier and more intuitive in most (but not all) cases. I did also check that CHECKPOINT_URL is indeed using the updated S3 download code, so not expecting any problems here.
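The tar failure in the MODEL_URL build log above is consistent with this: the log shows download.py running `tar --use-compress-program=unzstd` on the downloaded file, so MODEL_URL appears to expect a zstd-compressed tar archive rather than a raw checkpoint. A quick local check along those lines might look like the sketch below. The helper names are hypothetical, and the mapping to build args is an inference from that log, not something taken from the project's code.

```python
# First four bytes of a zstd frame as they appear on disk.
ZSTD_MAGIC = bytes.fromhex("28b52ffd")


def looks_like_zstd(path):
    """Return True if the file at `path` starts with the zstd magic bytes."""
    with open(path, "rb") as f:
        return f.read(len(ZSTD_MAGIC)) == ZSTD_MAGIC


def suggest_build_arg(path):
    """Guess which build arg a local file belongs under (hypothetical helper).

    Assumption: zstd-compressed archives go in MODEL_URL, and anything else
    (for example a raw .ckpt or .pth checkpoint) goes in CHECKPOINT_URL.
    """
    return "MODEL_URL" if looks_like_zstd(path) else "CHECKPOINT_URL"
```

For example, `suggest_build_arg("model.pth")` on a raw checkpoint would return `"CHECKPOINT_URL"`, since such a file does not start with the zstd magic bytes.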

In short, all looks good. It's a pity banana doesn't give the full build logs on a successful build too, in case anything came up there. However, I'm recalling now that you said it's downloading the default stability weights... so here's my game plan:

  1. Could you please try setting the build-arg MODEL_ID to something unique, and see if that makes a difference?
  2. Failing that, could you send the full runtime logs from the call, just so I can see exactly what is happening on this side of the equation?
  3. Lastly, any chance you could give me access somewhere to the checkpoint file, so I can run the conversion locally and see if there's anything useful in the build output?

We'll get this working and thanks for your patience too over my travels.

@DenimMazuki
Author

Hey @gadicc , thank you for the detailed response! I really appreciate the responses :)

Gameplan 1: I set MODEL_ID to something unique and got a build error about downloading the repo from HF:

2023-02-17T14:38:50.000Z You've triggered a build + deploy on Banana. It may take ~1 hr to complete. Thanks for your patience. 
Waiting for logs...

SUCCESS: Git Authorization
SUCCESS: Build Started
ERROR: Build Failed - Logs follow:

{"stream":"Step 1/38 : ARG FROM_IMAGE=\"gadicc/diffusers-api\""}
{"stream":"\n"}
{"stream":"Step 2/38 : FROM ${FROM_IMAGE} as base"}
{"stream":"\n"}
{"status":"Pulling from gadicc/diffusers-api","id":"latest"}
{"status":"Digest: sha256:a98c52386a23f0c68d159f23d9f0b4e88f2cfcd7c0c8f8833151ca7affc3831f"}
{"status":"Status: Image is up to date for gadicc/diffusers-api:latest"}
{"stream":" ---\u003e 8e20dd55e5a8\n"}
{"stream":"Step 3/38 : ENV FROM_IMAGE=${FROM_IMAGE}"}
{"stream":"\n"}
{"stream":" ---\u003e Using cache\n"}
{"stream":" ---\u003e b802fbdfbe20\n"}
{"stream":"Step 4/38 : ARG MODEL_ID=\"something-unique\""}
{"stream":"\n"}
{"stream":" ---\u003e Running in 74b835e75dbc\n"}
{"stream":" ---\u003e 3d116f08acc6\n"}
{"stream":"Step 5/38 : ENV MODEL_ID=${MODEL_ID}"}
{"stream":"\n"}
{"stream":" ---\u003e Running in e72f23377dd2\n"}
{"stream":" ---\u003e 7855c26e3dc3\n"}
{"stream":"Step 6/38 : ARG HF_MODEL_ID=\"\""}
{"stream":"\n"}
{"stream":" ---\u003e Running in 770b14ef4d74\n"}
{"stream":" ---\u003e 23a6d3c341da\n"}
{"stream":"Step 7/38 : ENV HF_MODEL_ID=${HF_MODEL_ID}"}
{"stream":"\n"}
{"stream":" ---\u003e Running in 1f08822a2b2a\n"}
{"stream":" ---\u003e a772cef0a9d8\n"}
{"stream":"Step 8/38 : ARG MODEL_PRECISION=\"fp16\""}
{"stream":"\n"}
{"stream":" ---\u003e Running in 8b32a482186f\n"}
{"stream":" ---\u003e c5c5f054e40c\n"}
{"stream":"Step 9/38 : ENV MODEL_PRECISION=${MODEL_PRECISION}"}
{"stream":"\n"}
{"stream":" ---\u003e Running in ab2ce97ea8df\n"}
{"stream":" ---\u003e 66f46c521857\n"}
{"stream":"Step 10/38 : ARG MODEL_REVISION=\"fp16\""}
{"stream":"\n"}
{"stream":" ---\u003e Running in 21a3fd0683b9\n"}
{"stream":" ---\u003e 2a4e359e948b\n"}
{"stream":"Step 11/38 : ENV MODEL_REVISION=${MODEL_REVISION}"}
{"stream":"\n"}
{"stream":" ---\u003e Running in 6d493d6fabd6\n"}
{"stream":" ---\u003e 3626502d2183\n"}
{"stream":"Step 12/38 : ARG MODEL_URL"}
{"stream":"\n"}
{"stream":" ---\u003e Running in 642d7dabba3f\n"}
{"stream":" ---\u003e 8dafdd6c3ce0\n"}
{"stream":"Step 13/38 : ENV MODEL_URL=${MODEL_URL}"}
{"stream":"\n"}
{"stream":" ---\u003e Running in 75c45f9dbfe1\n"}
{"stream":" ---\u003e 44595abc3f5d\n"}
{"stream":"Step 14/38 : ARG CHECKPOINT_URL=\"s3://<bucket>/model.pth\""}
{"stream":"\n"}
{"stream":" ---\u003e Running in 226f2a270f69\n"}
{"stream":" ---\u003e 94e6d58b35bb\n"}
{"stream":"Step 15/38 : ENV CHECKPOINT_URL=${CHECKPOINT_URL}"}
{"stream":"\n"}
{"stream":" ---\u003e Running in ec5d6c678ea2\n"}
{"stream":" ---\u003e ce840451980a\n"}
{"stream":"Step 16/38 : ARG CHECKPOINT_CONFIG_URL=\"\""}
{"stream":"\n"}
{"stream":" ---\u003e Running in 7a2d17387a33\n"}
{"stream":" ---\u003e 2680223e087e\n"}
{"stream":"Step 17/38 : ENV CHECKPOINT_CONFIG_URL=${CHECKPOINT_CONFIG_URL}"}
{"stream":"\n"}
{"stream":" ---\u003e Running in 9d887618e5a1\n"}
{"stream":" ---\u003e 81677619059e\n"}
{"stream":"Step 18/38 : ARG PIPELINE=\"ALL\""}
{"stream":"\n"}
{"stream":" ---\u003e Running in 8e8cb5bf2284\n"}
{"stream":" ---\u003e 9ab472dbbeae\n"}
{"stream":"Step 19/38 : ENV PIPELINE=${PIPELINE}"}
{"stream":"\n"}
{"stream":" ---\u003e Running in f032f4b08235\n"}
{"stream":" ---\u003e 06b307219eb6\n"}
{"stream":"Step 31/38 : RUN python3 download.py"}
{"stream":"\n"}
{"stream":" ---\u003e Running in b542319dca3f\n"}
{"stream":"\u001b[91mTraceback (most recent call last):\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py\", line 264, in hf_raise_for_status\n\u001b[0m"}
{"stream":"download_model {'model_url': '', 'model_id': 'something-unique', 'model_revision': 'fp16', 'hf_model_id': ''}\nloadModel {'model_id': 'something-unique', 'load': False, 'precision': 'fp16', 'revision': 'fp16'}\nDownloading model: something-unique (fp16)\nInitializing DPMSolverMultistepScheduler for something-unique...\n"}
{"stream":"\u001b[91m    response.raise_for_status()\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/requests/models.py\", line 1021, in raise_for_status\n\u001b[0m"}
{"stream":"\u001b[91m    raise HTTPError(http_error_msg, response=self)\nrequests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/something-unique/resolve/main/scheduler/scheduler_config.json\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n  File \"/api/diffusers/src/diffusers/configuration_utils.py\", line 326, in load_config\n\u001b[0m"}
{"stream":"\u001b[91m    config_file = hf_hub_download(\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py\", line 124, in _inner_fn\n\u001b[0m"}
{"stream":"\u001b[91m    return fn(*args, **kwargs)\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/huggingface_hub/file_download.py\", line 1105, in hf_hub_download\n\u001b[0m"}
{"stream":"\u001b[91m    metadata = get_hf_file_metadata(\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py\", line 124, in _inner_fn\n\u001b[0m"}
{"stream":"\u001b[91m    return fn(*args, **kwargs)\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/huggingface_hub/file_download.py\", line 1440, in get_hf_file_metadata\n\u001b[0m"}
{"stream":"\u001b[91m    hf_raise_for_status(r)\n  File \"/opt/conda/envs/xformers/lib/python3.9/site-packages/huggingface_hub/utils/_errors.py\", line 306, in hf_raise_for_status\n\u001b[0m"}
{"stream":"\u001b[91m    raise RepositoryNotFoundError(message, response) from e\nhuggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-63ef9189-26224b654059849d236317da)\n\nRepository Not Found for url: https://huggingface.co/something-unique/resolve/main/scheduler/scheduler_config.json.\nPlease make sure you specified the correct `repo_id` and `repo_type`.\nIf you are trying to access a private or gated repo, make sure you are authenticated.\nInvalid username or password.\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/api/download.py\", line 175, in \u003cmodule\u003e\n\u001b[0m"}
{"stream":"\u001b[91m    download_model(\n  File \"/api/download.py\", line 148, in download_model\n\u001b[0m"}
{"stream":"\u001b[91m    loadModel(\n  File \"/api/loadModel.py\", line 52, in loadModel\n\u001b[0m"}
{"stream":"\u001b[91m    scheduler = getScheduler(model_id, DEFAULT_SCHEDULER, not load)\n  File \"/api/getScheduler.py\", line 88, in getScheduler\n\u001b[0m"}
{"stream":"\u001b[91m    scheduler = initScheduler(MODEL_ID, scheduler_id, download)\n  File \"/api/getScheduler.py\", line 50, in initScheduler\n\u001b[0m"}
{"stream":"\u001b[91m    inittedScheduler = scheduler.from_pretrained(\n  File \"/api/diffusers/src/diffusers/schedulers/scheduling_utils.py\", line 134, in from_pretrained\n\u001b[0m"}
{"stream":"\u001b[91m    config, kwargs = cls.load_config(\n  File \"/api/diffusers/src/diffusers/configuration_utils.py\", line 341, in load_config\n\u001b[0m"}
{"stream":"\u001b[91m    raise EnvironmentError(\nOSError: something-unique is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'\nIf this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login`.\n\u001b[0m"}
{"stream":"\u001b[91mERROR conda.cli.main_run:execute(47): `conda run /bin/bash -c python3 download.py` failed. (See above for error)\n\u001b[0m"}
{"errorDetail":{"code":1,"message":"The command '/opt/conda/bin/conda run --no-capture-output -n xformers /bin/bash -c python3 download.py' returned a non-zero code: 1"},"error":"The command '/opt/conda/bin/conda run --no-capture-output -n xformers /bin/bash -c python3 download.py' returned a non-zero code: 1"}

Gameplan 2: Failing the first, here are the runtime logs from the re-deploy (with the stability weights):

17 Feb, 11:16:24
if not hasattr(tensorboard, "__version__") or LooseVersion(
17 Feb, 11:16:24
/opt/conda/envs/xformers/lib/python3.9/site-packages/torch/utils/tensorboard/__init__.py:4: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
17 Feb, 11:16:25
np.bool8: (False, True),
17 Feb, 11:16:25
/opt/conda/envs/xformers/lib/python3.9/site-packages/skimage/util/dtype.py:27: DeprecationWarning: `np.bool8` is a deprecated alias for `np.bool_`.  (Deprecated NumPy 1.24)
17 Feb, 11:16:25
{'NVIDIA_VISIBLE_DEVICES': 'GPU-9e60da3b-bece-d6ac-8c2d-66a80853c840', 'KUBERNETES_SERVICE_PORT_HTTPS': '443', 'AWS_S3_ENDPOINT_URL': '', 'KUBERNETES_SERVICE_PORT': '443', 'CONDA_EXE': '/opt/conda/bin/conda', '_CE_M': '', 'HOSTNAME': 'denimmazukibananadeployment0027d4a105d53d594b3093eaa817cc9wcdmc', 'AWS_DEFAULT_REGION': 'us-east-1', 'MODEL_REVISION': 'fp16', 'SAFETENSORS_FAST_GPU': '1', 'PWD': '/api', 'CONDA_ROOT': '/opt/conda', 'CONDA_PREFIX': '/opt/conda/envs/xformers', '_': '/opt/conda/envs/xformers/bin/python3', 'AWS_S3_DEFAULT_BUCKET': '<bucket>', 'SEND_URL': '', 'HOME': '/root', 'LANG': 'C.UTF-8', 'KUBERNETES_PORT_443_TCP': 'tcp://10.96.0.1:443', 'MODEL_URL': '', 'SIGN_KEY': '', 'CONDA_PROMPT_MODIFIER': '(xformers) ', 'AWS_SECRET_ACCESS_KEY': 'XXX', 'FROM_IMAGE': '', 'MODEL_ID': 'stabilityai/stable-diffusion-2-1-base', 'MODEL_PRECISION': 'fp16', '_CE_CONDA': '', 'CONDA_SHLVL': '2', 'SHLVL': '1', 'CHECKPOINT_CONFIG_URL': '', 'AWS_ACCESS_KEY_ID': 'XXX', 'KUBERNETES_PORT_443_TCP_PROTO': 'tcp', 'KUBERNETES_PORT_443_TCP_ADDR': '10.96.0.1', 'CONDA_PYTHON_EXE': '/opt/conda/bin/python', 'CHECKPOINT_URL': 's3:///<bucket>/model.pth', 'USE_DREAMBOOTH': '1', 'CONDA_DEFAULT_ENV': 'xformers', 'KUBERNETES_SERVICE_HOST': '10.96.0.1', 'LC_ALL': 'C.UTF-8', 'KUBERNETES_PORT': 'tcp://10.96.0.1:443', 'KUBERNETES_PORT_443_TCP_PORT': '443', 'HF_MODEL_ID': '', 'PATH': '/opt/conda/envs/xformers/bin:/opt/conda/condabin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin', 'PIPELINE': 'ALL', 'RUNTIME_DOWNLOADS': '0', 'CONDA_PREFIX_1': '/opt/conda', 'WANDB_REQUIRE_SERVICE': 'True'}
17 Feb, 11:16:26
None
17 Feb, 11:16:26
Initializing DPMSolverMultistepScheduler for stabilityai/stable-diffusion-2-1-base...
17 Feb, 11:16:26
Loading model: stabilityai/stable-diffusion-2-1-base (fp16)
17 Feb, 11:16:26
loadModel {'model_id': 'stabilityai/stable-diffusion-2-1-base', 'load': True, 'precision': 'fp16', 'revision': 'fp16'}
17 Feb, 11:16:26
2023-02-17 16:16:26.283631 {'type': 'init', 'status': 'start', 'container_id': 'denimmazukibananadeployment0027d4a105d53d594b3093eaa817cc9c6625', 'time': 1676650586284, 't': 406, 'tsl': 0, 'payload': {'device': 'NVIDIA A100-SXM4-40GB', 'hostname': 'denimmazukibananadeployment0027d4a105d53d594b3093eaa817cc9wcdmc', 'model_id': 'stabilityai/stable-diffusion-2-1-base', 'diffusers': '0.13.0.dev0'}}
17 Feb, 11:16:26
Initialized DPMSolverMultistepScheduler for stabilityai/stable-diffusion-2-1-base in 4ms
17 Feb, 11:16:31
2023-02-17 16:16:31.367417 {'type': 'init', 'status': 'done', 'container_id': 'denimmazukibananadeployment0027d4a105d53d594b3093eaa817cc9c6625', 'time': 1676650591367, 't': 5489, 'tsl': 5083, 'payload': {}}
17 Feb, 11:16:31
Loaded from disk in 3668 ms, to gpu in 1412 ms
17 Feb, 11:16:31
[15] [INFO] python: 3.9.15
17 Feb, 11:16:31
[15] [INFO] server: sanic, HTTP/1.1
17 Feb, 11:16:31
[15] [INFO] mode: production, single worker
17 Feb, 11:16:31
[15] [INFO] Goin' Fast @ http://0.0.0.0:8000
17 Feb, 11:16:31
[15] [INFO] Sanic v22.6.2
17 Feb, 11:16:31
[15] [INFO] packages: sanic-routing==22.3.0
17 Feb, 11:16:31
[15] [INFO] platform: Linux-5.15.0-52-generic-x86_64-with-glibc2.31
17 Feb, 11:16:31
[15] [INFO] Starting worker [15]
17 Feb, 11:16:31
- (sanic.access)[INFO][127.0.0.1:33902]: GET http://0.0.0.0:8000/  405 96
17 Feb, 11:16:39
{'always_normalize_model_id': None, 'normalized_model_id': 'stabilityai/stable-diffusion-2-1-base'}
17 Feb, 11:16:39
}
17 Feb, 11:16:39
"callInputs": {}
17 Feb, 11:16:39
},
17 Feb, 11:16:39
"prompt": "ukj person cooking"
17 Feb, 11:16:39
"modelInputs": {
17 Feb, 11:16:39
{
17 Feb, 11:16:39
pipeline.enable_xformers_memory_efficient_attention()
17 Feb, 11:16:39
2023-02-17 16:16:39.386354 {'type': 'inference', 'status': 'start', 'container_id': 'denimmazukibananadeployment0027d4a105d53d594b3093eaa817cc9c6625', 'time': 1676650599386, 't': 13508, 'tsl': 0, 'payload': {'startRequestId': None}}
17 Feb, 11:16:39
Initialized StableDiffusionPipeline for stabilityai/stable-diffusion-2-1-base in 1ms
17 Feb, 11:16:39
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
17 Feb, 11:16:44
0%|          | 0/50 [00:00<?, ?it/s]
  2%|▏         | 1/50 [00:00<00:44,  1.11it/s]
  6%|▌         | 3/50 [00:01<00:13,  3.56it/s]
 10%|█         | 5/50 [00:01<00:07,  5.97it/s]
 14%|█▍        | 7/50 [00:01<00:05,  8.22it/s]
 18%|█▊        | 9/50 [00:01<00:04, 10.18it/s]
 22%|██▏       | 11/50 [00:01<00:03, 11.81it/s]
 26%|██▌       | 13/50 [00:01<00:02, 13.11it/s]
 30%|███       | 15/50 [00:01<00:02, 14.12it/s]
 34%|███▍      | 17/50 [00:01<00:02, 14.88it/s]
 38%|███▊      | 19/50 [00:01<00:02, 15.44it/s]
 42%|████▏     | 21/50 [00:02<00:01, 15.84it/s]
 46%|████▌     | 23/50 [00:02<00:01, 16.13it/s]
 50%|█████     | 25/50 [00:02<00:01, 16.32it/s]
 54%|█████▍    | 27/50 [00:02<00:01, 16.44it/s]
 58%|█████▊    | 29/50 [00:02<00:01, 16.54it/s]
 62%|██████▏   | 31/50 [00:02<00:01, 16.59it/s]
 66%|██████▌   | 33/50 [00:02<00:01, 16.66it/s]
 70%|███████   | 35/50 [00:02<00:00, 16.71it/s]
 74%|███████▍  | 37/50 [00:03<00:00, 16.73it/s]
 78%|███████▊  | 39/50 [00:03<00:00, 16.75it/s]
 82%|████████▏ | 41/50 [00:03<00:00, 16.77it/s]
 86%|████████▌ | 43/50 [00:03<00:00, 16.75it/s]
 90%|█████████ | 45/50 [00:03<00:00, 16.73it/s]
 94%|█████████▍| 47/50 [00:03<00:00, 16.68it/s]
 98%|█████████▊| 49/50 [00:03<00:00, 16.71it/s]
100%|██████████| 50/50 [00:03<00:00, 13.05it/s]
17 Feb, 11:16:44
2023-02-17 16:16:44.479397 {'type': 'inference', 'status': 'done', 'container_id': 'denimmazukibananadeployment0027d4a105d53d594b3093eaa817cc9c6625', 'time': 1676650604479, 't': 18601, 'tsl': 5093, 'payload': {'startRequestId': None}}
17 Feb, 11:16:44
- (sanic.access)[INFO][127.0.0.1:58246]: POST http://0.0.0.0:8000/  200 574321

GP 3: Yes! I'll send the path to you privately on discord :)

Thank you again and hope that's helpful!
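
A note for anyone reading the environment dump in the log above: `CHECKPOINT_URL` is written with three slashes (`s3:///<bucket>/model.pth`) and `AWS_S3_ENDPOINT_URL` is empty. A minimal sketch of how such a URL could be split, assuming an `s3://<endpoint>/<bucket>/<key>` layout in which an empty endpoint means "use the default". That layout is inferred from the dump rather than confirmed in this thread, and `parse_s3_url` and `S3Location` are hypothetical names, not part of docker-diffusers-api:

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class S3Location(NamedTuple):
    endpoint: str  # empty string: assumed to mean "use the default endpoint"
    bucket: str
    key: str


def parse_s3_url(url: str) -> S3Location:
    """Split an s3://<endpoint>/<bucket>/<key> URL (assumed layout) into parts."""
    parts = urlsplit(url)
    if parts.scheme != "s3":
        raise ValueError(f"not an s3:// URL: {url!r}")
    # The first path segment is taken as the bucket, the remainder as the key.
    bucket, _, key = parts.path.lstrip("/").partition("/")
    if not bucket or not key:
        raise ValueError(f"expected s3://<endpoint>/<bucket>/<key>, got {url!r}")
    return S3Location(parts.netloc, bucket, key)
```

Under this assumed layout, a two-slash URL such as `s3://my-bucket/model.pth` would have its bucket read as the endpoint and is rejected by the sketch, which is one way a "wrong S3 URL" could slip in.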

@gadicc
Collaborator

gadicc commented Feb 18, 2023

Hey @DenimMazuki.

Ok great, thanks so much for all this... I can see exactly what's going on now.

Firstly, let me apologize 😅 It was a silly issue on my side. I know you've spent a few days on this and how frustrating that can be. But we're very close to an official v1 release and your pain will help many others moving forward ;)

I've released a fix to the dev branch. To try this out, it's as simple as editing the Dockerfile and changing:

ARG FROM_IMAGE="gadicc/diffusers-api"     # <- from this
ARG FROM_IMAGE="gadicc/diffusers-api:dev" # <- to this

(or overriding that build arg variable on the banana dashboard, but last time I checked, you still have to push a commit anyways to get it to use the new values)
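
If you would rather script the change than edit by hand, here is a minimal sketch of the same edit in Python. The helper name `set_from_image` is hypothetical, and it assumes the Dockerfile contains a single `ARG FROM_IMAGE=...` line. Note that the `# <- ...` arrows in the snippet above are annotations only: Dockerfile comments are recognized only at the start of a line, so the helper writes the line without them.

```python
import re


def set_from_image(dockerfile_text: str, image: str) -> str:
    """Return dockerfile_text with the default of ARG FROM_IMAGE set to image."""
    # Match the whole ARG FROM_IMAGE=... line, whatever its current value is.
    pattern = re.compile(r"^(ARG\s+FROM_IMAGE=).*$", flags=re.MULTILINE)
    new_text, count = pattern.subn(
        lambda m: f'{m.group(1)}"{image}"', dockerfile_text
    )
    if count == 0:
        raise ValueError("no 'ARG FROM_IMAGE=' line found")
    return new_text
```

For example, read the Dockerfile, pass its text through `set_from_image(text, "gadicc/diffusers-api:dev")`, write the result back, then commit and push so the build picks up the new value.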

The reason this wasn't working was a faulty assumption that checkpoints would be optimized first and saved to S3. This will work now regardless, but since we don't support banana's optimization anymore, your cold starts will be slower. There's info on creating an optimized build at https://forums.kiri.art/t/safetensors-our-own-optimization-faster-model-init/98.

The short of it is, you need a 2nd deployment of the main repo (not build-downloads), which can perform the optimization for you. It's not too much extra work but is a little time-consuming, especially the first time, so as a thank you for your patience, I'm happy to do the conversion for you if you send the checkpoint file.

(The reason why we need a second deploy is because banana builds happen without GPU, which is required for optimization. In banana's case, after the first stage of the build, they move the built image to separate "optimization servers" to complete the optimization, but we have no way of hooking into this).

Anyway, hope you find all the extra info interesting, otherwise you can safely ignore for now and just get up and running in the meantime without the optimization. I should be around to help if you experience any further issues. And thanks again for your patience :)

@gadicc
Collaborator

gadicc commented Feb 18, 2023

Whoa, sorry, one other note... I haven't deployed to banana in quite a while, and things have improved there big time!!!

First, their optimization worked, which I wasn't expecting (it was breaking on docker-diffusers-api for a really long time!)... so even if you don't use our optimization, you'll still get theirs.

Secondly, wow, they've really improved things... everything completed in a few minutes instead of the 1hr+ I remember.

Anyways, just wanted to correct my earlier note about things being slow without using "our" optimization.
