
Very simple http server #367

Draft · wants to merge 17 commits into master
Conversation

@stduhpf (Contributor) commented Aug 25, 2024

This is a very simple server that I made to be able to generate images from different prompts without reloading the models every time.

Starting the server

The syntax is pretty much the same as the CLI.

.\build\bin\Release\sd-server.exe --diffusion-model  ..\ComfyUI\models\unet\flux1-schnell-Q3_k.gguf --vae ..\ComfyUI\models\vae\ae.q8_0.gguf --clip_l ..\ComfyUI\models\clip\clip_l.q8_0.gguf --t5xxl ..\ComfyUI\models\clip\t5xxl_q4_k.gguf  -p "Default prompt" --cfg-scale 1.0 --sampling-method euler -v --steps 4 -o "server_output.png"

How to use (example):

Using the example client script

  1. Make sure you have Python installed with the requests and pillow modules
    pip install requests pillow
  2. Launch the client in interactive mode
    python -i examples/server/test_client.py

Simplest setup

  1. Make sure you have Python installed with the requests module:
    pip install requests
  2. Open a Python REPL:
    python
  3. Import the requests module
    >>> import requests
  4. Post your prompt directly to the /txt2img endpoint
    >>> requests.post("http://localhost:8080/txt2img","a lovely cat holding a sign says 'flux.cpp'")
  5. Images will be saved to disk on the server side, and each generation will overwrite the previous one.

Using JSON payloads

  1. Make sure you have Python installed with the requests module:
    pip install requests
  2. Open a Python REPL:
    python
  3. Import the requests and json modules
    >>> import requests, json
  4. Construct your JSON payload with generation parameters
    >>> payload = {'prompt': """a lovely cat holding a sign says "flux.cpp" """,'height': 768, 'seed': 42, 'sample_steps': 4}
  5. Post your payload to the /txt2img endpoint
    >>> requests.post("http://localhost:8080/txt2img", json.dumps(payload))

Decoding response using pillow

  1. Make sure both requests and pillow are installed
    pip install requests pillow
  2. Open a Python REPL:
    python
  3. Import the requests, json and base64 modules
    >>> import requests, json, base64
  4. Import io.BytesIO and PIL.Image
    >>> from io import BytesIO
    >>> from PIL import Image
  5. Get the response from the server
    >>> response = requests.post("http://localhost:8080/txt2img","a lovely cat holding a sign says 'flux.cpp'")
  6. Parse the response text as JSON
    >>> parsed = json.loads(response.text)
  7. Decode base64 image data
    >>> pngbytes = base64.b64decode(parsed[0]["data"])
  8. Convert to PIL Image
    >>> image = Image.open(BytesIO(pngbytes))
  9. Display the image in the default viewer
    >>> image.show()

One-liner

  1. First import the necessary modules
    >>> import requests, json, base64
    >>> from io import BytesIO
    >>> from PIL import Image
  2. Use this line to send the request and open all the generated images.
    >>> [Image.open(BytesIO(base64.b64decode(img["data"]))).show() for img in json.loads(requests.post("http://localhost:8080/txt2img",json.dumps( {'seed': -1, 'batch_count':4, 'sample_steps':4, 'prompt': """a lovely cat holding a sign says "flux.cpp" """} )).text)]
  3. To send another payload after the previous one has finished, press the up arrow and edit the payload.

If you don't want the image viewer to pause the execution of your command, you can do the following (not needed on macOS for some reason):
>>> from threading import Thread
>>> [Thread(target=Image.open(BytesIO(base64.b64decode(img["data"]))).show, args=()).start() for img in json.loads(requests.post("http://localhost:8080/txt2img",json.dumps( {'seed': -1, 'batch_count':4, 'sample_steps':4, 'prompt': """a lovely cat holding a sign says "flux.cpp" """} )).text)]
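
For readability, the same request can also be written as a short script instead of a one-liner (a sketch using the same endpoint and payload values as the example above):

import base64
import json
from io import BytesIO

import requests
from PIL import Image

# Same request as the one-liner above, just spread out for readability.
payload = {
    "seed": -1,
    "batch_count": 4,
    "sample_steps": 4,
    "prompt": 'a lovely cat holding a sign says "flux.cpp"',
}

response = requests.post("http://localhost:8080/txt2img", json.dumps(payload))
for img in json.loads(response.text):
    # Each entry is expected to carry base64-encoded PNG bytes in "data".
    Image.open(BytesIO(base64.b64decode(img["data"]))).show()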

@theaerotoad commented Aug 26, 2024

I'm excited about this one, and was attempting to combine it with Vulkan.

I'm seeing a compile-time issue (around the pingpong function) in my merge, and it seems it's in the original as well.

stable-diffusion.cpp/examples/server/main.cpp:572:24: error: non-local lambda expression cannot have a capture-default
  572 | const auto pingpong = [&](const httplib::Request &, httplib::Response & res) {
      |                        ^
/home/aerotoad/software/aicpp/sd_vulkan_flux/server/stable-diffusion.cpp/examples/server/main.cpp: In lambda function:
/home/aerotoad/software/aicpp/sd_vulkan_flux/server/stable-diffusion.cpp/examples/server/main.cpp:672:5: warning: control reaches end of non-void function [-Wreturn-type]
  672 |     };

@stduhpf (Contributor, Author) commented Aug 26, 2024

around the pingpong function

Ah! This function should go, I just added it at the start of development to see if I was able to connect to the server. If it's causing issues, just remove it, along with the few things that depend on it.

@stduhpf (Contributor, Author) commented Aug 26, 2024

@theaerotoad just out of curiosity, which C++ compiler are you using? MSVC had no issue with this code (which I believe was technically incorrect).

@theaerotoad

@theaerotoad just out of curiosity, which C++ compiler are you using? MSVC had no issue with this code (which I believe was technically incorrect).

Tested it on gcc 12.2.0-14 on Debian.

@theaerotoad

Yup, removing the pingpong endpoint allows compilation.

Another thought: the default 'localhost' string didn't work on my end initially. It looks like the llama.cpp server defaults to using 127.0.0.1 instead of 'localhost', so it might be worth setting the default string that way. Not a big deal, though.

I was able to generate an image via requests, but got a segfault immediately afterwards.

[DEBUG] ggml_extend.hpp:977  - flux compute buffer size: 397.27 MB(RAM)
  |==================================================| 4/4 - 79.22s/it
[INFO ] stable-diffusion.cpp:1295 - sampling completed, taking 316.44s
[INFO ] stable-diffusion.cpp:1303 - generating 1 latent images completed, taking 316.44s
[INFO ] stable-diffusion.cpp:1306 - decoding 1 latents
[DEBUG] ggml_extend.hpp:977  - vae compute buffer size: 1664.00 MB(RAM)
[DEBUG] stable-diffusion.cpp:967  - computing vae [mode: DECODE] graph completed, taking 14.59s
[INFO ] stable-diffusion.cpp:1316 - latent 1 decoded, taking 14.59s
[INFO ] stable-diffusion.cpp:1320 - decode_first_stage completed, taking 14.59s
[INFO ] stable-diffusion.cpp:1429 - txt2img completed in 341.62s
save result image to 'server_output.png'
Segmentation fault

I've played around a bit (not much of a C++ coder at this point) and can't reliably track down where it's coming from, though. I'm running with batch 1 (so only one image), and the first image gets written properly, with tags, then the dreaded segfault.

@stduhpf (Contributor, Author) commented Aug 27, 2024

Maybe you could try on the CPU backend to see if the segfault is related to the Vulkan merge or to the server itself? (Also you should probably use a less demanding model than flux when testing)

@theaerotoad

Maybe you could try on the CPU backend to see if the segfault is related to the Vulkan merge or to the server itself? (Also you should probably use a less demanding model than flux when testing)

Right, I should have said I ran the earlier example with the CPU backend (tried with no BLAS just to confirm it wasn't my merging it over that caused this!). It's much faster with Vulkan.

I can confirm I seem to get a segfault from the server every time with:

  • CPU (no BLAS), from server branch with SDXL
  • CPU (no BLAS), from server branch with Flux Schnell, q8 quants
  • Vulkan, merged into SkuttleOleg's with Flux Schnell and q8 quants

For each of the above, they run fine with the main CLI example (although painfully slowly on CPU).

@stduhpf (Contributor, Author) commented Aug 27, 2024

Hmm, it doesn't happen on my machine, which makes it annoying to debug. I'll try running it on WSL to see if it's a Linux thing.

Edit: It does happen on WSL too! So maybe I can fix it.

@stduhpf (Contributor, Author) commented Aug 27, 2024

@theaerotoad I believe it's fixed now.

@theaerotoad

@stduhpf Yup, that fixes it. Thank you!

It sure is nice not to have to reload everything each time.

@theaerotoad

@stduhpf -- This is working pretty well; I played around with it a bit this weekend. I have a few tweaks: enabling other inputs to be specified (via HTML form inputs), returning the image as part of the response to the POST request, and reducing CPU usage (using t.join() rather than while(1) at the end).

Do you want them? I may just share them as a gist, or I can branch off your repo. What's your preference?

@stduhpf (Contributor, Author) commented Sep 3, 2024

@theaerotoad Both options are fine with me, thanks for helping.

I thought about returning the image in base64 after each generation, but I was too lazy to implement it.

@stduhpf (Contributor, Author) commented Oct 5, 2024

I just spent hours trying to understand why the server wasn't sending the image metadata as it is supposed to. It turns out PIL automatically strips out the metadata; the server was working fine 🙃.

@Green-Sky (Contributor) commented Oct 6, 2024

There are some differences from the AUTOMATIC1111 v1 webui API.
You use

  • sampling_steps instead of steps
  • batch_count instead of batch_size
  • sample_method instead of sampler_index

This info might be outdated, however; I just wanted to make my bot work with your API, so this jumped out at me.
We should look into what the other APIs do (AUTOMATIC1111 and ComfyUI) and base it on that, so as not to make it incompatible unnecessarily (see the rough mapping sketched below).

edit: links:
https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/API
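
For illustration, here is a payload in this server's field names with the AUTOMATIC1111 equivalents noted in comments (a rough sketch; the sample_method value format shown is an assumption):

# Field names on the left are the ones this server's /txt2img endpoint uses in
# the examples above; the AUTOMATIC1111 names are noted in the comments.
payload = {
    "prompt": "a lovely cat",   # same in A1111
    "sample_steps": 4,          # A1111: "steps"
    "batch_count": 1,           # A1111: "batch_size"
    "sample_method": "euler",   # A1111: "sampler_index" (value format assumed)
    "seed": -1,                 # same in A1111
}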

@stduhpf (Contributor, Author) commented Oct 6, 2024

There are some differences from the AUTOMATIC1111 v1 webui API. You use

  • sampling_steps instead of steps
  • batch_count instead of batch_size
  • sample_method instead of sampler_index

This info might be outdated, however; I just wanted to make my bot work with your API, so this jumped out at me. We should look into what the other APIs do (AUTOMATIC1111 and ComfyUI) and base it on that, so as not to make it incompatible unnecessarily.

I might look into making the API compatible with other standards in the future. For now, I just use the same arguments as the txt2img() function declaration in stable-diffusion.h:

SD_API sd_image_t* txt2img(sd_ctx_t* sd_ctx,
                           const char* prompt,
                           const char* negative_prompt,
                           int clip_skip,
                           float cfg_scale,
                           float guidance,
                           int width,
                           int height,
                           enum sample_method_t sample_method,
                           int sample_steps,
                           int64_t seed,
                           int batch_count,
                           const sd_image_t* control_cond,
                           float control_strength,
                           float style_strength,
                           bool normalize_input,
                           const char* input_id_images_path);
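
For example, a JSON payload mirroring those argument names might look like this (a sketch: only prompt, height, seed, sample_steps and batch_count appear in the examples above; the other keys are assumed to follow the declaration):

import json
import requests

# Hypothetical payload; key names are assumed to match the txt2img() parameters.
payload = {
    "prompt": "a lovely cat holding a sign says 'flux.cpp'",
    "negative_prompt": "",
    "cfg_scale": 1.0,
    "width": 768,
    "height": 768,
    "sample_steps": 4,
    "seed": 42,
    "batch_count": 1,
}
requests.post("http://localhost:8080/txt2img", json.dumps(payload))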

@stduhpf (Contributor, Author) commented Oct 6, 2024

Speaking of, shouldn't the schedule method be specified when calling txt2img() rather than when creating the context?

@Green-Sky (Contributor)

I might look into making the API compatible with other standards in the future. For now, I just use the same arguments as the txt2img() function declaration in stable-diffusion.h

I see.

Speaking of, shouldn't the schedule method be specified when calling txt2img() rather than when creating the context?

I suppose.

@Green-Sky (Contributor)

If anyone just wants to run a command:

curl -sv --json '{"prompt": "a lovely cat", "seed": -1}' 127.0.0.1:7860/txt2img | jq -r .[0].data | base64 -d - > api_result.png

Green-Sky added a commit to Green-Sky/solanaceae_sdbot-webui that referenced this pull request on Oct 6, 2024: commit 1c599839800ed5984e72562968db7e4df5d052bd (leejet/stable-diffusion.cpp#367)
@NNDam commented Oct 22, 2024

@stduhpf thanks for your work. Currently I'm using this PR for PhotoMaker v2, but I get an error when changing the input embedding (I believe it's called "input_id_images_path"). How can I input a different face without reloading the whole SDXL model, or is there some function to reload the face embedding?

@stduhpf (Contributor, Author) commented Oct 22, 2024

@NNDam You can try with my latest commit. I can't test it on my end, but it should work now?

@NNDam commented Oct 22, 2024

@stduhpf thanks, I tried but it still doesn't work. The main problem is that when loading the model for the first time, I also need to preload --input-id-images-dir extracted with the face_detect.py script in this PR. But the embedding won't reload if I change input_id_images_path when making requests to the server. It still outputs the same face as the one preloaded the first time (and also segfaults if the number of current faces differs from the number of preloaded faces).

@stduhpf (Contributor, Author) commented Oct 22, 2024

@stduhpf thanks, I tried but it still doesn't work. The main problem is that when loading the model for the first time, I also need to preload --input-id-images-dir extracted with the face_detect.py script in this PR. But the embedding won't reload if I change input_id_images_path when making requests to the server. It still outputs the same face as the one preloaded the first time (and also segfaults if the number of current faces differs from the number of preloaded faces).

Oh, I see. Well, even if Support for PhotoMaker Version 2 were merged, I couldn't get this to work with the current architecture of the server, sorry. Have you tried with PhotoMaker v1?

@NNDam commented Oct 23, 2024

Hi @bssrdf, can you help us?

@bssrdf (Contributor) commented Oct 23, 2024

Hi @bssrdf, can you help us?

@NNDam, I'll see what can be done to make it work. PhotoMaker was developed following ControlNet's workflow. It needs to be adjusted to work with this server setup.

@stduhpf (Contributor, Author) commented Oct 23, 2024

I think some changes need to be made in stable-diffusion.cpp/stable-diffusion.h. Some arguments like the scheduler type, VAE settings, and controlnets are passed to the new_sd_ctx() function that loads the models, but they should probably be passed to functions like txt2img(), img2img() and img2vid() instead.
That's completely out of scope for this PR, but it would allow the server to easily support ControlNet and PhotoMaker v2.

@bssrdf (Contributor) commented Oct 24, 2024

@NNDam, @stduhpf, I briefly looked at the server code. There may be a simple workaround for PhotoMaker.

// parse req.body as json using jsoncpp
        using json = nlohmann::json;

        try {
            std::string json_str = req.body;
            parseJsonPrompt(json_str, &params);
        } catch (json::parse_error& e) {
            // assume the request is just a prompt
            // LOG_WARN("Failed to parse json: %s\n Assuming it's just a prompt...\n", e.what());
            sd_log(sd_log_level_t::SD_LOG_WARN, "Failed to parse json: %s\n Assuming it's just a prompt...\n", e.what());
            std::string prompt = req.body;
            if (!prompt.empty()) {
                params.prompt = prompt;
            } else {
                params.seed += 1;
            }
        } catch (...) {
            // Handle any other type of exception
            // LOG_ERROR("An unexpected error occurred\n");
            sd_log(sd_log_level_t::SD_LOG_ERROR, "An unexpected error occurred\n");
        }

Could parsing of input_id_images_path be added to the above block, setting params.input_id_images_path to the new path from the request?

@stduhpf (Contributor, Author) commented Oct 24, 2024

@bssrdf That's exactly what I did in the last commit (d0704a5): https://github.com/stduhpf/stable-diffusion.cpp/blob/d0704a536bae4904f9133ef0f1076ac8f7c44f0b/examples/server/main.cpp#L696. In theory this should work for photomaker v1 support (though I haven't tried it).

But PhotoMaker v2 support from your PR requires passing params.input_id_images_path as an argument to new_sd_ctx(), instead of just txt2img().
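
If that per-request parsing works, the client side might look like this (a sketch; the key name follows the field parsed in the commit above, while the example path and trigger-word prompt are only illustrative):

import json
import requests

# Hypothetical payload overriding the PhotoMaker ID images directory for a
# single request; assumes the server reads "input_id_images_path" from the
# JSON body and that the path exists on the server machine.
payload = {
    "prompt": "a man img, portrait photo",  # "img" is the PhotoMaker trigger word
    "sample_steps": 20,
    "input_id_images_path": "./id_images/person_a",
}
requests.post("http://localhost:8080/txt2img", json.dumps(payload))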

@bssrdf (Contributor) commented Oct 25, 2024

@bssrdf That's exactly what I did in the last commit (d0704a5): https://github.com/stduhpf/stable-diffusion.cpp/blob/d0704a536bae4904f9133ef0f1076ac8f7c44f0b/examples/server/main.cpp#L696. In theory this should work for photomaker v1 support (though I haven't tried it).

But PhotoMaker v2 support from your PR requires passing params.input_id_images_path as an argument to new_sd_ctx(), instead of just txt2img().

Thanks for the information, @stduhpf.
I updated the loading of id_embeds to use a raw binary tensor file load (using load_tensor_from_file). It is more efficient to load this way since there is only one tensor. Now it should change/update id_embed based on the request and feed it to PhotoMaker V2. @NNDam, please retry my PR and let me know if there is still a problem.

@NNDam commented Nov 8, 2024

It worked!!! Thanks @bssrdf @stduhpf
