
Very simple http server #367

Draft · wants to merge 17 commits into master
Conversation

@stduhpf (Contributor) commented Aug 25, 2024

This is a very simple server that I made to be able to generate images from different prompts without reloading the models every time.

Starting the server

The syntax is pretty much the same as the CLI.

.\build\bin\Release\sd-server.exe --diffusion-model  ..\ComfyUI\models\unet\flux1-schnell-Q3_k.gguf --vae ..\ComfyUI\models\vae\ae.q8_0.gguf --clip_l ..\ComfyUI\models\clip\clip_l.q8_0.gguf --t5xxl ..\ComfyUI\models\clip\t5xxl_q4_k.gguf  -p "Default prompt" --cfg-scale 1.0 --sampling-method euler -v --steps 4 -o "server_output.png"

How to use (example):

Using the example client script

  1. Make sure you have Python installed with the requests and pillow modules
    pip install requests pillow
  2. Launch the client in interactive mode
    python -i examples/server/test_client.py

Simplest setup

  1. Make sure you have Python installed with the requests module:
    pip install requests
  2. Open a Python REPL:
    python
  3. Import the requests module
    >>> import requests
  4. Post your prompt directly to the /txt2img endpoint
    >>> requests.post("http://localhost:8080/txt2img","a lovely cat holding a sign says 'flux.cpp'")
  5. Images will be saved to disk on the server side, and each generation will overwrite the previous one.

Using JSON payloads

  1. Make sure you have Python installed with the requests module:
    pip install requests
  2. Open a Python REPL:
    python
  3. Import the requests and json modules
    >>> import requests, json
  4. Construct your JSON payload with generation parameters
    >>> payload = {'prompt': """a lovely cat holding a sign says "flux.cpp" """,'height': 768, 'seed': 42, 'sample_steps': 4}
  5. Post your payload to the /txt2img endpoint
    >>> requests.post("http://localhost:8080/txt2img", json.dumps(payload))

Decoding response using pillow

  1. Make sure both requests and pillow are installed
    pip install requests pillow
  2. Open a Python REPL:
    python
  3. Import the requests, json and base64 modules
    >>> import requests, json, base64
  4. Import io.BytesIO and PIL.Image
    >>> from io import BytesIO
    >>> from PIL import Image
  5. Get the response from the server
    >>> response = requests.post("http://localhost:8080/txt2img","a lovely cat holding a sign says 'flux.cpp'")
  6. Parse the response text as JSON
    >>> parsed = json.loads(response.text)
  7. Decode base64 image data
    >>> pngbytes = base64.b64decode(parsed[0]["data"])
  8. Convert to PIL Image
    >>> image = Image.open(BytesIO(pngbytes))
  9. Display the image in the default viewer
    >>> image.show()

One-liner

  1. First import the necessary modules
    >>> import requests, json, base64
    >>> from io import BytesIO
    >>> from PIL import Image
  2. Use this line to send the request and open all the generated images.
    >>> [Image.open(BytesIO(base64.b64decode(img["data"]))).show() for img in json.loads(requests.post("http://localhost:8080/txt2img",json.dumps( {'seed': -1, 'batch_count':4, 'sample_steps':4, 'prompt': """a lovely cat holding a sign says "flux.cpp" """} )).text)]
  3. To send another payload after the previous one has finished, press the up arrow and edit the payload.

If you don't want the image viewer to pause the execution of your command, you can do the following (not needed on macOS for some reason):
>>> from threading import Thread
>>> [Thread(target=Image.open(BytesIO(base64.b64decode(img["data"]))).show, args=()).start() for img in json.loads(requests.post("http://localhost:8080/txt2img",json.dumps( {'seed': -1, 'batch_count':4, 'sample_steps':4, 'prompt': """a lovely cat holding a sign says "flux.cpp" """} )).text)]
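
For readability, the same request can also be written as a short script instead of a one-liner (a sketch using the same endpoint and payload values as the example above):

import base64
import json
from io import BytesIO

import requests
from PIL import Image

# Same request as the one-liner above, just spread out for readability.
payload = {
    "seed": -1,
    "batch_count": 4,
    "sample_steps": 4,
    "prompt": 'a lovely cat holding a sign says "flux.cpp"',
}

response = requests.post("http://localhost:8080/txt2img", json.dumps(payload))
for img in json.loads(response.text):
    # Each entry is expected to carry base64-encoded PNG bytes in "data".
    Image.open(BytesIO(base64.b64decode(img["data"]))).show()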

@theaerotoad commented Aug 26, 2024

I'm excited about this one, and was attempting to combine it with Vulkan.

I'm seeing a compile-time issue (around the pingpong function) in my merge, and it seems it's in the original as well.

stable-diffusion.cpp/examples/server/main.cpp:572:24: error: non-local lambda expression cannot have a capture-default
  572 | const auto pingpong = [&](const httplib::Request &, httplib::Response & res) {
      |                        ^
/home/aerotoad/software/aicpp/sd_vulkan_flux/server/stable-diffusion.cpp/examples/server/main.cpp: In lambda function:
/home/aerotoad/software/aicpp/sd_vulkan_flux/server/stable-diffusion.cpp/examples/server/main.cpp:672:5: warning: control reaches end of non-void function [-Wreturn-type]
  672 |     };

@stduhpf (Contributor, Author) commented Aug 26, 2024

around the pingpong function

Ah! This function should go, I just added it at the start of development to see if I was able to connect to the server. If it's causing issues, just remove it, along with the few things that depend on it.

@stduhpf (Contributor, Author) commented Aug 26, 2024

@theaerotoad just out of curiosity, which C++ compiler are you using? MSVC had no issue with this code (which I believe was technically incorrect).

@theaerotoad

@theaerotoad just out of curiosity, which C++ compiler are you using? MSVC had no issue with this code (which I believe was technically incorrect).

Tested it on gcc 12.2.0-14 on Debian.

@theaerotoad

Yup, removing the pingpong endpoint allows compilation.

Another thought: the default 'localhost' string didn't work on my end initially. It looks like the llama.cpp server defaults to using 127.0.0.1 instead of 'localhost', so it might be worth setting the default string that way. Not a big deal, though.

I was able to generate an image via requests, but got a segfault immediately afterwards.

[DEBUG] ggml_extend.hpp:977  - flux compute buffer size: 397.27 MB(RAM)
  |==================================================| 4/4 - 79.22s/it
[INFO ] stable-diffusion.cpp:1295 - sampling completed, taking 316.44s
[INFO ] stable-diffusion.cpp:1303 - generating 1 latent images completed, taking 316.44s
[INFO ] stable-diffusion.cpp:1306 - decoding 1 latents
[DEBUG] ggml_extend.hpp:977  - vae compute buffer size: 1664.00 MB(RAM)
[DEBUG] stable-diffusion.cpp:967  - computing vae [mode: DECODE] graph completed, taking 14.59s
[INFO ] stable-diffusion.cpp:1316 - latent 1 decoded, taking 14.59s
[INFO ] stable-diffusion.cpp:1320 - decode_first_stage completed, taking 14.59s
[INFO ] stable-diffusion.cpp:1429 - txt2img completed in 341.62s
save result image to 'server_output.png'
Segmentation fault

I've played around a bit (not much of a C++ coder at this point) and can't reliably track down where it's coming from, though. I'm running with batch 1 (so only one image), and the first image gets written properly, with tags, then the dreaded segfault.

@stduhpf (Contributor, Author) commented Aug 27, 2024

Maybe you could try on the CPU backend to see if the segfault is related to the Vulkan merge or to the server itself? (Also you should probably use a less demanding model than flux when testing)

@theaerotoad

Maybe you could try on the CPU backend to see if the segfault is related to the Vulkan merge or to the server itself? (Also you should probably use a less demanding model than flux when testing)

Right, I should have said I ran the earlier example with the CPU backend (tried with no BLAS just to confirm it wasn't my merging it over that caused this!). It's much faster with Vulkan.

I can confirm I seem to get a segfault from the server every time with:

  • CPU (no BLAS), from server branch with SDXL
  • CPU (no BLAS), from server branch with Flux Schnell, q8 quants
  • Vulkan, merged into SkuttleOleg's with Flux Schnell and q8 quants

For each of the above, they run fine with the main CLI example (although painfully slowly on CPU).

@stduhpf (Contributor, Author) commented Aug 27, 2024

Hmm, it doesn't happen on my machine, which makes it annoying to debug. I'll try running it on WSL to see if it's a Linux thing.

Edit: It does happen on WSL too! So maybe I can fix it.

@stduhpf (Contributor, Author) commented Aug 27, 2024

@theaerotoad I believe it's fixed now.

@theaerotoad

@stduhpf Yup, that fixes it. Thank you!

It sure is nice not to have to reload everything each time.

@theaerotoad

@stduhpf -- This is working pretty well; I played around with it a bit this weekend. I have a few tweaks: enabling other inputs to be specified (via HTML form inputs), returning the image as part of the response to the POST request, and reducing CPU usage (using t.join() rather than while(1) at the end).

Do you want them? I may just share them as a gist, or I can branch off your repo. What's your preference?

@stduhpf (Contributor, Author) commented Sep 3, 2024

@theaerotoad Both options are fine with me, thanks for helping.

I thought about returning the image in base64 after each generation, but I was too lazy to implement it.

@stduhpf (Contributor, Author) commented Oct 5, 2024

I just spent hours trying to understand why the server wasn't sending the image metadata as it is supposed to. It turns out PIL automatically strips out the metadata; the server was working fine 🙃.

@Green-Sky (Contributor) commented Oct 6, 2024

There are some differences from the AUTOMATIC1111 v1 webui API.
You use

  • sampling_steps instead of steps
  • batch_count instead of batch_size
  • sample_method instead of sampler_index

This info might be outdated, however; I just wanted to make my bot work with your API, so this jumped out at me.
We should look into what the other APIs do (AUTOMATIC1111 and ComfyUI) and base it on that, so as not to make it incompatible unnecessarily (see the rough mapping sketched below).

edit: links:
https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/API
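
For illustration, here is a payload in this server's field names with the AUTOMATIC1111 equivalents noted in comments (a rough sketch; the sample_method value format shown is an assumption):

# Field names on the left are the ones this server's /txt2img endpoint uses in
# the examples above; the AUTOMATIC1111 names are noted in the comments.
payload = {
    "prompt": "a lovely cat",   # same in A1111
    "sample_steps": 4,          # A1111: "steps"
    "batch_count": 1,           # A1111: "batch_size"
    "sample_method": "euler",   # A1111: "sampler_index" (value format assumed)
    "seed": -1,                 # same in A1111
}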

@stduhpf (Contributor, Author) commented Oct 6, 2024

There are some differences from the AUTOMATIC1111 v1 webui API. You use

  • sampling_steps instead of steps
  • batch_count instead of batch_size
  • sample_method instead of sampler_index

This info might be outdated, however; I just wanted to make my bot work with your API, so this jumped out at me. We should look into what the other APIs do (AUTOMATIC1111 and ComfyUI) and base it on that, so as not to make it incompatible unnecessarily.

I might look into making the API compatible with other standards in the future. For now, I just use the same arguments as the txt2img() function declaration in stable-diffusion.h:

SD_API sd_image_t* txt2img(sd_ctx_t* sd_ctx,
                           const char* prompt,
                           const char* negative_prompt,
                           int clip_skip,
                           float cfg_scale,
                           float guidance,
                           int width,
                           int height,
                           enum sample_method_t sample_method,
                           int sample_steps,
                           int64_t seed,
                           int batch_count,
                           const sd_image_t* control_cond,
                           float control_strength,
                           float style_strength,
                           bool normalize_input,
                           const char* input_id_images_path);
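
For example, a JSON payload mirroring those argument names might look like this (a sketch: only prompt, height, seed, sample_steps and batch_count appear in the examples above; the other keys are assumed to follow the declaration):

import json
import requests

# Hypothetical payload; key names are assumed to match the txt2img() parameters.
payload = {
    "prompt": "a lovely cat holding a sign says 'flux.cpp'",
    "negative_prompt": "",
    "cfg_scale": 1.0,
    "width": 768,
    "height": 768,
    "sample_steps": 4,
    "seed": 42,
    "batch_count": 1,
}
requests.post("http://localhost:8080/txt2img", json.dumps(payload))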

@stduhpf (Contributor, Author) commented Oct 6, 2024

Speaking of, shouldn't the schedule method be specified when calling txt2img() rather than when creating the context?

@Green-Sky (Contributor)

I might look into making the API compatible with other standards in the future. For now, I just use the same arguments as the txt2img() function declaration in stable-diffusion.h

I see.

Speaking of, shouldn't the schedule method be specified when calling txt2img() rather than when creating the context?

I suppose.

@Green-Sky (Contributor)

If anyone just wants to run a command:

curl -sv --json '{"prompt": "a lovely cat", "seed": -1}' 127.0.0.1:7860/txt2img | jq -r .[0].data | base64 -d - > api_result.png

Green-Sky added a commit to Green-Sky/solanaceae_sdbot-webui that referenced this pull request on Oct 6, 2024: commit 1c599839800ed5984e72562968db7e4df5d052bd (leejet/stable-diffusion.cpp#367)
@NNDam commented Oct 22, 2024

@stduhpf thanks for your work. Currently I'm using this PR for PhotoMaker v2, but I get an error when changing the input embedding (I believe it's called "input_id_images_path"). How can I input a different face without reloading the whole SDXL model, or is there some function to reload the face embedding?

@stduhpf (Contributor, Author) commented Oct 22, 2024

@NNDam You can try with my latest commit. I can't test it on my end, but it should work now?

@NNDam commented Oct 22, 2024

@stduhpf thanks, I tried but it still doesn't work. The main problem is that when loading the model for the first time, I also need to preload --input-id-images-dir extracted with the face_detect.py script in this PR. But the embedding won't reload if I change input_id_images_path when making requests to the server. It still outputs the same face as the one preloaded the first time (and also segfaults if the number of current faces differs from the number of preloaded faces).

@stduhpf (Contributor, Author) commented Oct 22, 2024

@stduhpf thanks, I tried but it still doesn't work. The main problem is that when loading the model for the first time, I also need to preload --input-id-images-dir extracted with the face_detect.py script in this PR. But the embedding won't reload if I change input_id_images_path when making requests to the server. It still outputs the same face as the one preloaded the first time (and also segfaults if the number of current faces differs from the number of preloaded faces).

Oh, I see. Well, even if Support for PhotoMaker Version 2 were merged, I couldn't get this to work with the current architecture of the server, sorry. Have you tried with PhotoMaker v1?

@NNDam commented Oct 23, 2024

Hi @bssrdf, can you help us?

@bssrdf (Contributor) commented Oct 23, 2024

Hi @bssrdf, can you help us?

@NNDam, I'll see what can be done to make it work. PhotoMaker was developed following ControlNet's workflow. It needs to be adjusted to work with this server setup.

@stduhpf (Contributor, Author) commented Oct 23, 2024

I think some changes need to be made in stable-diffusion.cpp/stable-diffusion.h. Some arguments like the scheduler type, VAE settings, and controlnets are passed to the new_sd_ctx() function that loads the models, but they should probably be passed to functions like txt2img(), img2img() and img2vid() instead.
That's completely out of scope for this PR, but it would allow the server to easily support ControlNet and PhotoMaker v2.

@bssrdf (Contributor) commented Oct 24, 2024

@NNDam, @stduhpf, I briefly looked at the server code. There may be a simple workaround for PhotoMaker.

// parse req.body as json using jsoncpp
        using json = nlohmann::json;

        try {
            std::string json_str = req.body;
            parseJsonPrompt(json_str, &params);
        } catch (json::parse_error& e) {
            // assume the request is just a prompt
            // LOG_WARN("Failed to parse json: %s\n Assuming it's just a prompt...\n", e.what());
            sd_log(sd_log_level_t::SD_LOG_WARN, "Failed to parse json: %s\n Assuming it's just a prompt...\n", e.what());
            std::string prompt = req.body;
            if (!prompt.empty()) {
                params.prompt = prompt;
            } else {
                params.seed += 1;
            }
        } catch (...) {
            // Handle any other type of exception
            // LOG_ERROR("An unexpected error occurred\n");
            sd_log(sd_log_level_t::SD_LOG_ERROR, "An unexpected error occurred\n");
        }

Could parsing of input_id_images_path be added to the above block, setting params.input_id_images_path to the new path from the request?

@stduhpf (Contributor, Author) commented Oct 24, 2024

@bssrdf That's exactly what I did in the last commit (d0704a5): https://github.com/stduhpf/stable-diffusion.cpp/blob/d0704a536bae4904f9133ef0f1076ac8f7c44f0b/examples/server/main.cpp#L696. In theory this should work for photomaker v1 support (though I haven't tried it).

But PhotoMaker v2 support from your PR requires passing params.input_id_images_path as an argument to new_sd_ctx(), instead of just txt2img().
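
If that per-request parsing works, the client side might look like this (a sketch; the key name follows the field parsed in the commit above, while the example path and trigger-word prompt are only illustrative):

import json
import requests

# Hypothetical payload overriding the PhotoMaker ID images directory for a
# single request; assumes the server reads "input_id_images_path" from the
# JSON body and that the path exists on the server machine.
payload = {
    "prompt": "a man img, portrait photo",  # "img" is the PhotoMaker trigger word
    "sample_steps": 20,
    "input_id_images_path": "./id_images/person_a",
}
requests.post("http://localhost:8080/txt2img", json.dumps(payload))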

@bssrdf (Contributor) commented Oct 25, 2024

@bssrdf That's exactly what I did in the last commit (d0704a5): https://github.com/stduhpf/stable-diffusion.cpp/blob/d0704a536bae4904f9133ef0f1076ac8f7c44f0b/examples/server/main.cpp#L696. In theory this should work for photomaker v1 support (though I haven't tried it).

But PhotoMaker v2 support from your PR requires passing params.input_id_images_path as an argument to new_sd_ctx(), instead of just txt2img().

Thanks for the information, @stduhpf.
I updated the loading of id_embeds to use a raw binary tensor file load (using load_tensor_from_file). It is more efficient to load this way since there is only one tensor. Now it should change/update id_embed based on the request and feed it to PhotoMaker V2. @NNDam, please retry my PR and let me know if there is still a problem.

@NNDam commented Nov 8, 2024

It worked!!! Thanks @bssrdf @stduhpf
