there should be a way to check if node is running #33

Open
fiatrete opened this issue Apr 7, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@fiatrete
Owner

fiatrete commented Apr 7, 2023

The Stable Diffusion webui seems to crash frequently for unknown reasons and has no automatic restart mechanism. dan-server is not aware of the operational status of registered nodes, so it can schedule tasks to offline nodes, and those tasks then fail.

I checked the Stable Diffusion webui docs and found an easy way to check whether it is running: call the app_id API:
curl -X 'GET' 'http://127.0.0.1:7860/app_id/' -H 'accept: application/json'
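
A minimal sketch of how dan-server could run the same check programmatically, using only the Python standard library. The function name `is_node_alive`, the 3-second timeout, and the rule "any HTTP 200 means alive" are assumptions for illustration, not part of the existing code base.

```python
import http.client
import urllib.error
import urllib.request


def is_node_alive(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if GET <base_url>/app_id/ answers with HTTP 200.

    Hypothetical helper: the name, the timeout, and the success rule
    are illustrative assumptions.
    """
    url = base_url.rstrip("/") + "/app_id/"
    request = urllib.request.Request(url, headers={"accept": "application/json"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, non-2xx status, or malformed URL:
        # treat the node as offline in all of these cases.
        return False
```

The scheduler could call `is_node_alive("http://127.0.0.1:7860")` before dispatching a task and skip the node when it returns `False`.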

@fiatrete fiatrete added the enhancement New feature or request label Apr 7, 2023
@DiligentCatCat
Contributor

Of course this works, but the problem would also be solved by a daemon program in your custom SD-WebUI (or in anything similar to SD-WebUI).

The daemon program takes responsibility for notifying the scheduler whether the worker node is alive. I think this is a better solution.
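
One way the scheduler side of such a daemon scheme might look is sketched below: the daemon reports a heartbeat periodically, and the scheduler treats a node as offline once its last heartbeat is older than a time-to-live. The class `HeartbeatRegistry`, its method names, and the 30-second default are hypothetical; how the heartbeat travels from node to scheduler is left open.

```python
import time
from typing import Callable, Dict, List


class HeartbeatRegistry:
    """Tracks the last heartbeat per node (hypothetical scheduler-side helper)."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_seen: Dict[str, float] = {}

    def beat(self, node_id: str) -> None:
        """Record that the daemon on node_id just reported in."""
        self._last_seen[node_id] = self._clock()

    def is_alive(self, node_id: str) -> bool:
        """A node is alive if it has reported within the last ttl_seconds."""
        seen = self._last_seen.get(node_id)
        return seen is not None and self._clock() - seen <= self._ttl

    def alive_nodes(self) -> List[str]:
        """Node ids that are currently eligible for scheduling, sorted."""
        return sorted(n for n in self._last_seen if self.is_alive(n))
```

With this shape, the scheduler would only pick targets from `alive_nodes()`, so a crashed node drops out automatically once its heartbeats stop.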

@fiatrete
Owner Author

fiatrete commented Apr 7, 2023

> Of course this works, but the problem would also be solved by a daemon program in your custom SD-WebUI (or in anything similar to SD-WebUI).
>
> The daemon program takes responsibility for notifying the scheduler whether the worker node is alive. I think this is a better solution.

Currently, we run Stable Diffusion webui in API mode as a temporary solution for dan-node. This is not ideal, because Stable Diffusion webui was not designed as server-side software. The ultimate solution would be to rewrite the dan-node program with stable operation and fault recovery built in from the design phase.

Until then, using API monitoring to check whether the node is running is a quick and effective temporary solution.
