Improve working with disabled MyClouds #27
Comments
I agree with point 3 which is easy to fix, but how big of a project is the rest?
So we are running a special Lambda that constantly pings the other Lambdas so that they stay warm. An alternative for the warmup is a configuration option released by AWS called "provisioned concurrency". We can play with it and see how much it costs per month; if it is cheap enough, we may not need the disable function at all, so there would be nothing to fix. But we may still need a "slow down" feature, which would decrease provisioned concurrency and reduce the rate at which jobs run to once a day. A hybrid strategy: dormant, but not cold.
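To make the "slow down" idea concrete, here is a minimal sketch of how the three states could map to settings. Everything in it is an assumption for illustration: the mode names, the `RunPlan` type, and the `plan_for` function are hypothetical and not part of mycloud or tradleconf. The only detail taken from the comment above is that slow-down lowers provisioned concurrency and runs jobs once a day; `rate(1 day)` is written in the style of an AWS schedule expression.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunPlan:
    """Hypothetical per-MyCloud settings; not an existing mycloud type."""
    provisioned_concurrency: int   # warm instances kept per function
    job_schedule: Optional[str]    # schedule expression, None means no jobs


def plan_for(mode: str, normal: RunPlan) -> RunPlan:
    """Return the settings for a mode without modifying the normal plan."""
    if mode == "enabled":
        return normal
    if mode == "slow":
        # Dormant, but not cold: keep at most one warm instance
        # and run scheduled jobs once a day (assumed values).
        return RunPlan(
            provisioned_concurrency=min(1, normal.provisioned_concurrency),
            job_schedule=None if normal.job_schedule is None else "rate(1 day)",
        )
    if mode == "disabled":
        # Fully off: nothing kept warm, no jobs scheduled.
        return RunPlan(provisioned_concurrency=0, job_schedule=None)
    raise ValueError(f"unknown mode: {mode}")


normal = RunPlan(provisioned_concurrency=5, job_schedule="rate(5 minutes)")
slow = plan_for("slow", normal)
```

Because the normal plan is passed in and never changed, switching back to `enabled` returns exactly the original settings.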
MicroVM Snapshotting - future solution for "cold start"

In cloudpal, we plan to use a MicroVM snapshotting mechanism to avoid cold starts. Snapshotting has slowly been productized for the Firecracker MicroVM (which underlies AWS Lambda) and is now quite reliable. But it is not clear when AWS will start using it, as there is also the challenge of uniqueness / randomness: each restored snapshot is identical (problem described here).

Snapshotting can be further improved significantly by a super-awesome OS paging mechanism called REAP. Compared to baseline snapshotting, REAP cuts cold-start delays by 3.7x. It is tested with the help of Hive. REAP was a research project and is not in active development at this point. But there is more recent work, called SnapFaaS, that analyzes the limitations of REAP and offers an alternative approach that claims significant improvement over REAP. Still, according to the paper, the lower bound for this optimization is a 15 ms cold start.

Language-specific sandboxing runtimes, like WASM, can achieve a 10-20 microsecond cold start, as the above paper states, and some CDNs already have such runtimes in production. But they are not generic (we can't run there unless we rewrite MyCloud in Rust) and, more importantly, they provide a much lower level of protection from the host (cloud provider).
Lambda costs for a single MyCloud seem to be < $0.15 per day.
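For comparison with a monthly provisioned concurrency bill, the daily figure can be scaled up. This is a rough sketch: only the $0.15 per day bound comes from this thread, while the 30-day month and 365-day year are assumptions, and the function name is made up for illustration.

```python
from decimal import Decimal


def cost_upper_bounds(daily_usd: Decimal) -> dict:
    """Scale a daily cost bound to a 30-day month and a 365-day year."""
    return {
        "per_30_day_month": daily_usd * 30,
        "per_365_day_year": daily_usd * 365,
    }


# Upper bound reported in this thread for a single MyCloud.
bounds = cost_upper_bounds(Decimal("0.15"))
print(bounds["per_30_day_month"])   # 4.50
print(bounds["per_365_day_year"])   # 54.75
```

So a single MyCloud stays under roughly $4.50 per month on these assumptions, which is the number to beat when pricing the alternatives.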
The suggested points build upon each other and can be completed step by step. They are small steps, each easy to do, and good for people starting with mycloud/tradleconf development. In the meantime I also thought of further ways to improve this situation beyond the initial tasks:
Offtopic: Slowdown / Why disable?
"Slow down" is a performance knob. That is an interesting thought, but I think it should be separate from, and in addition to, a disable switch. After all, when I re-enable a MyCloud I want it to run in the same configuration it had before it was disabled. Can we make a separate issue for this?
While the cost reasons are valid, I think there are two other valid reasons:
The discussion on "why disable" or "how to disable" is something we should have, but maybe let's do that in a separate issue?
When a MyCloud is disabled, the CLI currently shows errors during all new operations, as in #26, because the underlying code cannot look up information about a MyCloud in the disabled state.
Currently re-enabling is a multi-step manual process that involves going into the AWS console.
I am thinking of 3 steps to improve the situation: