AWS requirements for when we migrate this to AWS lambdas. #3
Replies: 2 comments
-
@bonnystrong - Randy & I put an "educated wild guess" together with the AWS calculator. We included options for a 4-node k8s cluster that could run Couchbase and a large scorecard process on top of EC2 VMs. We also included some very back-of-the-envelope estimates for using Lambdas in place of one of the k8s nodes. Either way, Couchbase costs (via EC2) dominate the expenses. This estimate can be firmed up as we get further along in developing this backend calculation process and if we get a chance to experiment with Lambdas/Queues and figure out what that architecture could look like. Fine-tuning the Couchbase requirements would deliver the most cost savings. https://calculator.aws/#/estimate?id=1e7a5fd47efdf262944a70b3ba5b53b418353b03
-
It is possible that we could do it without a Couchbase instance, especially if our GSL Couchbase cluster were made public, but that is beyond what we are ready to bite off right now.
-
It might be rational to do this in AWS. One approach would be to have the scorecard app, wherever it is living, replicate the necessary data to the AWS Couchbase instance with a relatively short time to live (TTL). It would then submit the API request to process the scorecard instance. The resulting scorecard updates (as calculations are made) would be written to its local Couchbase (also with a short TTL) and replicated back to the MATS Couchbase. This is essentially the same flow we would use for the local implementation of the CalculateService, except for a couple of XDCR replications that are configured outside of the code. A minimal sketch of the kick-off request is below.
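Here is a rough sketch of what that "submit the API request" step could look like. The endpoint URL, payload shape, and field names are all hypothetical; the real CalculateService interface has not been designed yet.

```python
import requests

# Hypothetical endpoint -- the real CalculateService API is still TBD.
CALC_SERVICE_URL = "https://scorecard-calc.example.com/api/process"

payload = {
    "scorecardId": "sc-2023-001",     # document already replicated up via XDCR
    "bucket": "scorecard_cloud",      # cloud-side Couchbase bucket holding the short-TTL data
    "resultBucket": "scorecard_results",  # bucket that XDCR replicates back to MATS
}

# Fire the processing request; the service would fan the work out to Lambdas/queues.
resp = requests.post(CALC_SERVICE_URL, json=payload, timeout=30)
resp.raise_for_status()
print("processing job accepted:", resp.json())
```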
To do this we would need AWS resources. Following is some discussion about how to size those....
Here is what I am thinking about cloud resources for an ongoing scorecard / parallel data processing requirement. I included Ian on this because he knows better than I do how to estimate the actual AWS resource costs. Maybe Ian can plug these numbers into the cost calculator, or maybe we can even do it together so that I can get that under my belt as well.
We will need a place to run the web service, message queue, and Lambda manager. I don't think this needs to be a huge processor, but its beefiness will ultimately depend on how much processing we are doing. I think each concurrent processing task will need a core. We can load balance it so that it spins up additional processors as needed, but to start we would need one instance with, say, 4 cores and 16 GB of memory (that is a pure guess). A sketch of the queue fan-out is below.
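As a concrete sketch of the web service / queue piece, here is what enqueueing one SQS message per scorecard cell could look like with boto3. The queue name and message shape are assumptions, not decisions.

```python
import json
import boto3

# Assumed queue name -- nothing has been provisioned yet.
QUEUE_NAME = "scorecard-cell-tasks"

sqs = boto3.client("sqs")
queue_url = sqs.get_queue_url(QueueName=QUEUE_NAME)["QueueUrl"]

def enqueue_cells(scorecard_id, cells):
    """Send one message per scorecard cell; SQS batches allow at most 10 entries."""
    for start in range(0, len(cells), 10):
        batch = cells[start:start + 10]
        entries = [
            {
                "Id": str(start + i),
                "MessageBody": json.dumps({"scorecardId": scorecard_id, "cell": cell}),
            }
            for i, cell in enumerate(batch)
        ]
        sqs.send_message_batch(QueueUrl=queue_url, Entries=entries)

# Example: one cell per (region, variable, stat, forecast hour) combination.
cells = [
    {"region": r, "variable": v, "stat": s, "fhr": f}
    for r in ["E_US", "W_US"] for v in ["TMP", "DPT"] for s in ["RMSE"] for f in [6, 12]
]
enqueue_cells("sc-2023-001", cells)
```

Each message would then drive one Lambda invocation (or one queue worker), which is what makes the per-cell parallelism possible.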
We will need the resources to run a boatload of Lambdas, and I don't know exactly how they charge for them, except that they are supposed to charge only for CPU time, plus maybe a fee for warm starts vs. cold starts. We could probably use up to 3000 of those for a large scorecard. For example, suppose there are 6 regions, 4 variables, 4 stats, and 15 forecast hours: that is 6 * 4 * 4 * 15 = 1440 individual cells for a row, and I expect that people will want multiple rows. The good part is that these are on-demand kinds of things, and if we have to limit them the system will know how to accommodate that limitation.
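To make the Lambda piece a little more concrete, here is a back-of-the-envelope cost sketch using the cell counts above. The per-request and per-GB-second rates are the on-demand rates as I understand them and should be checked against the current AWS price list; the duration, memory, and row-count numbers are pure guesses.

```python
# Back-of-the-envelope Lambda cost estimate for one large scorecard run.
# Assumed on-demand rates (us-east-1, x86) -- verify against current AWS pricing.
PRICE_PER_REQUEST = 0.20 / 1_000_000   # USD per invocation
PRICE_PER_GB_SECOND = 0.0000166667     # USD per GB-second of duration

regions, variables, stats, forecast_hours = 6, 4, 4, 15
cells_per_row = regions * variables * stats * forecast_hours   # 1440
rows = 2                               # guess: people will want multiple rows
invocations = cells_per_row * rows

seconds_per_cell = 3                   # guess: "a few seconds" per calculation
memory_gb = 0.5                        # guess: 512 MB allocated per Lambda

gb_seconds = invocations * seconds_per_cell * memory_gb
cost = invocations * PRICE_PER_REQUEST + gb_seconds * PRICE_PER_GB_SECOND
print(f"{invocations} invocations, {gb_seconds:.0f} GB-s, ~${cost:.2f} per scorecard run")
```

If those guesses are anywhere near right, the Lambda charges per scorecard run are small change compared to the always-on EC2/Couchbase costs, which matches the calculator estimate above.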
The other aspect of this is the database. In order to really do this correctly we would run a Couchbase instance. It could be sized anywhere from one to several nodes, but we could start with one. The purpose would be that the original request from a MATS scorecard or other app would first XDCR (Cross Data Center Replication) only the required data up to the co-located Couchbase instance. The data would have a relatively short time to live so that it would evaporate in the cloud. The data processing would take place against that cloud data and we would only return the (small) result, then let the data evaporate so that we don't get charged for it and we limit our storage requirements. So we need one of those as well.
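A minimal sketch of the "short TTL so it evaporates" idea, using the Couchbase Python SDK: whichever side writes the working documents gives them an expiry so they clean themselves up after processing. The cluster address, bucket name, key scheme, and TTL value are all placeholders, and the exact import paths differ between SDK versions (this follows the 4.x layout).

```python
from datetime import timedelta

# Couchbase Python SDK, 4.x-style imports; module paths differ in older SDK versions.
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions, UpsertOptions

# Hypothetical cloud-side cluster and bucket.
cluster = Cluster(
    "couchbase://cloud-couchbase.example.com",
    ClusterOptions(PasswordAuthenticator("svc_scorecard", "CHANGE_ME")),
)
collection = cluster.bucket("scorecard_cloud").default_collection()

# Write the working data with a short expiry so it "evaporates" after processing
# and we are not charged for long-term storage.
doc = {"scorecardId": "sc-2023-001", "region": "E_US", "stat": "RMSE", "values": [0.8, 1.2, 0.9]}
collection.upsert(
    "sc-2023-001::E_US::RMSE",
    doc,
    UpsertOptions(expiry=timedelta(hours=2)),   # pure guess at a reasonable TTL
)
```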
All of this stuff should be containerized, so maybe we just need a reasonably sized Kubernetes cluster with several nodes. That might be the best approach, plus the cost for the Lambdas.
It occurs to me that I didn't really specify what we actually need in any succinct way. I think we will need to acquire enough AWS resources to support a Kubernetes cluster of at least four nodes (for the service and the Couchbase instance). That's at least four AWS instances, or however they charge for a Kubernetes cluster. We also need to pay for a certain amount of Lambda time that depends on how many scorecards get deployed. I don't know how to estimate how many scorecards will get deployed, but each Lambda may take a few seconds to make its calculations, and there could be thousands of those each time a scorecard gets processed. I hope that is a little more succinct.