A separate importer for every cloud service (EC2, S3, Lambda, Compute Engine, Virtual Machine, Dataflow, RDS, etc.)? #652

josh-swerdlow · 2024-04-22T18:45:14Z


I've noticed a trend of creating importer plugins for many companies' (Google, Amazon, Microsoft) cloud services in order to import relevant metrics and then build a model from them.

My question is broad, but it is as follows. Does every service from every company need its own importer, or can we just use a monitoring service? And does each service even need an importer? For example, what are we measuring for the storage services to determine carbon?

Around the same time I thought of this idea, I stumbled onto the cheat sheet below and decided to start this discussion, as I think it could open up the IF to users without requiring them to develop an importer first.

From what I understand, the following services have importers implemented (some by hackathon solutions).

Microsoft Virtual Machines, Google Compute Engine and Dataflow, and Amazon EC2.

Furthermore, one team created a generalized solution (as far as I understand) that imports metrics from Microsoft's Azure Monitor API. This should allow us to import metrics for any service, presuming the owner has access to the API.

My thinking on the possible cons:

The monitoring API may not be free or available for everyone (AWS CloudWatch and Google Cloud Monitoring are not free; I can't figure out Azure's pricing).
It adds another layer of setup in the user's environment (not the IF, their cloud environment).
It adds additional compute and carbon in cases where it may not make sense.

The pros are a single importer (simplicity) with access to whatever metrics the monitoring service provides across the entire environment of a use case (it is probably more efficient for large environments to query one endpoint of stored metrics).

It makes sense to me to have importers for each service (including the monitoring service for the times it makes sense). What do people think?

On a side note, Microsoft's Carbon Optimization API is something I need to learn more about. I'm not sure it makes sense to pursue importers for Microsoft services if they can already provide a carbon estimate from usage...

jmcook1186 · 2024-05-17T11:04:07Z

Maintainer

Hi @josh-swerdlow, importing data from cloud providers can be challenging. I think it makes sense to have plugins dedicated to specific services, because one of the strengths of IF is that you can swap plugins in and out easily and anyone can build a plugin for their use case.

A generic importer would be a very big piece of code with a lot of conditional logic and probably awkward input data and credential management. In general I think it's better to be "unix-like" - i.e. break things down into minimal units that do one specific thing well. One other benefit of this is that some market signal can emerge to help optimize dev resources - services that people want to access naturally get plugins built and maintained, whereas services that don't really have demand don't get plugins and people don't waste time and resources building and maintaining them.
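To make the "minimal units" idea a bit more concrete, here is a rough sketch in TypeScript. The `Importer` interface, the `Observation` type, and both importer objects are hypothetical and are not the actual IF plugin API. The importers return canned data where a real one would call the provider's API (and would most likely be asynchronous). The only point is that the pipeline depends on a small shared contract, so service-specific importers can be swapped in and out.

```typescript
// Hypothetical types for illustration only; this is not the real IF plugin API.
type Observation = {
  timestamp: string;
  duration: number; // seconds
  [metric: string]: string | number;
};

interface Importer {
  service: string;
  fetch(from: Date, to: Date): Observation[];
}

// Seconds between two dates, shared by both importers.
const seconds = (from: Date, to: Date): number =>
  (to.getTime() - from.getTime()) / 1000;

// One small importer per service; each returns canned data in this sketch.
const azureVmImporter: Importer = {
  service: "azure-vm",
  fetch: (from, to) => [
    { timestamp: from.toISOString(), duration: seconds(from, to), "cpu/utilization": 12.5 },
  ],
};

const ec2Importer: Importer = {
  service: "aws-ec2",
  fetch: (from, to) => [
    { timestamp: from.toISOString(), duration: seconds(from, to), "cpu/utilization": 40 },
  ],
};

// The pipeline only knows about the Importer contract, not about any cloud.
function collect(importers: Importer[], from: Date, to: Date): Observation[] {
  const all: Observation[] = [];
  for (const importer of importers) {
    for (const obs of importer.fetch(from, to)) {
      all.push({ ...obs, service: importer.service });
    }
  }
  return all;
}
```

Adding or dropping a service then means adding or dropping one entry in the list passed to `collect`, without touching the other importers.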

That said, there might be examples where some degree of aggregation is beneficial.

Sorry, appreciate this is a bit of a vague response but I don't think there's really a definitive answer, just my fairly weakly held opinion!


jawache · 2024-05-21T08:32:23Z

Maintainer

Thanks @josh-swerdlow

I had actually forgotten about what I'm about to write below but your post reminded me!

As @jmcook1186 implied, one thing we learned from creating the initial Azure importer was just how amazingly bespoke the logic in the importer had to be. We had assumed it would be more like a simple query (like something you would do with a monitoring DB that stored all your data), but it ended up being a complex set of API calls to find and extract the data that's needed. And that was just to get virtual machine information; there are 100+ services on Azure! Every cloud is apparently the same: it would require a lot of work for each one to have an importer that covered 100% of its services.

One solution that MIGHT work cross-cloud is to focus on monitoring solutions, e.g. if you are a consumer of cloud data you will likely use a 3rd party monitoring solution like, say, Datadog or New Relic. They have had to solve the problem of "Multiple Cloud Providers -> Single Database", so one importer from, say, New Relic would get you everything across all the clouds. But then you are hit with the same problem: we'd need an importer for every monitoring solution.
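Roughly, the appeal of the monitoring route is that the data already arrives in one shape regardless of cloud, so a single importer only has to reshape it. A minimal sketch in TypeScript, assuming a made-up row format: `MonitoringRow` and `Observation` are hypothetical and do not match the export format of Datadog, New Relic, or any other real tool, nor the actual IF input schema.

```typescript
// Hypothetical row as it might come out of a multi-cloud monitoring store.
type MonitoringRow = {
  provider: string; // e.g. "aws", "azure"
  resource: string; // e.g. an instance or VM identifier
  metric: string;
  time: string; // ISO 8601 timestamp
  value: number;
};

// Hypothetical normalized observation: one per resource per timestamp.
type Observation = {
  timestamp: string;
  provider: string;
  resource: string;
  metrics: Record<string, number>;
};

// Group rows by (provider, resource, time) and merge their metrics.
function toObservations(rows: MonitoringRow[]): Observation[] {
  const byKey: Record<string, Observation> = {};
  const result: Observation[] = [];
  for (const row of rows) {
    const key = [row.provider, row.resource, row.time].join("|");
    let obs = byKey[key];
    if (obs === undefined) {
      obs = {
        timestamp: row.time,
        provider: row.provider,
        resource: row.resource,
        metrics: {},
      };
      byKey[key] = obs;
      result.push(obs); // keep first-seen order
    }
    obs.metrics[row.metric] = row.value;
  }
  return result;
}
```

The cloud-specific work does not disappear in this picture; it has simply been done already by the monitoring vendor, which is why the importer itself can stay this small.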

Thoughts?


josh-swerdlow · 2024-05-21T20:04:27Z

Author

3rd party monitoring solutions are another approach that I hadn't considered, and they make sense since they've already aggregated the data for us and work cross-cloud. If implemented, this would greatly reduce the amount of effort. I still have the concern that this could price/engineer some groups out. Creating an IF that requires one to pay for or add tech in order to use it seems like a barrier that could exclude certain groups. I don't have any evidence to support this except my personal experience. Do y'all think this is a substantiated line of thinking?

It really feels more and more like the answer to, "Do we need importers for each service or just the monitoring solutions?" is 'Yes!'.

The subsequent question becomes: what is the priority of each solution? Does one chip away at each service, or does IF support all monitoring solutions first, since that could have a broader impact?

Both of your inputs on the bespoke requirements per service are really appreciated. I will look through the Azure importer code to get a sense of scope and compare it to the hackathon solution that used Azure's monitoring solution instead. This will help my collaborator and me weigh the pros and cons, with required effort as a parameter.

I'm leaning towards starting with 3rd party multi-cloud monitoring, then moving to 1st party monitoring, then doing a service-by-service breakdown, but this could change with more thought.

2 replies

josh-swerdlow May 21, 2024
Author

Furthermore, I agree with Joseph's points about Unix-style plugins. Even if the 3rd party monitoring solutions do provide a catch-all level of breadth, the plugins should be specified in such a way that the input parameters aren't an amalgamation of keys/strings/ints that takes more than a few minutes to decipher.

I will keep this style philosophy in mind! Thanks for bringing it up :)
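As a rough illustration of what I mean by clearly specified inputs, here is a minimal TypeScript sketch. Everything in it is an assumption for the sake of example: `MonitoringImporterConfig`, its field names, and `validateConfig` are hypothetical and not part of IF or any existing plugin.

```typescript
// Hypothetical config for a monitoring-based importer; field names are illustrative.
interface MonitoringImporterConfig {
  provider: "aws" | "azure" | "gcp";
  metrics: string[]; // metric names to import
  windowSeconds: number; // size of each observation window
}

// Check an untyped config up front and fail with a readable message.
function validateConfig(raw: Record<string, unknown>): MonitoringImporterConfig {
  const providers = ["aws", "azure", "gcp"];
  const { provider, metrics, windowSeconds } = raw;

  if (typeof provider !== "string" || providers.indexOf(provider) === -1) {
    throw new Error(`provider must be one of: ${providers.join(", ")}`);
  }
  if (
    !Array.isArray(metrics) ||
    metrics.length === 0 ||
    !metrics.every((m) => typeof m === "string")
  ) {
    throw new Error("metrics must be a non-empty list of metric names");
  }
  if (typeof windowSeconds !== "number" || !(windowSeconds > 0)) {
    throw new Error("windowSeconds must be a positive number");
  }

  return {
    provider: provider as MonitoringImporterConfig["provider"],
    metrics: metrics as string[],
    windowSeconds,
  };
}
```

A handful of named, typed fields with explicit error messages is the kind of thing I'd want a user to be able to read in a minute or two, whatever the real parameters end up being.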

jmcook1186 May 22, 2024
Maintainer

btw we had a hackathon submission that pulled data into IF from Datadog - Green-Software-Foundation/hack#99
