Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial commit for Azure Blob Storage adaptor #438

Conversation

bensonmiller
Copy link

@bensonmiller bensonmiller commented Nov 18, 2023

Summary

Adds an OpenFn adaptor for Azure Blob Storage

Details

This adaptor adds the following capabilities for working with Azure Blob Storage:

  • Create a container
  • Check the existence of a blob
  • Fetch the properties of a blob
  • Upload a blob
  • Download a blob as JSON
  • Download a blob as string

Issues

This is my first adaptor PR and I'm not an especially competent JS developer. I'm certain there are issues. But I've subjected this adaptor to a fair amount of testing via the CLI and it is working respectably. Some thoughts:

  • Your could make an argument that ALL azure-related functions should go in a single OpenFn adaptor. That's not what I've done here. This may be worth a discussion, but Blob Storage interactions /alone/ could grow to be a fairly substantial adaptor.
  • I haven't spent any meaningful time trying to mock the Azure blob storage client(s), so the testing is embarrassing.
  • I don't see a lot of try/catch logic in other adaptors, so I haven't put such logic in this one. Uncaught exceptions FTW!
  • There are a number of different ways to authenticate against Azure. This adaptor only supports the StorageSharedKeyCredential at the moment.
  • I wonder about the pattern of defining helper functions like downloadAsJSON() and downloadAsString(). The difference between these is minimal, but I saw a couple other adaptors doing similar things so I did it here. I'm not sure that I love it (or, even, that I've implemented it especially well).

Review Checklist

Before merging, the reviewer should check the following items:

  • Does the PR do what it claims to do?
  • If this is a new adaptor, has the migration tool been run and the
    migration guide followed?
  • Are there any unit tests? Should there be?
  • Is there a changeset associated with this PR? Should there be? Note that
    dev only changes don't need a changeset.

@bensonmiller
Copy link
Author

Hello @josephjclark @stuartc. @taylordowns2000 suggested that I flag this to your attention. I suspect that this isn't yet in a merge-able state, but it feels close enough that I wanted to open a PR.

@josephjclark
Copy link
Collaborator

@bensonmiller Thank you for this! Don't worry too much about the testing side - mocking is a really hard problem (we're working on a solution but no news yet).

I'll take a look at this on Monday and see if we can get it merged.

Does Azure have a sandbox that I can test this adaptor against? What's the effort for me, who thinks azure is a colour, to test this adaptor on my own machine?

@bensonmiller
Copy link
Author

bensonmiller commented Nov 18, 2023

@josephjclark Thank you! There are a couple ways to handle Azure testing. I could grant you access to a small Azure subscription that I have. That's not terrible, but would still require you to create and manage a user account in Azure. Alternatively (and maybe the best approach) is to have you sign up for an Azure account. When you create an account for the first time, MSFT puts some cloud credits ($200 IIRC) into new accounts. Pretty standard stuff, but I can't imagine that would exceed those credits on blob storage testing in 1000 years. If you want to go down that second path, this is how I'd approach it:

  1. Signup for an Azure account
  2. Once you have an Azure subscription and are logged into the Azure web portal, create a resource group that is associated with the subscription. Call this resource group openfnadaptortesting or similar.
  3. Create a storage account and assign it to that resource group. Call the storage account blobstorageadaptortest or similar. Accept all other defaults during the creation of the storage account, ensuring that the "Storage account key access" setting is enabled. You'll have a chance to confirm this setting on the review / summary screen immediately before pressing the "Create" button.
  4. Once you've created the storage account, browse to the main "Storage Account" page (find it under the main navigation hamburger or via the search bar) and open the storage account that you just created. Then, in the left nav bar, find and click on the "Access Keys" item under "Security + Networking" heading.
  5. Copy the Storage Account Name and the values of one of the keys (doesn't matter which key) into your state.configuration as the accountName and accountKey properties, respectively.

That should be all that's necessary to get a storage account that works with this adaptor. I think that's a relatively easy process and still (mostly) allows you to assert that Azure is just a color. I'm not far from that perspective myself, feeling much more comfortable in AWS and GCP. . .

EDIT:
As I think more about this, there is actually a third option that might be attractive. Instead of configuring a user account for you in the Azure portal, I could give you an accessKey to a storage account in my subscription. In this model, you'd need to install the Azure CLI and use it with jq to de-bulk the extensive outputs. Please let me know which way you'd prefer to go with this.

@bensonmiller bensonmiller changed the title Initial commit for Azure Blog Storage adaptor Initial commit for Azure Blob Storage adaptor Nov 18, 2023
Copy link
Collaborator

@josephjclark josephjclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @bensonmiller thank you for such a well written and well-documented PR.

There is absolutely nothing wrong with your Javascript and frankly I think you should have a little more faith in yourself! It's all nice and clean and concise and my comments are more around high-level design or low-level patterns that OpenFn uses (and perhaps doesn't document so well).

I really like the idea of splintering out this "mini adaptor" to just cover part of Azure 👌 I wish we'd thought of that elsewhere.

I haven't tested anything yet. Regarding that, given that is adaptor is by you and for you, I am happy to take it on your word that it works. I can see that the API is fairly simple (and maybe will get simpler?) which again makes me feel like I don't need to test too extensively.

A temporary access key could be a good way to go but let's get through this review round first and then see where we are!

packages/azureblobstorage/src/Adaptor.js Outdated Show resolved Hide resolved
packages/azureblobstorage/src/Adaptor.js Outdated Show resolved Hide resolved
packages/azureblobstorage/src/Adaptor.js Outdated Show resolved Hide resolved
packages/azureblobstorage/src/Adaptor.js Outdated Show resolved Hide resolved
packages/azureblobstorage/src/Adaptor.js Outdated Show resolved Hide resolved
@@ -0,0 +1 @@
# @openfn/language-azureblobstorage
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When this is approved, you'll want to run pnpm changeset from repo root and follow to steps to generate release notes. But this is the last thing you'll do.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll take care of this before releasing :)

packages/azureblobstorage/assets/rect-512x200.png Outdated Show resolved Hide resolved
packages/azureblobstorage/package.json Outdated Show resolved Hide resolved
"author": "Open Function Group",
"license": "LGPLv3",
"files": [
"dist/",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a note to myself - I think we need to include assets here? We should update the template

packages/azureblobstorage/src/Adaptor.js Outdated Show resolved Hide resolved
@bensonmiller
Copy link
Author

@josephjclark -- Thank you for this superb review. I've skimmed it and I love the feedback. I have a fairly busy day today, so it's unlikely that I'll have any updates to commit before tomorrow, but I'll work through your comments and incorporate at soonest.

@bensonmiller
Copy link
Author

@josephjclark I've updated this PR to incorporate almost all of your comments. Notable changes include:

  • Pulled the client out of state into a variable
  • Eliminate checkBlobExists() function
  • Unified downloadBlob() API with options to download as string | json
  • Updated uploadBlob() function to support creation of container (boolean) or overwriting existing blobs (boolean)
  • Renamed adaptor to azure-storage
  • Functions default to sourcing the container name from state.configuration, but the name can be overridden by a function argument if desired
  • Added Azure Storage images to assets/ directory
  • Removed junk comments
  • Adjusted log levels

@josephjclark josephjclark changed the base branch from main to release/azure-blobstorage November 29, 2023 09:44
Create a file `job.js`, as shown below, to run with the OpenFn CLI.

```js
const data = state.data;
Copy link
Collaborator

@josephjclark josephjclark Nov 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example code here should show some typical adaptor usage and tell a bit of a story. I'd like to re-work it heavily, make it shorter, and align with our best practices.

We're not very good at explaining this stuff so let me take a minute to explain a few things!

  • You don't need to call the execute function - the OpenFn runtime will do that for you!
  • Job code is designed to be a small number of top level "operation" functions with as little procedural code as possible. It's designed to be very lightweight and elegant (although to be fair it rarely is in practice!)
  • As part of that, every line of job code should start with an operation from an adaptor (not a const or let). I'm pretty sure this code won't actually run on our existing app at openfn.org.
  • A cheat code for this is to declare an fn(() => { ... }) block at the start of the job and do all your initialisation code there. Any variables that are re-used in the job should be written to state.
  • By the way, we are guilty all the time of using fn blocks to nest lots of complex procedural code. It's kind of inevitable when dealing with messy data from the real world!

The big thing with your example is that it front-loads a lot of variable declarations and then feeds them through to the operations. What we'd usually do is pass a function in place of a value, and use the function to dynamically calculate stuff (we call this an "open function" by the way).

Ok, that's a lot of words. Here's what I would do with your example:

uploadBlob(
  state => {
    const date = new Date();
    const id = '0e82962a-6ed0-4a88-92c1-51ae785b4126';
    return `${date.getFullYear}/${date.month}/com.example/${id}.json`;
  },
  state.data,
  {
    blobHTTPHeaders: { blobContentType: 'application/json' }
  }
);

Note that the container name is coming from config here - so the state object described above probably needs updating:

Copy link
Author

@bensonmiller bensonmiller Dec 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@josephjclark -- Thank you for this comment. I've reworked my example per your guidance and I'm getting close. I'm experiencing a problem when I use an open function to create and return the blobName argument to the uploadBlob() function. For my experiment, I'm using the following two job definitions, below. The first one uses a plain string for the blobName argument, which works fine. The second one uses the open function approach, which results in an error when trying to instantiate the blockBlobClient (which is the first place in the uploadBlob function where the blobName argument is used).

This works:

uploadBlob(
"some/path/myblobname.json",
  state.data,
  {
    blobHTTPHeaders: { blobContentType: 'application/json' }
  },
  { createContainer: true, containerName: 'thisone'}
);

This doesn't work

uploadBlob(
  () => { return "some/path/myblobname.json"; },
  state.data,
  {
    blobHTTPHeaders: { blobContentType: 'application/json' }
  },
  { createContainer: true, containerName: 'thisone'}
);

I run the job via the CLI. When it fails, it returns the following output. The "blobName.split is not a function" error is thrown when I try to instantiate the blockBlobClient. I'm wondering if the open function hasn't yet returned, so the blockBlobClient is trying to instantiate itself with a /function/ instead of a /string/, which causes it to error.

➜  nhgh_workflows openfn uploadAzureSimpler.js -ma azure-storage -s tmp/state_for_azure_upload_simple.json -O  -v
[CLI] ♦ Versions:
         ▸ node.js                           18.17.1
         ▸ cli                               0.4.10
         ▸ runtime                           0.2.1
         ▸ compiler                          0.0.38
         ▸ @openfn/language-azure-storage    latest
[CLI] ✔ Loading adaptors from monorepo at /Users/benson/sandbox/newhorizons/openfn_adaptors
[CLI] ✔ Loaded state from tmp/state_for_azure_upload_simple.json
[CLI] ✔ Compiled from uploadAzureSimple.js
[R/T] ♦ Starting job job-1
[R/T] ✘ Failed job job-1 after 154ms
[R/T] ✘ TypeError: TypeError: blobName.split is not a function
[R/T] ✘ Check state.errors.job-1 for details.
[CLI] ✔ Result: 
[CLI] ♦ {
  "data": {
    "foo": "bar",
    "bif": "baz",
    "sunday": "monday",
    "answer": "42"
  },
  "errors": {
    "job-1": {
      "type": "TypeError",
      "jobId": "job-1",
      "message": "TypeError: blobName.split is not a function",
      "error": {
        "source": "runtime",
        "type": "RuntimeError",
        "severity": "fail",
        "subtype": "TypeError",
        "name": "RuntimeError",
        "message": "TypeError: blobName.split is not a function"
      }
    }
  }
}
[CLI] ⚠ Errors reported in 1 jobs
[CLI] ✔ Finished in **231ms**

I'd appreciate your thoughts on this. I'd be happy to jump on a GMT call if it would help. Thank you!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @bensonmiller - that's the expandReferences thing biting us.

Check out this comment: #438 (comment)

You need to expand the blobName reference and use the "resolved" value in the function.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes! Thanks for the re-iteration. This should be all fixed up now.

packages/azure-storage/package.json Outdated Show resolved Hide resolved
packages/azure-storage/src/Adaptor.js Outdated Show resolved Hide resolved
packages/azure-storage/src/Adaptor.js Outdated Show resolved Hide resolved
packages/azure-storage/src/Adaptor.js Outdated Show resolved Hide resolved
packages/azure-storage/src/Adaptor.js Outdated Show resolved Hide resolved
packages/azureblobstorage/src/Adaptor.js Outdated Show resolved Hide resolved
Copy link
Collaborator

@josephjclark josephjclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @bensonmiller, this is looking great!

I think I accidentally fired off my review notes early - apologies for that, how embarassing!!

I've tidied them all up now ahem.

There's just a few more things we ought to deal with, then we can get this out in to the world!

- Updating configuration schema
- Better handling of function options
- Adding support for
- Bumped to 1.0
@bensonmiller
Copy link
Author

@josephjclark I think (hope) that this is ready for a final look before prepping for release (pnpm changeset, etc.)

@josephjclark
Copy link
Collaborator

Thank you @bensonmiller! I am very happy with this - we'll handle the changeset stuff and get this released for you ASAP.

@josephjclark josephjclark merged commit 9985fb4 into OpenFn:release/azure-blobstorage Dec 11, 2023
1 check failed
@bensonmiller bensonmiller deleted the azureblobstorage-adaptor branch December 11, 2023 17:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants