Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Azure managed identity #4897

Merged
merged 37 commits into from
Jun 12, 2024
Merged

Conversation

bentsherman
Copy link
Member

Close #4871

Thanks @swampie for the minimal example. Is there any config required to authenticate with Azure Batch? Currently with this PR, no batch credentials will be provided if only the managed identity is specified

Signed-off-by: Ben Sherman <[email protected]>
@bentsherman bentsherman requested a review from pditommaso April 9, 2024 20:26
@bentsherman bentsherman requested a review from a team as a code owner April 9, 2024 20:26
Copy link

netlify bot commented Apr 9, 2024

Deploy Preview for nextflow-docs-staging ready!

Name Link
🔨 Latest commit 7e0e5e0
🔍 Latest deploy log https://app.netlify.com/sites/nextflow-docs-staging/deploys/6669d77f5c87cb00088b527a
😎 Deploy Preview https://deploy-preview-4897--nextflow-docs-staging.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link
Member

@pditommaso pditommaso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sound good but there are zero unit tests

@swampie
Copy link
Contributor

swampie commented Apr 11, 2024

@bentsherman in the example I do instantiate a DefaultAzureCredentials as you do in your change. This should be enough as the instance(s) are configured to use that clientId

@adamrtalbot
Copy link
Collaborator

I'm with Ben - where do you authenticate against the Batch service?

Copy link
Collaborator

@adamrtalbot adamrtalbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused as to how this is authenticating against the Batch service. Also, we should leave some scope open for a system assigned (anonymous) managed identity.

Comment on lines 230 to 232
final credential = new DefaultAzureCredentialBuilder()
.managedIdentityClientId(clientId)
.build()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be worth checking if this can pick up a system assigned identity as well as a named one. You should be able to do this by dropping the managedIdentityClientId(clientId) part (educated guess here):

        final credential = new DefaultAzureCredentialBuilder()
        if ( clientId ) {
            credential.managedIdentityClientId(clientId)
        }
        
        finalCredential = credential.build()

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will also enable people to authenticate as themselves on their personal machines if they're logged in to Azure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need another config option to explicitly enable the system assigned identity.

Here are the docs for the default credential builder: https://learn.microsoft.com/en-us/java/api/com.azure.identity.defaultazurecredentialbuilder?view=azure-java-stable

It's not obvious to me that it defaults to the system-assigned identity, since it seems to support other credentials. But maybe you can discern better than me.

Copy link
Collaborator

@adamrtalbot adamrtalbot Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you just call defaultCredentialBuilder it goes in this order: https://learn.microsoft.com/en-us/azure/developer/java/sdk/identity-azure-hosted-auth#default-azure-credential

  1. Environment - DefaultAzureCredential reads account information specified via environment variables and use it to authenticate.
  2. Managed Identity - If the application deploys to an Azure host with Managed Identity enabled, DefaultAzureCredential authenticates with that account.
  3. IntelliJ - If you've authenticated via Azure Toolkit for IntelliJ, DefaultAzureCredential authenticates with that account.
  4. Visual Studio Code - If you've authenticated via the Visual Studio Code Azure Account plugin, DefaultAzureCredential authenticates with that account.
  5. Azure CLI - If you've authenticated an account via the Azure CLI az login command, DefaultAzureCredential authenticates with that account.

More specifically they give this example where they state that the clientId is only required if using a user assigned identity:

/**
 * Authenticate with a managed identity.
 */
public void createManagedIdentityCredential() {
    ManagedIdentityCredential managedIdentityCredential = new ManagedIdentityCredentialBuilder()
        .clientId("<USER ASSIGNED MANAGED IDENTITY CLIENT ID>") // only required for user assigned
        .build();

    // Azure SDK client builders accept the credential as a parameter
    SecretClient client = new SecretClientBuilder()
        .vaultUrl("https://{YOUR_VAULT_NAME}.vault.azure.net")
        .credential(managedIdentityCredential)
        .buildClient();
}

https://github.com/Azure/azure-sdk-for-java/wiki/Azure-Identity-Examples#authenticating-in-azure-with-managed-identity

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that with the system-assigned identity you would only need to know the batch/storage account names and then you could submit jobs from anywhere without any client-side credentials?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or is the "system" the node from which you are submitting tasks i.e. the Nextflow head job which would presumably be running in Azure?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both. You would need to give it the batch and storage account names, but it would authenticate because of what it is, rather than what it knows. So if you ran it on an Azure VM with a System-assigned identity, you would need to:

  • Grant that system assigned identity permissions for the Batch and Storage account
  • Configure Nextflow to try and submit to that Batch and Storage account
  • Run Nextflow on that virtual machine, which will assume that identity

@adamrtalbot
Copy link
Collaborator

You will also need to update the azcopy version for this to work, see here for some additional details: #3314 (comment)

@swampie
Copy link
Contributor

swampie commented Apr 12, 2024

I'm confused as to how this is authenticating against the Batch service. Also, we should leave some scope open for a system assigned (anonymous) managed identity.

IF the pool has (manually at the moment) assigned to a managed identity (user assigned) then any operation performed (towards batch or storage) by instances of the pool will use the clientId for authentication

@adamrtalbot
Copy link
Collaborator

I'm confused as to how this is authenticating against the Batch service. Also, we should leave some scope open for a system assigned (anonymous) managed identity.

IF the pool has (manually at the moment) assigned to a managed identity (user assigned) then any operation performed (towards batch or storage) by instances of the pool will use the clientId for authentication

But how does it know which Batch account to authenticate against? It will assume the identity, then it needs to know which batch account to be using when it submits jobs and tasks, that doesn't appear to be here in the code?

@bentsherman
Copy link
Member Author

But how does it know which Batch account to authenticate against? It will assume the identity, then it needs to know which batch account to be using when it submits jobs and tasks, that doesn't appear to be here in the code?

Good point, currently with the managed identity it will leave the batch credentials as null and the batch account is ignored.

Here are the docs for batch credentials: https://learn.microsoft.com/en-us/java/api/com.microsoft.azure.batch.auth.batchcredentials?view=azure-java-stable

The following implementations are available:

  • BatchApplicationTokenCredentials: Application token based credentials for use with a Batch Service Client.
  • BatchSharedKeyCredentials: Shared key credentials for an Azure Batch account.
  • BatchUserTokenCredentials: User token based credentials for use with a Batch Service Client.

@adamrtalbot
Copy link
Collaborator

pditommaso and others added 3 commits April 15, 2024 10:18
Signed-off-by: Ben Sherman <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
…mProvider.groovy

Signed-off-by: Adam Talbot <[email protected]>
@vsmalladi
Copy link
Contributor

@adamrtalbot this will be great to try with TES plugin

Signed-off-by: Ben Sherman <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
Signed-off-by: Paolo Di Tommaso <[email protected]>
Signed-off-by: Adam Talbot <[email protected]>
Signed-off-by: Paolo Di Tommaso <[email protected]>
Signed-off-by: Adam Talbot <[email protected]>
Signed-off-by: Paolo Di Tommaso <[email protected]>
Signed-off-by: Adam Talbot <[email protected]>
Signed-off-by: Paolo Di Tommaso <[email protected]>
Signed-off-by: Adam Talbot <[email protected]>
Signed-off-by: Paolo Di Tommaso <[email protected]>
Signed-off-by: Adam Talbot <[email protected]>
swampie and others added 4 commits June 1, 2024 11:47
Signed-off-by: Matteo Fiandesio <[email protected]>
Signed-off-by: Matteo Fiandesio <[email protected]>
Signed-off-by: Matteo Fiandesio <[email protected]>
task: update azure batch to 1.0.0-beta version
Signed-off-by: Ben Sherman <[email protected]>
bentsherman and others added 2 commits June 11, 2024 08:25
…mProvider.groovy

Co-authored-by: Paolo Di Tommaso <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
Signed-off-by: Paolo Di Tommaso <[email protected]>
Signed-off-by: Paolo Di Tommaso <[email protected]>
@pditommaso pditommaso merged commit 21ca16e into master Jun 12, 2024
22 checks passed
@pditommaso pditommaso deleted the 4871-azure-managed-identity branch June 12, 2024 17:48
@adamrtalbot
Copy link
Collaborator

adamrtalbot commented Jun 13, 2024

Something was messed up and it didn't include all the changes from #5042. Now it produces this error:

ERROR ~ If you are using a StorageSharedKeyCredential, and the server returned an error message that says 'Signature did not match', you can compare the string to sign with the one generated by the SDK. To log the string to sign, pass in the context key value pair 'Azure-Storage-Log-String-To-Sign': true to the appropriate method call.
If you are using a SAS token, and the server returned an error message that says 'Signature did not match', you can compare the string to sign with the one generated by the SDK. To log the string to sign, pass in the context key value pair 'Azure-Storage-Log-String-To-Sign': true to the appropriate generateSas method call.
Please remember to disable 'Azure-Storage-Log-String-To-Sign' before going to production as this string can potentially contain PII.
Status code 403, "<?xml version="1.0" encoding="utf-8"?><Error><Code>AuthorizationPermissionMismatch</Code><Message>This request is not authorized to perform this operation using this permission.
RequestId:e7d83a9d-401e-00b7-42a6-bd4ab6000000
Time:2024-06-13T15:28:40.3709234Z</Message></Error>"

@alberto-miranda
Copy link
Contributor

alberto-miranda commented Jun 27, 2024

Is the new managedIdentity keyword introduced by this PR valid only for the azurebatch executor? I'm trying to setup a GitHub Workflow to test this with Fusion (using a local executor) and, no matter how I configure it, I'm always hitting a Missing Azure storage credentials: please specify a managed identity, service principal, or storage account key error.

The current configuration even uses a bogus clientId:

azure {
    managedIdentity {
        clientId = 'foobar'
    }
}

I would expect this to fail, but instead it appears as if clientId was never defined:

user@host:~ $ NXF_VER=24.05.0-edge nextflow run ./tmp/main.nf -profile azure_mi                                                                                                                    

 N E X T F L O W   ~  version 24.05.0-edge

Launching `./tmp/main.nf` [insane_cori] DSL2 - revision: ac120465ff

ERROR ~ Missing Azure storage credentials: please specify a managed identity, service principal, or storage account key

 -- Check '.nextflow.log' file for details

@adamrtalbot
Copy link
Collaborator

Not necessarily Azure Batch, it could just be Azure Storage. You will need to name the storage account though:

azure {
    managedIdentity {
        clientId = 'foobar'
    }
    storage {
        accountName = "<YOUR BLOB ACCOUNT NAME>"
    }
}

@alberto-miranda
Copy link
Contributor

alberto-miranda commented Jun 27, 2024

Not necessarily Azure Batch, it could just be Azure Storage. You will need to name the storage account though:


azure {

    managedIdentity {

        clientId = 'foobar'

    }

    storage {

        accountName = "<YOUR BLOB ACCOUNT NAME>"

    }

}

Fixed it. My problem was that the azure configuration was not defined in the proper scope 🤦‍♂️

Regarding the accountName, this information is currently accessed by Fusion directly via the AZURE_STORAGE_ACOUNT environment variable (if defined). Is this definition a requirement for the MI?

@adamrtalbot
Copy link
Collaborator

It's not a requirement for the Managed Identity per se, but it's needed so that you know what resources you are trying to access, so practically you will need to supply a storage account name.

@alberto-miranda
Copy link
Contributor

I have been testing this manually in the context of https://github.com/seqeralabs/fusion/issues/526 and I have found it to fail occasionally (some runs succeed, some fail with the error below):

azureuser@vm:~ $ NXF_VER=24.05.0-edge nextflow run hello

 N E X T F L O W   ~  version 24.05.0-edge

Launching `https://github.com/nextflow-io/hello` [naughty_wilson] DSL2 - revision: 7588c46ffe [master]

executor >  azurebatch (4)
[83/640680] process > sayHello (3) [100%] 4 of 4, failed: 4 ✘
Execution cancelled -- Finishing pending tasks before exit
ERROR ~ Error executing process > 'sayHello (1)'

Caused by:
  One of the specified Azure Blob(s) is not found


Command executed:

  echo 'Bonjour world!'

Command exit status:
  -

Command output:
  (empty)

Work dir:
  az://fusion-develop/scratch/9f/c7b067d8cd7aa5f91971f47ed8b914

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details

The execution environment is an Azure VM, with nextflow v24.05.0-edge and the following nextflow.config:

process.executor = 'azurebatch'
workDir = "az://fusion-develop/scratch"

fusion {
    enabled = false
}

azure {
    managedIdentity {
        system = true
    }

    storage {
        accountName = "<redacted>"
    }

    batch {
        accountName = "<redacted>"
        location = "<redacted>"
        autoPoolMode = true
        deletePoolsOnCompletion = true

        allowPoolCreation = true
        pools {
            auto {
                autoScale = true
                vmCount = 1
                maxVmCount = 10
            }
        }
    }
}

I initially thought the problems came from our implementation of MI in Fusion, but after stripping everything away the error persists. Any thoughts on what could be happening?

@adamrtalbot
Copy link
Collaborator

adamrtalbot commented Jul 9, 2024

I wasn't able to recreate the issue after running it 10 times with Nextflow v24.06.0-edge, are you using anything else?

for n in {1..10}; do NXF_VER=24.06.0-edge nextflow run -bg hello -c azure.config; done
process.executor = 'azurebatch'
workDir = 'az://container/dir'

fusion {
    enabled = false
}

azure {

    storage {
        accountName = "<redacted>"
    }

    batch {
        location = "<redacted>"
        accountName = "<redacted>"
        copyToolInstallMode = 'node'
        autoPoolMode = true
        allowPoolCreation = true
        deletePoolsOnCompletion = false
    }

    activeDirectory {
        // redacted
    }
}

@alberto-miranda
Copy link
Contributor

I wasn't able to recreate the issue after running it 10 times with Nextflow v24.06.0-edge, are you using anything else?

for n in {1..10}; do NXF_VER=24.06.0-edge nextflow run -bg hello -c azure.config; done
process.executor = 'azurebatch'
workDir = 'az://container/dir'

fusion {
    enabled = false
}

azure {

    storage {
        accountName = "<redacted>"
    }

    batch {
        location = "<redacted>"
        accountName = "<redacted>"
        copyToolInstallMode = 'node'
        autoPoolMode = true
        allowPoolCreation = true
        deletePoolsOnCompletion = false
    }

    activeDirectory {
        // redacted
    }
}

@adamrtalbot I launched several runs with 24.06.0-edge and some failed with the One of the specified Azure Blob(s) is not found error. I'm starting the nextflow run directly from an Azure VM I ssh'd into, which should be configured with a MI. If I try to do the same from my laptop, I'm always hitting the error below:

user@host:~$ NXF_VER=24.06.0-edge nextflow run hello -c nextflow.config-mi                                                                                                                        

 N E X T F L O W   ~  version 24.06.0-edge

NOTE: Your local project version looks outdated - a different revision is available in the remote repository [7588c46ffe]
Launching `https://github.com/nextflow-io/helloworld` [pensive_yonath] DSL2 - revision: 1d71f857bb [master]

ERROR ~ Managed Identity authentication is not available.

 -- Check '.nextflow.log' file for detail

Also, it might not be relevant but your configuration snippet seems to be using a Service Principal (azure.activeDirectory) rather than a Managed Identity (azure.managedIdentity). Is this intentional?

@adamrtalbot
Copy link
Collaborator

Also, it might not be relevant but your configuration snippet seems to be using a Service Principal (azure.activeDirectory) rather than a Managed Identity (azure.managedIdentity). Is this intentional?

You can't use an MI from your laptop (at least, not practically). So from your laptop you need to user a service principal. I will try from the VM and see if it's an issue with the machine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants