Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Azure Fusion integration should not fail when managed identity auth is present #5081

Closed
swampie opened this issue Jun 20, 2024 · 6 comments · Fixed by #5278
Closed

Azure Fusion integration should not fail when managed identity auth is present #5081

swampie opened this issue Jun 20, 2024 · 6 comments · Fixed by #5278

Comments

@swampie
Copy link
Contributor

swampie commented Jun 20, 2024

New feature

Usage scenario

Now that Azure Managed Identity has been merged the support for fusion should also be enabled. The goal of this feature is to prevent failures when a managed identity is correctly configured and pass the configuration to fusion.

Suggest implementation

  • one change should be the removal/extension of this logic
@swampie
Copy link
Contributor Author

swampie commented Jun 20, 2024

one possibility to prevent changes on fusion (if it relies on sas tokens) is to change the getOrCreateSasToken() to try to generate a token with the managed identity (unsure if this is possible)

@alberto-miranda
Copy link
Contributor

I'm currently writing a PR solving this in order to unblock https://github.com/seqeralabs/fusion/issues/526, but there's one thing I don't understand: why are we always generating a SAS token with getOrCreateSasToken() and returning it as a result.AZURE_STORAGE_SAS_TOKEN even if the accountKey authentication method has been configured?

I was initially thinking about setting result.AZURE_STORAGE_SAS_TOKEN only for if the SAS token auth was requested, but I don't want to break any existing logic relying on this.

@adamrtalbot
Copy link
Collaborator

returning it as a result.AZURE_STORAGE_SAS_TOKEN even if the accountKey authentication method has been configured?

I believe the SAS key is created by Nextflow, which is sent to the worker nodes as a temporary token for Fusion to use. This is because the accountKey isn't available to worker nodes (?). We can simply bypass this if we believe the worker nodes have an MI associated.

Relates to #5232 which is the non-Fusion equivalent.

@alberto-miranda
Copy link
Contributor

returning it as a result.AZURE_STORAGE_SAS_TOKEN even if the accountKey authentication method has been configured?

I believe the SAS key is created by Nextflow, which is sent to the worker nodes as a temporary token for Fusion to use. This is because the accountKey isn't available to worker nodes (?). We can simply bypass this if we believe the worker nodes have an MI associated.

Relates to #5232 which is the non-Fusion equivalent.

The thing is that AFAICT, Fusion only uses the SAS token if Nextflow has not been configured with an accountKey (cc @pabloaledo), which is why I don't get why we are always generating one here (no matter what auth is used). I can maintain the current logic, but I feel it's a remnant and probably not required nowadays.

alberto-miranda added a commit to alberto-miranda/nextflow that referenced this issue Sep 3, 2024
This commit extends the `AzFusionEnv` class so that it now takes into
account a potential Azure Managed Identity in Nextflow's configuration
file. It also fixes a configuration error (nextflow-io#5081) by which Nextflow complained
of a missing SAS token despite having a Managed Identity properly configured.

Signed-off-by: Alberto Miranda <[email protected]>
alberto-miranda added a commit to alberto-miranda/nextflow that referenced this issue Sep 3, 2024
This commit extends the `AzFusionEnv` class so that it now takes into
account a potential Azure Managed Identity in Nextflow's configuration
file. It also fixes a configuration error (nextflow-io#5081) by which Nextflow complained
of a missing SAS token despite having a Managed Identity properly configured.

Signed-off-by: Alberto Miranda <[email protected]>
pditommaso added a commit that referenced this issue Sep 3, 2024
This commit extends the `AzFusionEnv` class so that it now takes into
account a potential Azure Managed Identity in Nextflow's configuration
file. It also fixes a configuration error (#5081) by which Nextflow complained
of a missing SAS token despite having a Managed Identity properly configured.

Signed-off-by: Alberto Miranda <[email protected]>
Signed-off-by: Paolo Di Tommaso <[email protected]>
Co-authored-by: Paolo Di Tommaso <[email protected]>
@alberto-miranda
Copy link
Contributor

returning it as a result.AZURE_STORAGE_SAS_TOKEN even if the accountKey authentication method has been configured?

I believe the SAS key is created by Nextflow, which is sent to the worker nodes as a temporary token for Fusion to use. This is because the accountKey isn't available to worker nodes (?). We can simply bypass this if we believe the worker nodes have an MI associated.
Relates to #5232 which is the non-Fusion equivalent.

The thing is that AFAICT, Fusion only uses the SAS token if Nextflow has not been configured with an accountKey (cc @pabloaledo), which is why I don't get why we are always generating one here (no matter what auth is used). I can maintain the current logic, but I feel it's a remnant and probably not required nowadays.

We actually got a reply to this with #5286 😭, the SAS token is required by Fusion even if SAS token authentication has not been explicitly requested in the nextflow configuration.

@adamrtalbot
Copy link
Collaborator

adamrtalbot commented Sep 5, 2024

We actually got a reply to this with #5286 😭, the SAS token is required by Fusion even if SAS token authentication has not been explicitly requested in the nextflow configuration.

We assume that if Nextflow is using a Managed Identity, Fusion can also use the same identity. A future option could be a config item saying "Do the workers have a managed identity, if so what is the ID?" This could connect with issue #5232 which is the non-Fusion version.

On AWS we have the following settings, we could have a similar format for Azure:

aws.batch.jobRole = 'JOB ROLE ARN'
aws.batch.executionRole = 'EXECUTION ROLE ARN'

e.g.:

azure.batch.managedIdentity = 'MANAGED_IDENTITY_CLIENT_ID'

or even a per-pool setting:

azure.batch.pools.<pool name>.managedIdentity = 'MANAGED_IDENTITY_CLIENT_ID'

When not using a Managed Identity, Nextflow will generate a SAS key using it's own authentication which it passes to Fusion (for now!). This is very simple and robust, but it may be disabled by very security conscious users.

For now, Fusion must use the following settings (generally configured by Nextflow):

  • if there is a managed ID, we assume all is the same and use it
  • if there is not a managed ID, but there is a key or service principle, generate a SAS and use it
  • if there is a SAS, use it directly
  • if none, throw an error before submitting a process

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants