Grant required IAM roles to Compute Engine default SA when --managed-mldiagnostics is passed during xpk cluster create#1187

Closed
rapatchi wants to merge 0 commits into AI-Hypercomputer:main from rapatchi:permission_fix

rapatchi commented Jun 9, 2026

Contributor

When provisioning clusters with --managed-mldiagnostics, XLA ML diagnostics requires roles/hypercomputecluster.editor, roles/storage.objectUser, and roles/logging.logWriter to be bound to the Compute Engine default service account.

This commit:

  1. Automatically resolves projectNumber and grants these 3 required IAM roles via gcloud projects add-iam-policy-binding during cluster create when --managed-mldiagnostics is enabled.
  2. Updates user documentation (permissions.md, clusters.md) and unit test coverage accordingly.
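
The two steps above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the helper names (`default_compute_sa`, `describe_project_cmd`, `grant_role_cmd`, `grant_required_roles`) are hypothetical, and only the gcloud command lines mirror the ones shown in the cluster creation logs.

```python
import subprocess

# Roles listed in the PR description as required for ML diagnostics.
REQUIRED_ROLES = (
    "roles/hypercomputecluster.editor",
    "roles/storage.objectUser",
    "roles/logging.logWriter",
)


def default_compute_sa(project_number: str) -> str:
    """Build the Compute Engine default service account email."""
    return f"{project_number}-compute@developer.gserviceaccount.com"


def describe_project_cmd(project: str) -> list[str]:
    """Command that prints the numeric project number for a project ID."""
    return [
        "gcloud", "projects", "describe", project,
        "--format=value(projectNumber)",
    ]


def grant_role_cmd(project: str, service_account: str, role: str) -> list[str]:
    """Command that binds one role to the service account at project level."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", project,
        f"--member=serviceAccount:{service_account}",
        f"--role={role}",
        "--condition=None",
    ]


def grant_required_roles(project: str, run=subprocess.run) -> None:
    """Resolve the project number, then grant each required role.

    `run` is injectable (hypothetical design choice) so the flow can be
    exercised without calling gcloud.
    """
    result = run(
        describe_project_cmd(project),
        check=True, capture_output=True, text=True,
    )
    service_account = default_compute_sa(result.stdout.strip())
    for role in REQUIRED_ROLES:
        run(
            grant_role_cmd(project, service_account, role),
            check=True, capture_output=True, text=True,
        )
```

Passing the commands as argument lists avoids shell quoting issues, and `check=True` makes a failed binding raise instead of being silently ignored. Whether a failed grant should abort cluster creation or only warn is a design decision this sketch does not settle.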

Issue

Without this change, these permissions must be granted manually for ML diagnostics to work.

Testing

Have you performed any manual testing on your change?

Prior IAM Bindings:
image

Cluster Creation Logs:

(xpk_local_venv) rapatchi@rapatchi2:~/xpk_fork/xpk_sa$ xpk cluster create --cluster=maxtest-cluster1 --tpu-type=v5litepod-8 --project=rapatchiconsumer --zone=us-central1-b --num-nodes=2 --spot --managed-mldiagnostics
[XPK] Starting xpk v0.1.dev903+g2b0dc6334
...
[XPK] Task: `Get Project Number` is implemented by `gcloud projects describe rapatchiconsumer --format="value(projectNumber)"`
[XPK] Granting necessary roles to 641919595434-compute@developer.gserviceaccount.com
[XPK] Task: `Grant roles/hypercomputecluster.editor` is implemented by `gcloud projects add-iam-policy-binding rapatchiconsumer --member="serviceAccount:641919595434-compute@developer.gserviceaccount.com" --role="roles/hypercomputecluster.editor" --condition=None`
[XPK] Task: `Grant roles/hypercomputecluster.editor` succeeded.
[XPK] Task: `Grant roles/storage.objectUser` is implemented by `gcloud projects add-iam-policy-binding rapatchiconsumer --member="serviceAccount:641919595434-compute@developer.gserviceaccount.com" --role="roles/storage.objectUser" --condition=None`
[XPK] Task: `Grant roles/storage.objectUser` succeeded.
[XPK] Task: `Grant roles/logging.logWriter` is implemented by `gcloud projects add-iam-policy-binding rapatchiconsumer --member="serviceAccount:641919595434-compute@developer.gserviceaccount.com" --role="roles/logging.logWriter" --condition=None`
[XPK] Task: `Grant roles/logging.logWriter` succeeded.
[XPK] Task: `Determine server supported GKE versions for default gke version` is implemented by `gcloud container get-server-config --project=rapatchiconsumer --region=us-central1 --flatten="channels" --filter="channels.channel=RAPID" --format="value(channels.defaultVersion)"`
...

Post Creation:

image
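
As an alternative to comparing screenshots, the bindings could be checked from the JSON policy, for example the output of `gcloud projects get-iam-policy rapatchiconsumer --format=json`. The sketch below is illustrative only: `missing_roles` is a hypothetical helper, not part of xpk, and it ignores any IAM conditions attached to a binding.

```python
import json


def missing_roles(policy_json: str, member: str, required_roles) -> list[str]:
    """Return the required roles that are not bound to `member`.

    `policy_json` is an IAM policy document with a top-level "bindings"
    list, where each binding has a "role" and a list of "members".
    """
    policy = json.loads(policy_json)
    granted = {
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    }
    # Preserve the caller's ordering so the report is stable.
    return [role for role in required_roles if role not in granted]
```

An empty list for the member `serviceAccount:641919595434-compute@developer.gserviceaccount.com` and the three roles from the description would indicate that all bindings are in place.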

Have you verified use cases affected by goldens? Yes

Comment thread src/xpk/commands/cluster.py Outdated
Comment thread src/xpk/commands/cluster.py Outdated
Comment thread src/xpk/commands/managed_ml_diagnostics.py Outdated
Comment thread src/xpk/commands/managed_ml_diagnostics.py Outdated

scaliby left a comment

Member

Thanks for addressing my feedback. LGTM!

scaliby enabled auto-merge June 11, 2026 09:18
scaliby disabled auto-merge June 11, 2026 09:18

scaliby commented Jun 11, 2026

Member

Please address the linter failures and resolve the merge conflicts.
