
confidential-devhub/workshop-skill


Confidential Containers ARO Diagnostic Skill

This skill helps diagnose and troubleshoot Confidential Containers (CoCo) deployments on Azure Red Hat OpenShift (ARO), particularly when following the workshop.

What It Does

The skill systematically checks your ARO cluster for common CoCo configuration issues:

  • ✅ Trustee operator installation and configuration
  • ✅ OpenShift Sandboxed Containers (OSC) operator setup
  • ✅ Attestation configuration and reference values
  • ✅ PCR8 hash consistency (most common failure point)
  • ✅ Image signature verification policies
  • ✅ Sealed secrets configuration
  • ✅ Pod-specific debugging for failed CoCo workloads

Files

  • coco-aro-diagnostics.md - Main Claude Code skill definition
  • coco-diagnostics.sh - Automated diagnostic helper script
  • README.md - This file

Installation

Option 1: Use as Claude Code Skill

  1. Copy coco-aro-diagnostics.md to your Claude Code skills directory:

    mkdir -p ~/.claude/skills
    cp coco-aro-diagnostics.md ~/.claude/skills/
  2. In Claude Code, invoke the skill:

    /coco-aro-diagnostics
    

Option 2: Run Diagnostic Script Directly

The helper script can run independently for quick checks:

# Full diagnostics
./coco-diagnostics.sh

# Check only Trustee operator
./coco-diagnostics.sh --trustee

# Check only OSC operator
./coco-diagnostics.sh --osc

# Check PCR8 consistency (most common issue)
./coco-diagnostics.sh --pcr8

# Diagnose specific pod
./coco-diagnostics.sh --pod default/my-coco-pod

Prerequisites

Required Tools

  • oc (OpenShift CLI) - installed and logged into cluster
  • jq - JSON processor
  • base64, gunzip - for decoding initdata
  • sha384sum - for PCR hash calculations
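To confirm these tools are installed before running anything, a small check like the one below can help. It is a hypothetical sketch (`check_prereqs` is not part of the repository) that relies only on the bash builtin `command -v`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of coco-diagnostics.sh): report which required
# tools are missing from PATH. With no arguments it checks the list above.
check_prereqs() {
  local tools=("$@")
  if [ "${#tools[@]}" -eq 0 ]; then
    tools=(oc jq base64 gunzip sha384sum)
  fi
  local missing=() tool
  for tool in "${tools[@]}"; do
    command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    echo "missing tools: ${missing[*]}" >&2
    return 1
  fi
  echo "all required tools found"
}

# Usage: check_prereqs          # checks the default list
#        check_prereqs oc jq    # checks only the named tools
```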

Authentication

You need access to the OpenShift cluster via one of:

  1. Kubeconfig file:

    export KUBECONFIG=/path/to/kubeconfig
  2. Username/password:

    oc login https://api.cluster.example.com:6443 -u username -p password
  3. Already logged in - just run the diagnostics
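Whichever option you use, you can verify the session before starting. This hypothetical `ensure_logged_in` helper is not part of `coco-diagnostics.sh`; it only calls `oc whoami` and `oc whoami --show-server`, both standard `oc` subcommands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm the current oc session is authenticated
# before running any diagnostics.
ensure_logged_in() {
  local user server
  if ! user=$(oc whoami 2>/dev/null); then
    echo "not logged in: export KUBECONFIG or run 'oc login' first" >&2
    return 1
  fi
  server=$(oc whoami --show-server 2>/dev/null)
  echo "logged in as ${user} on ${server:-unknown server}"
}
```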

Permissions

Your user needs read access to:

  • trustee-operator-system namespace (Trustee)
  • openshift-sandboxed-containers-operator namespace (OSC)
  • Application namespaces (for pod debugging)
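`oc auth can-i` answers whether the current user may perform an action, which makes it a quick way to check these permissions up front. The sketch below is hypothetical: the resources it checks (`pods`, `configmaps`) are an assumption about what the diagnostics read, so adjust the list to match your checks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check read access in each namespace with `oc auth can-i`,
# which exits 0 when the action is allowed. The resource list is an assumption.
check_read_access() {
  local namespaces=("$@")
  if [ "${#namespaces[@]}" -eq 0 ]; then
    namespaces=(trustee-operator-system openshift-sandboxed-containers-operator)
  fi
  local ns res denied=0
  for ns in "${namespaces[@]}"; do
    for res in pods configmaps; do
      if oc auth can-i get "$res" -n "$ns" --quiet >/dev/null 2>&1; then
        echo "ok      get $res -n $ns"
      else
        echo "DENIED  get $res -n $ns"
        denied=1
      fi
    done
  done
  return "$denied"
}

# Usage: check_read_access                   # the two operator namespaces
#        check_read_access fraud-detection   # an application namespace
```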

Usage Examples

Example 1: Full Cluster Diagnostic

./coco-diagnostics.sh --full

This runs all checks and provides a complete health report.

Example 2: Troubleshoot Attestation Failure

Most attestation failures are due to PCR8 mismatches:

# Check if PCR8 matches
./coco-diagnostics.sh --pcr8

If a mismatch is detected, the script prints the exact commands needed to fix it.

Example 3: Debug Failing Pod

# Check why my CoCo pod is stuck
./coco-diagnostics.sh --pod fraud-detection/sealed-fraud-detection

Shows pod status, events, container states, and recent logs.
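If you prefer to gather this by hand, the commands below collect the same kinds of information. This is a hypothetical `debug_pod` function, not the script's actual implementation; it accepts the same `namespace/pod` form as `--pod` and uses only standard `oc` subcommands.

```shell
#!/usr/bin/env bash
# Hypothetical manual equivalent of `--pod namespace/name` (not the script's code).
debug_pod() {
  local ref=$1 ns name
  ns=${ref%%/*}
  name=${ref#*/}
  if [[ $ref != */* || -z $ns || -z $name ]]; then
    echo "usage: debug_pod <namespace>/<pod-name>" >&2
    return 1
  fi
  # Current phase, node and IP
  oc get pod "$name" -n "$ns" -o wide
  # Container states (e.g. Init/Waiting reasons) plus the pod's recent events
  oc describe pod "$name" -n "$ns"
  # Events for this pod only, oldest first
  oc get events -n "$ns" --field-selector "involvedObject.name=$name" --sort-by=.lastTimestamp
  # Last 50 log lines from each container; empty if nothing has started yet
  oc logs "$name" -n "$ns" --all-containers --tail=50
}

# Usage: debug_pod fraud-detection/sealed-fraud-detection
```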

Example 4: Using the Claude Code Skill

In Claude Code:

You: /coco-aro-diagnostics

Claude: I'll help diagnose your CoCo deployment. First, I need to connect to your cluster.

Are you already logged into OpenShift, or should I use credentials?

You: Already logged in

Claude: [Runs systematic diagnostics and provides detailed report with fixes]

Common Issues & Quick Fixes

Issue 1: PCR8 Mismatch

Symptom: Pod stuck in Init state; Trustee logs show "reference value mismatch"

Fix:

# 1. Get initdata hash
INITDATA=$(oc get cm peer-pods-cm -n openshift-sandboxed-containers-operator -o jsonpath='{.data.INITDATA}')
HASH=$(echo "$INITDATA" | base64 -d | gunzip | sha384sum | awk '{print $1}')

# 2. Update reference values
REFVAL_CM=$(oc get kbsconfig -n trustee-operator-system -o json | jq -r '.items[0].spec.kbsRvpsRefValuesConfigMapName')
oc edit configmap "$REFVAL_CM" -n trustee-operator-system
# Update pcr8 value with $HASH

# 3. Restart Trustee
oc rollout restart deployment/trustee-deployment -n trustee-operator-system
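The hash step above can be wrapped in a function and compared with the value you read from the reference-values ConfigMap. A minimal sketch with hypothetical names: it mirrors the pipeline above but uses `printf` and `pipefail`, so a corrupt `INITDATA` value makes the function fail instead of silently returning the SHA-384 of empty input.

```shell
#!/usr/bin/env bash
# initdata_digest: base64-decode, gunzip, then SHA-384 the INITDATA value.
# The subshell body keeps `pipefail` local to this function.
initdata_digest() (
  set -o pipefail
  printf '%s' "$1" | base64 -d | gunzip | sha384sum | awk '{print $1}'
)

# pcr8_matches: compare a computed digest with the reference value copied from
# Trustee, ignoring case and surrounding whitespace.
pcr8_matches() {
  local got want
  got=$(printf '%s' "$1" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')
  want=$(printf '%s' "$2" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]')
  [ -n "$got" ] && [ "$got" = "$want" ]
}

# Usage (requires cluster access):
#   INITDATA=$(oc get cm peer-pods-cm -n openshift-sandboxed-containers-operator \
#     -o jsonpath='{.data.INITDATA}')
#   HASH=$(initdata_digest "$INITDATA") || echo "INITDATA is not valid base64+gzip"
```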

Issue 2: Sealed Secret Not Working

Symptom: Pod logs show "Failed to fetch secret"

Fix:

# 1. Create secret in Trustee namespace
oc create secret generic my-secret --from-literal=key=value -n trustee-operator-system

# 2. Add to KbsConfig
oc patch kbsconfig trusteeconfig-kbs-config -n trustee-operator-system \
  --type=json -p='[{"op": "add", "path": "/spec/kbsSecretResources/-", "value": "my-secret"}]'

# 3. Restart Trustee
oc rollout restart deployment/trustee-deployment -n trustee-operator-system
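Step 2 misbehaves in two cases: a JSON-patch `add` to `/spec/kbsSecretResources/-` fails if the list does not exist yet, and it appends a duplicate if the secret is already listed. The hypothetical helper below checks first with `jq`; `add_kbs_secret_resource` is not part of the repository, and its namespace default is the one used above.

```shell
#!/usr/bin/env bash
# Hypothetical idempotent variant of step 2: skip if already listed, create the
# list if missing, otherwise append.
add_kbs_secret_resource() {
  local kbsconfig=$1 secret=$2 ns=${3:-trustee-operator-system} json patch
  json=$(oc get kbsconfig "$kbsconfig" -n "$ns" -o json) || return 1
  if jq -e --arg s "$secret" '(.spec.kbsSecretResources // []) | any(.[]; . == $s)' \
       >/dev/null <<<"$json"; then
    echo "$secret already listed in $kbsconfig"
    return 0
  fi
  if jq -e '.spec.kbsSecretResources != null' >/dev/null <<<"$json"; then
    patch=$(jq -cn --arg s "$secret" '[{op: "add", path: "/spec/kbsSecretResources/-", value: $s}]')
  else
    patch=$(jq -cn --arg s "$secret" '[{op: "add", path: "/spec/kbsSecretResources", value: [$s]}]')
  fi
  oc patch kbsconfig "$kbsconfig" -n "$ns" --type=json -p "$patch"
}

# Usage: add_kbs_secret_resource trusteeconfig-kbs-config my-secret
# Then restart Trustee as in step 3.
```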

Issue 3: Image Signature Verification Failed

Symptom: Pod fails to start; Trustee logs show a policy violation

Fix:

# Check and update image policy
oc edit configmap trustee-image-policy -n trustee-operator-system

# Add your registry to allowed list, then restart
oc rollout restart deployment/trustee-deployment -n trustee-operator-system
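This README does not show the format of `trustee-image-policy`. If it holds a policy in the `containers-policy.json` format (an assumption; inspect the ConfigMap first), the requirement applied to an image comes from the most specific matching scope under `transports.docker`, falling back to `default`. The hypothetical `policy_for_image` function below reports which requirement an image would hit, so you can confirm that your new registry entry is actually matched. It takes the policy JSON as a local file and ignores wildcard (`*.example.com`) and transport-default (`""`) scopes.

```shell
#!/usr/bin/env bash
# Hypothetical check, ASSUMING the containers-policy.json layout:
#   {"default":[{"type":"reject"}],
#    "transports":{"docker":{"<registry/repo>":[{"type":"..."}]}}}
# Prints "<matched scope>: <requirement types>" or "default: <types>".
policy_for_image() {
  local policy_file=$1 image=$2
  jq -r --arg img "$image" '
    . as $root
    | (.transports.docker // {}) as $d
    | ([$d | keys[]
        | select(. as $k
            | $img == $k
              or ($img | startswith($k + "/"))
              or ($img | startswith($k + ":"))
              or ($img | startswith($k + "@")))]
       | sort_by(length) | last) as $match
    | if $match != null
      then "\($match): \($d[$match] | map(.type) | join(","))"
      else "default: \(($root | .default // []) | map(.type) | join(","))"
      end
  ' "$policy_file"
}
```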

Issue 4: Cannot Exec into CoCo Pod

Symptom: oc exec fails even though the pod is running

This is expected: the default initdata policy disables exec for security. Use logs instead:

oc logs <pod-name> -n <namespace>

To enable exec (NOT recommended for production):

  1. Modify initdata in peer-pods-cm to allow exec
  2. Recalculate PCR8 and update Trustee reference values
  3. Restart peer-pods daemonset and Trustee deployment
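Steps 1 and 2 involve decoding and re-encoding the `INITDATA` value. A minimal sketch, assuming the same base64 + gzip encoding that the PCR8 fix above decodes; `decode_initdata` and `encode_initdata` are hypothetical names. In upstream CoCo examples the exec restriction is a rule on `ExecProcessRequest` in the agent policy embedded in initdata, but confirm this against your own decoded file. Any edit changes the hash, so PCR8 must be recomputed from the new value.

```shell
#!/usr/bin/env bash
# decode_initdata: base64+gzip string -> plain text on stdout
decode_initdata() (
  set -o pipefail
  printf '%s' "$1" | base64 -d | gunzip
)

# encode_initdata: plain text on stdin -> single-line base64+gzip string
encode_initdata() (
  set -o pipefail
  gzip -c | base64 | tr -d '\n'
)

# Usage sketch (requires cluster access):
#   decode_initdata "$INITDATA" > initdata.toml
#   ${EDITOR:-vi} initdata.toml                  # relax the exec rule
#   NEW=$(encode_initdata < initdata.toml)
#   oc patch cm peer-pods-cm -n openshift-sandboxed-containers-operator \
#     --type=merge -p "{\"data\":{\"INITDATA\":\"$NEW\"}}"
```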

Architecture Context

Understanding the architecture helps with troubleshooting:

┌─────────────────────────────────────────────────────────────┐
│                     Trusted Environment                     │
│  ┌────────────────────────────────────────────────────────┐ │
│  │  Trustee Operator (trustee-operator-system)            │ │
│  │  - Remote attestation service                          │ │
│  │  - Stores reference values (PCR hashes)                │ │
│  │  - Stores secrets                                      │ │
│  │  - Validates CoCo pods before releasing secrets        │ │
│  └────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
                              ▲
                              │ Attestation
                              │ (HTTPS)
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                    Untrusted Environment                    │
│  ┌────────────────────────────────────────────────────────┐ │
│  │  OSC Operator (openshift-sandboxed-containers-operator)│ │
│  │  - Manages peer-pod VMs                                │ │
│  │  - Provides kata-remote runtime                        │ │
│  │  - Contains initdata (measured in PCR8)                │ │
│  └────────────────────────────────────────────────────────┘ │
│  ┌────────────────────────────────────────────────────────┐ │
│  │  CoCo Pod (runs in separate confidential VM)           │ │
│  │  - Uses kata-remote runtime                            │ │
│  │  - Performs attestation at startup                     │ │
│  │  - Fetches sealed secrets from Trustee                 │ │
│  └────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘

Key Points:

  • Trustee (trusted) validates CoCo pods (untrusted) via attestation
  • Initdata is measured and becomes PCR8 - if it changes, PCR8 must be updated in Trustee
  • CoCo pods run in separate confidential VMs, not on worker nodes
  • Sealed secrets are "pointers" that get replaced with real secrets after attestation
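"Measured" means the initdata digest is extended into PCR8 the way TPM PCRs are always updated: PCR_new = H(PCR_old || digest), starting from all zeros. The sketch below assumes a single extend into a zeroed PCR, and the helper names are hypothetical. Which hash bank is used, and whether your Trustee reference value is this extended value or the raw initdata digest computed in the PCR8 fix above, depends on your platform and Trustee configuration, so compare against what Trustee actually reports before relying on it.

```shell
#!/usr/bin/env bash
# hex_to_bin: hex string -> raw bytes on stdout (bash printf %b handles \xHH).
hex_to_bin() {
  printf '%b' "$(printf '%s' "$1" | sed 's/../\\x&/g')"
}

# pcr_extend_from_zero ALG DIGEST_HEX -> hex of ALG(zeros || digest)
# ASSUMPTIONS: the PCR starts at all zeros, receives exactly one extend, and the
# digest was produced with the same algorithm as the PCR bank being compared.
pcr_extend_from_zero() (
  set -o pipefail
  alg=$1
  digest_hex=$2
  case "$alg" in
    sha256) size=32 ;;
    sha384) size=48 ;;
    *) echo "unsupported algorithm: $alg" >&2; exit 1 ;;
  esac
  if [ "${#digest_hex}" -ne $((size * 2)) ] || [[ ! $digest_hex =~ ^[0-9a-fA-F]+$ ]]; then
    echo "digest must be $((size * 2)) hex characters for $alg" >&2
    exit 1
  fi
  { head -c "$size" /dev/zero; hex_to_bin "$digest_hex"; } | "${alg}sum" | awk '{print $1}'
)

# Usage: pcr_extend_from_zero sha256 "$(sha256sum initdata.toml | awk '{print $1}')"
```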

Troubleshooting Workflow

  1. Connection - Ensure you can connect to the cluster
  2. Trustee Health - Check Trustee operator is running and configured
  3. OSC Health - Check OSC operator and peer-pods are working
  4. PCR8 Consistency - Verify initdata hash matches Trustee reference values (CRITICAL)
  5. Pod Specific - If a pod is failing, check its logs and events
  6. Attestation Flow - Trace the complete attestation process in Trustee logs
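Steps 2 to 5 map onto the script's documented flags. The hypothetical wrapper below runs them in order and stops at the first failure; it assumes `coco-diagnostics.sh` exits non-zero when a check fails (this README does not state that), and it leaves step 6 as a pointer to the Trustee logs.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper following the troubleshooting workflow.
# ASSUMPTION: coco-diagnostics.sh exits non-zero when a check fails.
DIAG=${DIAG:-./coco-diagnostics.sh}

run_workflow() {
  local pod_ref=${1:-} flag
  # 1. Connection
  oc whoami >/dev/null 2>&1 || { echo "step 1: not connected to a cluster" >&2; return 1; }
  # 2-4. Trustee health, OSC health, PCR8 consistency
  for flag in --trustee --osc --pcr8; do
    echo "==> $DIAG $flag"
    "$DIAG" "$flag" || { echo "check $flag failed; fix it before continuing" >&2; return 1; }
  done
  # 5. Pod-specific checks, only when a namespace/pod was given
  if [ -n "$pod_ref" ]; then
    echo "==> $DIAG --pod $pod_ref"
    "$DIAG" --pod "$pod_ref" || return 1
  fi
  # 6. Attestation flow: review recent Trustee logs by hand
  echo "next: oc logs deployment/trustee-deployment -n trustee-operator-system --tail=100"
}

# Usage: run_workflow                             # cluster-wide checks only
#        run_workflow default/my-coco-pod         # plus one failing pod
```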

Workshop Reference

This skill is based on the official workshop:

Support

For issues with this skill:

  • Check the workshop documentation first
  • Review Trustee and OSC operator logs
  • Ensure PCR8 consistency (most common issue)
  • Verify initdata configuration

For workshop issues:

  • Refer to workshop GitHub issues
  • Check Red Hat documentation for Trustee and OSC operators

Development

To extend this skill:

  1. Add new checks to coco-diagnostics.sh
  2. Update diagnostic workflow in coco-aro-diagnostics.md
  3. Test against a real ARO cluster with CoCo deployed
  4. Submit improvements via PR

License

This skill is provided as-is for helping users debug Confidential Containers deployments on ARO.

About

This Claude skill helps you figure out what went wrong in the CoCo ARO workshop :)
