Add Claude AI test failure analysis to Slack notifications #3381
Open: robbycochran wants to merge 4 commits into master from add-test-analysis-job (+511 −12)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,115 @@ | ||
| --- | ||
| name: analyze-test-failures | ||
| description: Analyze test failure artifacts and generate root cause analysis report | ||
| --- | ||
|
|
||
| # Test Failure Analysis | ||
|
|
||
| Analyze test failures from CI artifacts and generate a concise root cause analysis for the oncall team. | ||
|
|
||
| ## Usage | ||
|
|
||
| ``` | ||
| /analyze-test-failures <artifacts-dir> <workflow-name> <failed-jobs> | ||
| ``` | ||
|
|
||
| **Arguments:** | ||
| - `artifacts-dir`: Directory containing test artifacts (default: test-artifacts/) | ||
| - `workflow-name`: Name of the workflow that failed (e.g., "Integration Tests") | ||
| - `failed-jobs`: Comma-separated list of failed job names | ||
|
|
||
| **Example:** | ||
| ``` | ||
| /analyze-test-failures test-artifacts/ "Integration Tests" "amd64-integration-tests,arm64-integration-tests" | ||
| ``` | ||
|
|
||
| ## What This Does | ||
|
|
||
| 1. **Find test reports**: Searches for JUnit XML files (integration-test-report-*.xml, junit.xml) | ||
| 2. **Parse failures**: Extracts test names, error messages, stack traces | ||
| 3. **Investigate code**: Reads failing test source and implementation code | ||
| 4. **Check git history**: Looks for recent changes that may have caused failures | ||
| 5. **Identify patterns**: Detects platform-specific issues (arch/OS) | ||
| 6. **Generate report**: Creates analysis-report.md with findings | ||
|
|
||
| ## Report Format | ||
|
|
||
| The generated `analysis-report.md` contains: | ||
|
|
||
| ```markdown | ||
| **🤖 AI Analysis** | ||
|
|
||
| **Root Cause**: [1-2 sentence summary with file:line references] | ||
|
|
||
| **Evidence**: | ||
| • [Specific code observations] | ||
| • [Patterns across failures] | ||
| • [Recent changes correlation] | ||
|
|
||
| **Affected Platforms**: [Architectures/OS if pattern found] | ||
|
|
||
| **Recommendations**: | ||
| • [Specific file:line to fix with suggested change] | ||
| • [Additional investigation needed] | ||
| • [Prevention strategy] | ||
|
|
||
| --- | ||
| **Statistics** | ||
| • Total Failures: [count] | ||
| • Total Errors: [count] | ||
| • Failed Jobs: [list] | ||
| ``` | ||
|
|
||
| ## Implementation | ||
|
|
||
| Start by finding and parsing test reports: | ||
|
|
||
| ```bash | ||
| # Find all XML test reports | ||
| find <artifacts-dir> -name "*.xml" -type f | ||
| ``` | ||
|
|
||
| For each failure: | ||
| - Read the test source code to understand intent | ||
| - Examine the implementation being tested | ||
| - Check `git log --oneline -20` for recent changes | ||
| - Look for patterns across different platforms | ||
|
|
||
| Generate the report focusing on **actionable insights** for the oncall engineer: | ||
| - File paths and line numbers for fixes | ||
| - Platform-specific patterns (endianness, timing, etc.) | ||
| - Links to similar past failures if found | ||
|
|
||
| Keep the analysis **under 500 words** and emphasize: | ||
| - What broke | ||
| - Why it broke | ||
| - How to fix it | ||
|
|
||
| ## CRITICAL: File Creation Step | ||
|
|
||
| You MUST execute this bash command to create the report file: | ||
|
|
||
| ```bash | ||
| cat > analysis-report.md <<'EOF' | ||
| **🤖 AI Analysis** | ||
|
|
||
| **Root Cause**: [your analysis here] | ||
|
|
||
| **Evidence**: | ||
| • [your findings] | ||
|
|
||
| **Affected Platforms**: [platforms] | ||
|
|
||
| **Recommendations**: | ||
| • [actionable fixes] | ||
|
|
||
| --- | ||
| **Statistics** | ||
| • Total Failures: [count] | ||
| • Total Errors: [count] | ||
| • Failed Jobs: [jobs] | ||
| EOF | ||
| ``` | ||
|
|
||
| DO NOT just summarize your findings - you MUST create the actual file using the bash command above. | ||
|
|
||
| This is a required step. The workflow depends on analysis-report.md existing. |
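Steps 1 and 2 of the skill (find reports, extract failures) are carried out by Claude at run time; the PR does not include a script for them. As a rough illustration only, the statistics section of the report could be approximated with a shell helper like the one below. The function name `summarize_junit` is hypothetical, and counting `<failure>` and `<error>` elements with `grep` is an assumption that holds for typical JUnit XML but is not a substitute for a real XML parser.

```bash
# Hypothetical helper (not part of the PR): summarize JUnit XML reports
# found under an artifacts directory. Assumes failures and errors appear
# as <failure ...> and <error ...> elements, as in typical JUnit output.
summarize_junit() {
    dir="${1:-test-artifacts/}"

    # grep -o prints one line per match and -h drops file names,
    # so wc -l yields the total number of matching elements.
    failures=$(( $(find "$dir" -name '*.xml' -type f \
        -exec grep -ohE '<failure[ />]' {} + | wc -l) ))
    errors=$(( $(find "$dir" -name '*.xml' -type f \
        -exec grep -ohE '<error[ />]' {} + | wc -l) ))

    echo "Total Failures: ${failures}"
    echo "Total Errors: ${errors}"
}
```

Called as `summarize_junit test-artifacts/`, this prints the two counts in the same wording as the report's Statistics block. It says nothing about root cause; that part still depends on the skill reading the test source and git history.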
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,232 @@ | ||
| # Test Failure Analysis with Claude | ||
|
|
||
| Automatically analyzes test failures using Claude AI and includes intelligent insights in Slack notifications. | ||
|
|
||
| ## Architecture | ||
|
|
||
| ``` | ||
| Integration Tests Run | ||
| ├── amd64-integration-tests (may fail) | ||
| ├── arm64-integration-tests (may fail) | ||
| ├── s390x-integration-tests (may fail) | ||
| ├── ppc64le-integration-tests (may fail) | ||
| │ | ||
| ├── collect-failures | ||
| │ └── Determine which jobs failed | ||
| │ | ||
| └── analyze-and-notify (reusable workflow) | ||
| ├── analyze-failures | ||
| │ ├── Download test artifacts | ||
| │ ├── Execute /analyze-test-failures skill | ||
| │ └── Upload analysis-report.md | ||
| │ | ||
| └── notify | ||
| ├── Download analysis-report.md | ||
| └── Post to Slack with AI insights | ||
| ``` | ||
|
|
||
| ## How It Works | ||
|
|
||
| ### 1. Test Failures | ||
| Any integration test job fails (e.g., `rhcos-arm64`, `cos-logs`) | ||
|
|
||
| ### 2. Collect Failures | ||
| The `collect-failures` job identifies which jobs failed and outputs the list | ||
|
|
||
| ### 3. Analyze Failures (Claude Skill) | ||
| Uses `claude-code-base-action` to execute the `/analyze-test-failures` skill: | ||
|
|
||
| **The skill (`.claude/commands/analyze-test-failures.md`):** | ||
| - Finds and parses JUnit XML test reports | ||
| - Reads failing test source code | ||
| - Examines implementation code being tested | ||
| - Checks git log for recent changes | ||
| - Identifies platform-specific patterns (arch/OS) | ||
| - Creates `analysis-report.md` with actionable insights | ||
|
|
||
| **Claude has access to:** | ||
| - `Skill` - Load and execute the analysis skill | ||
| - `Read` - View source files | ||
| - `Grep` - Search codebase | ||
| - `Glob` - Find files | ||
| - `Bash` - Execute git commands, create reports | ||
|
|
||
| ### 4. Notify | ||
| Posts to Slack (#team-acs-collector-oncall) with: | ||
| - AI-generated root cause analysis | ||
| - Evidence from code and logs | ||
| - Platform-specific patterns detected | ||
| - Actionable recommendations with file:line references | ||
|
|
||
| Falls back to simple notification if analysis fails. | ||
|
|
||
| ## Files | ||
|
|
||
| ### Workflows | ||
| - `.github/workflows/integration-tests.yml` - Main integration test workflow | ||
| - `.github/workflows/analyze-and-notify.yml` - Reusable analysis workflow | ||
|
|
||
| ### Skill | ||
| - `.claude/commands/analyze-test-failures.md` - Claude skill defining analysis logic | ||
|
|
||
| ## Example Output | ||
|
|
||
| **Slack message with AI analysis:** | ||
| ``` | ||
| @acs-collector-oncall | ||
|
|
||
| 🤖 AI Analysis | ||
|
|
||
| **Root Cause**: NetworkSignalHandler.cpp:245 missing ntohs() call | ||
| causing UDP checksum failures on ARM64 platforms. | ||
|
|
||
| **Evidence**: | ||
| • UDP test failures isolated to arm64 runners (rhcos-arm64, cos-arm64) | ||
| • Checksum comparison uses direct equality without byte order conversion | ||
| • Recent commit abc123f modified network packet handling | ||
| • Tests pass on amd64 where byte order matches | ||
|
|
||
| **Affected Platforms**: arm64 (rhcos-arm64, cos-arm64, ubuntu-arm) | ||
|
|
||
| **Recommendations**: | ||
| • Fix collector/lib/NetworkSignalHandler.cpp:245 - add ntohs() call | ||
| • Add endianness test to integration suite | ||
| • Review other protocol handlers for similar issues | ||
|
|
||
| --- | ||
| **Statistics** | ||
| • Total Failures: 2 | ||
| • Failed Jobs: rhcos-arm64, cos-arm64 | ||
| ``` | ||
|
|
||
| ## How It's Different from Manual Analysis | ||
|
|
||
| **Before:** Generic notification | ||
| ``` | ||
| @acs-collector-oncall | ||
| Integration tests failed. | ||
| ``` | ||
|
|
||
| **After:** Actionable analysis with Claude | ||
| - Specific file and line number to fix | ||
| - Root cause explanation based on code analysis | ||
| - Platform/architecture pattern detection | ||
| - Links recent git changes to failures | ||
| - Provides concrete next steps | ||
|
|
||
| ## Testing | ||
|
|
||
| ### Test on a PR | ||
|
|
||
| Add the label `test-oncall-workflow` to any PR to trigger the workflow. | ||
|
|
||
| **What happens:** | ||
| - Workflow runs with empty test artifacts | ||
| - Claude analyzes and generates a report | ||
| - Report is uploaded as artifact | ||
| - **Slack notification is skipped** (only runs on actual test failures) | ||
|
|
||
| **Use case:** Verify Claude analysis executes without spamming Slack. | ||
|
|
||
| **To verify it worked:** | ||
| 1. Check the workflow run in Actions tab | ||
| 2. Download the `failure-analysis` artifact to see the generated report | ||
|
|
||
| ### Test with Real Failures | ||
|
|
||
| The best test is observing the workflow on actual test failures: | ||
| 1. Wait for integration tests to fail naturally | ||
| 2. Check #team-acs-collector-oncall for the AI analysis | ||
| 3. Verify the analysis is helpful and actionable | ||
|
|
||
| ## Configuration | ||
|
|
||
| ### Vertex AI Region | ||
| Set in `.github/workflows/analyze-and-notify.yml`: | ||
| ```yaml | ||
| env: | ||
| CLOUD_ML_REGION: us-east5 | ||
| ``` | ||
|
|
||
| ### Required Secrets | ||
|
|
||
| Already configured: | ||
| - `GCP_CLAUDE_SERVICE_ACCOUNT_KEY` - Service account JSON for Vertex AI | ||
| - `GCP_CLAUDE_PROJECT_ID` - GCP project ID | ||
| - `SLACK_COLLECTOR_ONCALL_WEBHOOK` - Slack webhook URL | ||
|
|
||
| ### Allowed Tools | ||
|
|
||
| Claude has access to these tools for investigation: | ||
| ```yaml | ||
| allowed_tools: "Skill,Read,Grep,Glob,Bash" | ||
| ``` | ||
|
|
||
| ### Reusable Workflow Inputs | ||
|
|
||
| The `analyze-and-notify.yml` workflow accepts: | ||
| - `failed-jobs` - Comma-separated list of failed job names | ||
| - `workflow-name` - Name of the workflow that failed | ||
|
|
||
| ## Troubleshooting | ||
|
|
||
| ### No Analysis Report Generated | ||
|
|
||
| **Check:** | ||
| 1. Claude action step logs - did it execute successfully? | ||
| 2. "Check if analysis report was created" step - does file exist? | ||
| 3. Skill file exists at `.claude/commands/analyze-test-failures.md` | ||
| 4. `Skill` tool is in `allowed_tools` | ||
|
|
||
| ### Vertex AI Errors | ||
|
|
||
| **Common issues:** | ||
| - Model not available in configured region | ||
| - Service account lacks `roles/aiplatform.user` permission | ||
| - `GCP_CLAUDE_PROJECT_ID` secret not set correctly | ||
|
|
||
| **Solution:** | ||
| Check Claude action logs for specific error details. | ||
|
|
||
| ### No Slack Notification | ||
|
|
||
| **Check:** | ||
| 1. `SLACK_COLLECTOR_ONCALL_WEBHOOK` secret is set | ||
| 2. Notify job logs show download step succeeded | ||
| 3. Webhook URL is valid | ||
|
|
||
| ### Analysis Quality Issues | ||
|
|
||
| **If Claude's analysis is not helpful:** | ||
| 1. Check that test artifacts are being uploaded correctly | ||
| 2. Verify JUnit XML format is valid | ||
| 3. Update skill instructions in `.claude/commands/analyze-test-failures.md` | ||
| 4. The skill can be iterated on independently of the workflow | ||
|
|
||
| ## Local Development | ||
|
|
||
| ### Test the Skill Locally | ||
|
|
||
| ```bash | ||
| # Requires Claude CLI installed | ||
| claude /analyze-test-failures test-artifacts/ "Integration Tests" "rhcos-arm64,cos" | ||
| ``` | ||
|
|
||
| ### Update the Skill | ||
|
|
||
| Edit `.claude/commands/analyze-test-failures.md` to: | ||
| - Change analysis instructions | ||
| - Update report format | ||
| - Add new investigation steps | ||
| - Modify recommendations structure | ||
|
|
||
| Changes take effect on the next workflow run - no workflow YAML changes needed. | ||
|
|
||
| ## Future Enhancements | ||
|
|
||
| - [ ] Correlate failures with specific PR/commit | ||
| - [ ] Track failure patterns over time | ||
| - [ ] Link to similar historical failures | ||
| - [ ] Auto-create issues for recurring failures | ||
| - [ ] Support for other test frameworks beyond JUnit XML | ||
| - [ ] Integration with test retries/flakiness detection |
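The README states that the notify job falls back to a simple notification if analysis fails, but the workflow YAML is not shown in this view. The sketch below illustrates one way that decision could be made in a shell step. The function name `build_slack_text` and the wording of the fallback message are assumptions for illustration, not taken from the PR.

```bash
# Hypothetical helper (not part of the PR): choose the Slack message body.
# Uses the AI report when it exists and is non-empty; otherwise emits a
# generic failure message so the oncall engineer is still notified.
build_slack_text() {
    report="${1:-analysis-report.md}"
    workflow="${2:-Integration Tests}"

    if [ -s "$report" ]; then
        # Report exists and has content: pass it through unchanged.
        cat "$report"
    else
        # Missing or empty report: fall back to a simple notification.
        echo "${workflow} failed. No AI analysis is available; check the workflow logs."
    fi
}
```

Using `-s` rather than `-f` treats an empty report the same as a missing one, which covers the case where the skill ran but wrote nothing useful.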
Why is this in `.github/scripts`? Also, is this just a description of what `analyze-test-failures.md` does? Do we need 200+ lines of markdown to explain what a separate 100+ line markdown file does?