[i,l,-r] fix(slack): @Singleton producer, exclude requester, log root causes#8
Merged
Merged
Conversation
…ot causes Four small fixes shaken out of the first end-to-end test on the staged version: 1. produceProposalHandler is now @singleton Previously @dependent (CDI default for @produces with no scope), so every request that needed a ProposalHandler was rebuilding everything the factory constructs — including a fresh Firestore client and its gRPC ManagedChannel. Each abandoned channel surfaced as an ERROR- severity "ManagedChannel allocation site" stack trace in stderr when gRPC's leak detector eventually fired, polluting logs. Singleton- scope means one construction per JVM, no leak. 2. Filter the requester out of the expanded reviewer set AbstractProposalHandler / JoinOperation.propose drops the requester at the *principal* level (group ≠ user), so when the policy ACL names a group that the requester is in (the typical "team approves team" case used by 0-test-mpa-flow), the requester re-appears after GroupResolver.expand. We'd send them a DM about their own request. Drop them with a case-insensitive email match. 3. Log the root cause before re-throwing in onOperationProposed AbstractProposalHandler.propose catches AccessException | IOException and rethrows a generic AccessDeniedException("Creating a proposal failed", e), which surfaces as a 403 in the JIT UI with no detail. The first E2E test hit a Slack 'missing_scope' on users.lookupByEmail — the per-reviewer slack.dm.failed log captured the symptom, but the IOException("Slack DM delivery failed for every reviewer") that propagated up was emitted with no log of its own. Add explicit error logs at the three throw sites: - slack.fingerprint.failed (AccessException from GroupResolver) - slack.noReviewers (zero individuals after expansion) - slack.allDmsFailed (every per-reviewer DM failed) 4. Bump pom version to 2.3.0-wavemm.2 Forces a new App Engine version_id on the next deploy, so the bot token re-fetched from Secret Manager (rotated to add users:read.email scope) actually takes effect. App Engine doesn't allow mutating env vars on an existing version, and our version_id derives from the source ZIP hash — without the bump TF would fall into the same delete-and-recreate-with-same-id 409 trap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Four small fixes from the first end-to-end test on the staged App Engine version (`rev-c2f171f7a17f0f51`).
What was happening
User submitted MPA on `0-test-mpa-flow` → JIT UI showed `HTTP 403: error`. Logs revealed:
```
Failed to DM reviewer adam.rapley@wave.com for jit-group:production.gcp.0-test-mpa-flow:
users.lookupByEmail failed: missing_scope
```
Root cause was a missing Slack scope (`users:read.email`) on the bot — fixed out-of-band (added scope, reinstalled, rotated Secret Manager). But the bug surfaced three pieces of latent bad behaviour worth fixing.
Fixes
1. `produceProposalHandler` is now `@Singleton`
Was `@Dependent` (CDI default), so every request rebuilt the factory output — including a fresh `Firestore` client + gRPC `ManagedChannel`. Each abandoned channel surfaced as an ERROR-severity `ManagedChannel allocation site` stack trace in stderr when gRPC's leak detector fired. `@Singleton` = one construction per JVM, no leak.
2. Filter the requester out of the expanded reviewer set
`JoinOperation.propose` drops the requester at the principal level (group ≠ user). When the policy ACL names a group that the requester is in (`0-test-mpa-flow` is exactly this — security ↔ security), the requester re-appears after `GroupResolver.expand`. They'd get a DM about their own request. Filter post-expansion with case-insensitive email match.
3. Log root causes before re-throwing in `onOperationProposed`
`AbstractProposalHandler.propose` catches `AccessException | IOException` and re-throws a generic `AccessDeniedException("Creating a proposal failed", e)` → 403 in the UI with no detail. The first E2E test hit Slack 403 `missing_scope` — per-reviewer `slack.dm.failed` log captured the symptom, but the `IOException("Slack DM delivery failed for every reviewer")` that propagated up was silent. Add explicit error logs at the three throw sites:
4. Bump pom version to `2.3.0-wavemm.2`
Forces a new App Engine `version_id` on the next deploy, so the rotated bot token (with `users:read.email` scope added) actually takes effect. App Engine doesn't allow mutating env vars on an existing version + our `version_id` is derived from the source ZIP hash + same-id replace = 409 → we keep bumping the trailing patch.
Test plan
🤖 Generated with Claude Code