Fix provider fetch stability issues#143
Conversation
✅ AI Code Review CompletedReview finished. Check the PR for inline comments. 📋 View Logs | 🤖 Model: |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e161afafeb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| limits = (try? container.decodeIfPresent([SelfServiceLimit].self, forKey: .limits)) ?? [] | ||
| let localLimits = (try? container.decodeIfPresent([SelfServiceLimit].self, forKey: .limits)) ?? [] | ||
| let upstreamLimits = (try? container.decodeIfPresent([SelfServiceLimit].self, forKey: .upstreamLimits)) ?? [] | ||
| limits = localLimits + upstreamLimits |
There was a problem hiding this comment.
Preserve upstream Codex windows when merging limits
When a codex-lb API-key response contains both per-key limits and account-level upstream_limits, concatenating them into one untagged list makes partitionSelfServiceLimits(...).base.first/last choose a single primary/secondary window from the mixed set. If the key has its own weekly or monthly budget, the dropdown can show that API-key budget instead of the upstream 5h/7d Codex account quota this change is trying to expose; keep the two sources separate or prefer upstream windows for the Codex quota display.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Provider fetch stability — solid core, two risky shortcuts
Honestly the OpenCodeZen rework is the best part of this PR and it's properly done. I traced every exit path of runOpenCodeStats:
withCheckedThrowingContinuationdoesn't observe Task cancellation, so a hungopencode statsused to leak the continuation until the process died on its own. The newfinish()(NSLock +didResumeguard) + theasyncAftertimeout thatterminate()s thenSIGKILLs the pid actually fixes that. The lock guarantees exactly oneresume, and the lateterminationHandlerafter a timeout is a clean no-op. No double-resume, no leak. Nice.- Draining
psstdout beforewaitUntilExit()removes a real pipe-buffer deadlock. Good catch. etimes->etimeis the right fix — macOSpshas noetimeskeyword, so the old code was silently parsing nothing on macOS.parsePSElapsedSecondshandles[[dd-]hh:]mm:sscorrectly.
The Z.AI retry/backoff + 30s fetchTimeout + isTransientNetworkError is also a clean, on-plan stability win.
That said, two changes trade an honest failure for a possibly-wrong result, and I can't sign off on those as-is.
Blocking
"default"GCP project (Gemini + Antigravity). Both providers now POST{"project":"default"}toretrieveUserQuotawhen the real project id is missing."default"isn't a real GCP project, and per the existing convention (and the comment you just softened) theprojectparam is required to get full model coverage incl. gemini-3. Best case the API rejects it and Antigravity — which used to returnnilcleanly — does extra work to fail anyway; worst case it returns a reduced/default quota set and the menu shows plausible-but-wrong numbers with no error. The previousthrow/nilwas the correct behavior. Either keep the hard failure or verify against the live API that"default"returns complete data (and surface a visible "partial data" state).- Codex
limits + upstream_limitsconcat.partitionSelfServiceLimitsnever dedupes, andbuildSelfServicePayloadreadsbase.first/base.last. If both arrays carry the same window you either render it twice (primary == secondary) or silently drop one source's value. Dedupe by window key before partitioning.
CI is red
SwiftLint is failing on duplicate_imports because of the duplicated import Foundation in ZaiCodingPlanProvider.swift. That one-line fix unblocks the lint job.
Non-blocking / FYI
- The new
parsePSElapsedSecondsandisTransientNetworkErrorare deterministic, edge-case-prone helpers and the repo already hasOpenCodeZenProviderTests/ZaiCodingPlanProviderTests— a couple of table-driven tests ("03:14","1:02:03","2-01:05:30", malformed; transient vs non-transient) would be cheap insurance. - OpenCodeZen's process-timeout/kill logic now overlaps with KiroProvider's task-group approach. Not blocking, but a shared
runProcessWithTimeouthelper would stop the two from drifting. getOpenAIAccountsprepending a config-derived API-key account is dedupe-safe against the same key inauth.json(token:<key>), but it renders with no email/accountId — confirm it won't show a confusing extra "Unknown" row next to a real OAuth Codex login.
Inline comments have the specifics.
| throw ProviderError.authenticationFailed("Missing project ID for account #\(accountIndex + 1)") | ||
| } | ||
| let configuredProjectId = account.projectId.trimmingCharacters(in: .whitespacesAndNewlines) | ||
| let projectId = configuredProjectId.isEmpty ? "default" : configuredProjectId |
There was a problem hiding this comment.
default project fallback: silent partial quota
"default" isn't a valid GCP project id, and this used to throw when the project was missing. Now you POST {"project":"default"} to retrieveUserQuota. The existing convention (and the comment you softened on line 286) says project is required for full model coverage including gemini-3 variants — so this can silently return a reduced quota set and the menu shows wrong-but-plausible numbers with no error.
Keep the hard failure when the project id is genuinely missing, or verify against the live API that "default" returns complete data and surface a visible "partial data" indicator. Either way please restore the accurate "required" wording in the comment.
| limits = (try? container.decodeIfPresent([SelfServiceLimit].self, forKey: .limits)) ?? [] | ||
| let localLimits = (try? container.decodeIfPresent([SelfServiceLimit].self, forKey: .limits)) ?? [] | ||
| let upstreamLimits = (try? container.decodeIfPresent([SelfServiceLimit].self, forKey: .upstreamLimits)) ?? [] | ||
| limits = localLimits + upstreamLimits |
There was a problem hiding this comment.
limits + upstream_limits can double-count windows
Concatenating limits and upstream_limits without dedupe is risky: partitionSelfServiceLimits sorts but never dedupes, and buildSelfServicePayload reads base.first/base.last. If both arrays describe the same window (e.g. both carry a 5h entry), you either render the same window twice as primary+secondary, or silently keep one source's value and drop the other's.
Dedupe on merge by a window key (limitWindow + modelFilter/limitType), preferring one source deterministically, before partitioning — or document why overlap is impossible.
| } else if !fallbackProjectId.isEmpty { | ||
| projectId = fallbackProjectId | ||
| } else { | ||
| projectId = "default" |
There was a problem hiding this comment.
default project fallback masks missing project
Same "default" concern as in GeminiCLIProvider. This path previously returned nil (no fallback). Substituting a literal "default" project id means the fallback either fails anyway after extra work, or returns a default-tier quota that gets shown as if it were real. If "default" is not verified to return complete data from retrieveUserQuota, keep returning nil here.
| ) | ||
| if refreshedAccessToken == nil { | ||
| logger.info("Gemini CLI: Primary OAuth client failed for account #\(accountIndex + 1); trying Gemini CLI OAuth client fallback") | ||
| refreshedAccessToken = await tokenManager.refreshGeminiAccessToken(refreshToken: account.refreshToken) |
There was a problem hiding this comment.
OAuth fallback can repeat identical request
This fallback only does something different when account.clientId is the plugin client. When the account already uses the default geminiClientId/geminiClientSecret, omitting them here re-issues the exact same OAuth request that just failed — one more round-trip + log line before throwing. Worth guarding with account.clientId != TokenManager.geminiClientId so it only retries when the credentials actually differ.
| @@ -1,4 +1,5 @@ | |||
| import Foundation | |||
| import Foundation | |||
| let components = timePart.split(separator: ":").compactMap { Int($0) } | ||
| guard components.count == timePart.split(separator: ":").count else { return nil } |
There was a problem hiding this comment.
redundant double split in elapsed parser
timePart.split(separator: ":") is computed twice just to validate all segments parsed. Split once into a local and compare counts:
let rawComponents = timePart.split(separator: ":")
let components = rawComponents.compactMap { Int($0) }
guard components.count == rawComponents.count else { return nil }
Summary
opencode.jsoncso custom Codex-compatible endpoints likecodex.2631.eucan be queried correctly, and decodeupstream_limitsusage windows.opencode statstimeout/terminate/kill path and fixing stale process cleanup for macOSps etimeoutput.Verification
make setupgit diff --checkxcodebuild -project CopilotMonitor/CopilotMonitor.xcodeproj -scheme CopilotMonitor -configuration Release -destination 'generic/platform=macOS' -derivedDataPath build/DerivedData-pr-provider-fetch-stability CODE_SIGNING_ALLOWED=NO buildx86_64 arm64, version/build2.11.1/2.11.1.opencodebar-cli provider zai_coding_plan --jsonopencodebar-cli provider chatgpt --jsonopencodebar-cli provider gemini --jsonopencodebar-cli provider antigravity --jsonopencodebar-cli provider opencode_zen --json