Skip to content

Fix provider fetch stability issues#143

Open
Daltonganger wants to merge 1 commit into
opgginc:mainfrom
Daltonganger:fix/provider-fetch-stability
Open

Fix provider fetch stability issues#143
Daltonganger wants to merge 1 commit into
opgginc:mainfrom
Daltonganger:fix/provider-fetch-stability

Conversation

@Daltonganger

Copy link
Copy Markdown
Contributor

Summary

  • Read OpenAI provider API keys from opencode.jsonc so custom Codex-compatible endpoints like codex.2631.eu can be queried correctly, and decode upstream_limits usage windows.
  • Add Gemini CLI and Antigravity fallbacks for missing project IDs and retry Gemini refresh with the standard Gemini CLI OAuth client when the stored client fails.
  • Prevent OpenCode Zen refreshes from hanging indefinitely by adding an internal opencode stats timeout/terminate/kill path and fixing stale process cleanup for macOS ps etime output.
  • Retry transient Z.AI network/TLS failures instead of surfacing intermittent URLSession TLS errors immediately.

Verification

  • make setup
  • git diff --check
  • xcodebuild -project CopilotMonitor/CopilotMonitor.xcodeproj -scheme CopilotMonitor -configuration Release -destination 'generic/platform=macOS' -derivedDataPath build/DerivedData-pr-provider-fetch-stability CODE_SIGNING_ALLOWED=NO build
  • Built product verification: app and embedded CLI are universal x86_64 arm64, version/build 2.11.1 / 2.11.1.
  • Built CLI smoke checks:
    • opencodebar-cli provider zai_coding_plan --json
    • opencodebar-cli provider chatgpt --json
    • opencodebar-cli provider gemini --json
    • opencodebar-cli provider antigravity --json
    • opencodebar-cli provider opencode_zen --json

@op-gg-ai-devops-refugee

op-gg-ai-devops-refugee Bot commented Jun 14, 2026

Copy link
Copy Markdown

✅ AI Code Review Completed

Review finished. Check the PR for inline comments.


📋 View Logs | 🤖 Model: anthropic/claude-opus-4-8|high

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e161afafeb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

limits = (try? container.decodeIfPresent([SelfServiceLimit].self, forKey: .limits)) ?? []
let localLimits = (try? container.decodeIfPresent([SelfServiceLimit].self, forKey: .limits)) ?? []
let upstreamLimits = (try? container.decodeIfPresent([SelfServiceLimit].self, forKey: .upstreamLimits)) ?? []
limits = localLimits + upstreamLimits

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve upstream Codex windows when merging limits

When a codex-lb API-key response contains both per-key limits and account-level upstream_limits, concatenating them into one untagged list makes partitionSelfServiceLimits(...).base.first/last choose a single primary/secondary window from the mixed set. If the key has its own weekly or monthly budget, the dropdown can show that API-key budget instead of the upstream 5h/7d Codex account quota this change is trying to expose; keep the two sources separate or prefer upstream windows for the Codex quota display.

Useful? React with 👍 / 👎.

@op-gg-ai-devops-refugee op-gg-ai-devops-refugee Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Provider fetch stability — solid core, two risky shortcuts

Honestly the OpenCodeZen rework is the best part of this PR and it's properly done. I traced every exit path of runOpenCodeStats:

  • withCheckedThrowingContinuation doesn't observe Task cancellation, so a hung opencode stats used to leak the continuation until the process died on its own. The new finish() (NSLock + didResume guard) + the asyncAfter timeout that terminate()s then SIGKILLs the pid actually fixes that. The lock guarantees exactly one resume, and the late terminationHandler after a timeout is a clean no-op. No double-resume, no leak. Nice.
  • Draining ps stdout before waitUntilExit() removes a real pipe-buffer deadlock. Good catch.
  • etimes -> etime is the right fix — macOS ps has no etimes keyword, so the old code was silently parsing nothing on macOS. parsePSElapsedSeconds handles [[dd-]hh:]mm:ss correctly.

The Z.AI retry/backoff + 30s fetchTimeout + isTransientNetworkError is also a clean, on-plan stability win.

That said, two changes trade an honest failure for a possibly-wrong result, and I can't sign off on those as-is.

Blocking

  1. "default" GCP project (Gemini + Antigravity). Both providers now POST {"project":"default"} to retrieveUserQuota when the real project id is missing. "default" isn't a real GCP project, and per the existing convention (and the comment you just softened) the project param is required to get full model coverage incl. gemini-3. Best case the API rejects it and Antigravity — which used to return nil cleanly — does extra work to fail anyway; worst case it returns a reduced/default quota set and the menu shows plausible-but-wrong numbers with no error. The previous throw/nil was the correct behavior. Either keep the hard failure or verify against the live API that "default" returns complete data (and surface a visible "partial data" state).
  2. Codex limits + upstream_limits concat. partitionSelfServiceLimits never dedupes, and buildSelfServicePayload reads base.first/base.last. If both arrays carry the same window you either render it twice (primary == secondary) or silently drop one source's value. Dedupe by window key before partitioning.

CI is red

SwiftLint is failing on duplicate_imports because of the duplicated import Foundation in ZaiCodingPlanProvider.swift. That one-line fix unblocks the lint job.

Non-blocking / FYI

  • The new parsePSElapsedSeconds and isTransientNetworkError are deterministic, edge-case-prone helpers and the repo already has OpenCodeZenProviderTests/ZaiCodingPlanProviderTests — a couple of table-driven tests ("03:14", "1:02:03", "2-01:05:30", malformed; transient vs non-transient) would be cheap insurance.
  • OpenCodeZen's process-timeout/kill logic now overlaps with KiroProvider's task-group approach. Not blocking, but a shared runProcessWithTimeout helper would stop the two from drifting.
  • getOpenAIAccounts prepending a config-derived API-key account is dedupe-safe against the same key in auth.json (token:<key>), but it renders with no email/accountId — confirm it won't show a confusing extra "Unknown" row next to a real OAuth Codex login.

Inline comments have the specifics.

throw ProviderError.authenticationFailed("Missing project ID for account #\(accountIndex + 1)")
}
let configuredProjectId = account.projectId.trimmingCharacters(in: .whitespacesAndNewlines)
let projectId = configuredProjectId.isEmpty ? "default" : configuredProjectId

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Major: Bug default project fallback: silent partial quota
"default" isn't a valid GCP project id, and this used to throw when the project was missing. Now you POST {"project":"default"} to retrieveUserQuota. The existing convention (and the comment you softened on line 286) says project is required for full model coverage including gemini-3 variants — so this can silently return a reduced quota set and the menu shows wrong-but-plausible numbers with no error.

Keep the hard failure when the project id is genuinely missing, or verify against the live API that "default" returns complete data and surface a visible "partial data" indicator. Either way please restore the accurate "required" wording in the comment.

limits = (try? container.decodeIfPresent([SelfServiceLimit].self, forKey: .limits)) ?? []
let localLimits = (try? container.decodeIfPresent([SelfServiceLimit].self, forKey: .limits)) ?? []
let upstreamLimits = (try? container.decodeIfPresent([SelfServiceLimit].self, forKey: .upstreamLimits)) ?? []
limits = localLimits + upstreamLimits

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Major: Bug limits + upstream_limits can double-count windows
Concatenating limits and upstream_limits without dedupe is risky: partitionSelfServiceLimits sorts but never dedupes, and buildSelfServicePayload reads base.first/base.last. If both arrays describe the same window (e.g. both carry a 5h entry), you either render the same window twice as primary+secondary, or silently keep one source's value and drop the other's.

Dedupe on merge by a window key (limitWindow + modelFilter/limitType), preferring one source deterministically, before partitioning — or document why overlap is impossible.

} else if !fallbackProjectId.isEmpty {
projectId = fallbackProjectId
} else {
projectId = "default"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Major: Bug default project fallback masks missing project
Same "default" concern as in GeminiCLIProvider. This path previously returned nil (no fallback). Substituting a literal "default" project id means the fallback either fails anyway after extra work, or returns a default-tier quota that gets shown as if it were real. If "default" is not verified to return complete data from retrieveUserQuota, keep returning nil here.

)
if refreshedAccessToken == nil {
logger.info("Gemini CLI: Primary OAuth client failed for account #\(accountIndex + 1); trying Gemini CLI OAuth client fallback")
refreshedAccessToken = await tokenManager.refreshGeminiAccessToken(refreshToken: account.refreshToken)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Perf OAuth fallback can repeat identical request
This fallback only does something different when account.clientId is the plugin client. When the account already uses the default geminiClientId/geminiClientSecret, omitting them here re-issues the exact same OAuth request that just failed — one more round-trip + log line before throwing. Worth guarding with account.clientId != TokenManager.geminiClientId so it only retries when the credentials actually differ.

@@ -1,4 +1,5 @@
import Foundation
import Foundation

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Style duplicate import fails SwiftLint
Duplicate import Foundation. This is what's failing the SwiftLint CI job (duplicate_imports). Drop this line to go green.

Comment on lines +393 to +394
let components = timePart.split(separator: ":").compactMap { Int($0) }
guard components.count == timePart.split(separator: ":").count else { return nil }

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Style redundant double split in elapsed parser
timePart.split(separator: ":") is computed twice just to validate all segments parsed. Split once into a local and compare counts:

let rawComponents = timePart.split(separator: ":")
let components = rawComponents.compactMap { Int($0) }
guard components.count == rawComponents.count else { return nil }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant