Skip to content

perf(startup): static typed command registry + keychain-free startup#1387

Open
sang-neo03 wants to merge 6 commits into
mainfrom
feat/startup-perf
Open

perf(startup): static typed command registry + keychain-free startup#1387
sang-neo03 wants to merge 6 commits into
mainfrom
feat/startup-perf

Conversation

@sang-neo03

@sang-neo03 sang-neo03 commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Summary

Speeds up lark-cli cold start on all three platforms by removing two per-invocation startup costs:

  1. parsing the embedded 1.9 MB meta_data.json into map[string]interface{} on every run, and
  2. decrypting the app secret just to render help text / register identity flags.

Changes

  1. Static typed registry — the embedded command spec is generated at build time into static Go data (internal/registry/metastatic, produced by scripts/fetch_meta.py) instead of parsing JSON at startup. Command tree is byte-identical to the old parse path (verified via cmd/command_tree_dump_test.go); the embedded baseline reads with zero heap allocation; the remote overlay is still decoded at runtime. meta_data.json is no longer embedded.
  2. affordance through the typed registrymetaschema.Method now carries the affordance overlay, so schema --format json keeps emitting _meta.affordance (the old untyped map preserved it implicitly).
  3. Keychain-free startup — brand and strict mode are read via a secret-free ResolveMeta path, so building the command tree no longer decrypts the app secret. On macOS this removes a Keychain (securityd IPC) read on every invocation. The secret is still decrypted on demand when a command actually calls the API (auth status etc. unchanged).
  4. Cleanups — drop an unused var; refresh a stale doc comment.

Behavior change

  • schema --format json object property order is now alphabetical (the natural-order machinery tied to raw meta_data.json bytes was removed). JSON Schema / MCP tool-spec property order is semantically insignificant; --format pretty is unaffected. Asserted in assembler_test.go.

Test plan

  • go test ./... green (registry, schema, service, cmdutil, credential, shortcuts).
  • Zero-allocation static read asserted (TestStaticReadZeroAlloc).
  • Command tree byte-identical verified via dump comparison (8092 lines).
  • Cross-platform measurements above (macOS native / Linux Xeon / Windows Xeon).

Summary by CodeRabbit

Release Notes

  • Performance

    • Command registry is now statically compiled at build time for more efficient initialization and zero-allocation metadata access.
  • Refactor

    • Registry metadata handling migrated to use type-safe definitions for improved internal consistency.
    • Configuration and brand resolution optimized to avoid unnecessary credential operations.

Render the embedded command spec as static Go data (metaschema types + metastatic generator and stub) to eliminate the startup JSON parse and allocation. Behind the larkmeta build tag; the runtime still uses the JSON path (not yet wired). A validation test confirms the static data matches the JSON parse (13 services / 70 resources / 215 methods / 4233 fields) and that reads allocate nothing. Generated meta_data_gen.go is gitignored, like meta_data.json.
Migrate the startup command-tree build path to read compile-time static
Go structs (metastatic.Registry, behind -tags larkmeta) instead of parsing
the 1.9MB meta_data.json into map[string]interface{} (~170K heap objects)
on every invocation.

- typed.go: typed baseline + remote-overlay layer. Reads static data under
  -tags larkmeta (zero parse / zero alloc); falls back to parsing the
  embedded meta_data.json once for builds without the tag (dev/test). A lazy
  typed->map shim (ServiceToMap/MethodToMap) keeps execution-path consumers
  working unchanged.
- service.go: registration chain and NewCmdServiceMethod read typed structs
  directly; map wrappers convert via MapToService/MapToMethod for existing
  tests; method RunE materializes opts.Spec/opts.Method lazily, never at
  startup.
- loader.go/remote.go: baseline and cached remote overlay decode into the
  typed shape; runtime refresh preserved.
- build pipeline: Makefile, goreleaser and pkg-pr-new run the generator and
  build with -tags larkmeta so shipped binaries carry the static data.

Reading the full static registry is 0 B/op, 0 allocs/op; the built command
tree is byte-identical to the JSON-parsed tree (verified via a canonical
tree dump in both modes).
The startup baseline now comes solely from the generated static Go registry
(metastatic.Registry), wired into the stub-declared Registry via a package-level
var plus an init() struct-header copy. No build tag, no committed generated
file, and zero startup allocation is preserved.

- gen.go emits a tag-free `var registryData` + `func init()` instead of a
  //go:build larkmeta top-level `var Registry`; stub.go declares Registry
  unconditionally so the package always compiles
- fetch_meta.py regenerates the static registry after fetching, so every build
  and CI step that fetches also produces it (no separate gen step, no CI change)
- remove the //go:embed meta_data.json baseline and the JSON parse fallback;
  meta_data.json is now only the build-time input to the generator
- EmbeddedSpec/EmbeddedServiceNames read the static baseline; drop the schema
  key-order machinery so envelope field order is alphabetical (JSON Schema
  property order is not semantic; parameterOrder for positional args is intact)
- drop -tags larkmeta from Makefile, .goreleaser.yml, and build-pkg-pr-new.sh

Command tree is byte-identical (8092 lines). registry/schema/cmd unit tests,
the zero-alloc bench, and the e2e dry-run suite all pass.
The static-meta migration modeled only the method fields it explicitly
declared, but the schema assembler still reads method["affordance"] to
build _meta.affordance. metaschema.Method had no Affordance field and
MethodToMap did not backfill it, so any method whose meta_data.json entry
carries an affordance overlay would silently lose _meta.affordance in
`schema --format json` (the embedded-JSON loader preserved it via the
untyped map). Local meta currently has zero affordance entries, so no
existing test caught it.

Model affordance in the typed registry:
- metaschema.Method gains Affordance plus Affordance/AffordanceExample types
- gen.go emits the affordance literal (static, zero-alloc)
- MethodToMap rebuilds method["affordance"] so parseAffordance keeps working

schema JSON mode reads EmbeddedSpec (static baseline via ServiceToMap),
which bypasses the remote overlay for determinism, so the remote/wire and
MapToMethod paths -- never affordance consumers -- are left untouched.

Add TestBuildMeta_AffordanceThroughTypedRegistry covering
Method -> MethodToMap -> buildMeta -> _meta.affordance.
… doc

typedInitialized was only assigned (in resetTyped) and never read, so
staticcheck's U1000 (golangci `unused`) flagged it and would block the
lint gate. Remove the field and its assignment.

Also refresh the OrderedProps comment: the static-registry migration
removed the meta_data.json natural-order machinery, so Order is now
populated alphabetically (orderedKeys). The comment still claimed
natural-order preservation. No behavior change.
… secret

Building the command tree resolved brand (auth login --domain help) and
strict mode (per-command identity-flag registration via ResolveStrictMode)
through f.Config(), which decrypts the app secret. On macOS that is a
Keychain (securityd IPC) read on every invocation -- even --help/--version/
schema/completion, which never use the secret.

brand and strict mode are plain config.json fields, so route them through a
secret-free metadata path:
- credential: add (*DefaultAccountProvider).ResolveMeta and a cached
  (*CredentialProvider).ResolveMeta returning brand + SupportedIdentities,
  via an optional metaResolver capability; external providers fall back to
  ResolveAccount (they do not touch the keychain).
- cmdutil: ResolveStrictMode now uses ResolveMeta; add Factory.ConfigBrand.
- auth login help text and shortcut registration read ConfigBrand.

The app secret is still decrypted on demand when a command actually calls
the API (auth status etc. unchanged). Measured on macOS (GOMAXPROCS=1):
lark-cli --help 35.4ms -> 21.4ms (-40%); no behavior change on Linux/Windows
(local secret backends are cheap).
@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR migrates the service registry from untyped map[string]interface{} handling to a statically-generated typed schema. It introduces brand-only credential metadata resolution for keychain-free configuration access, refactors registry initialization and caching to use typed data, and updates service command registration and schema assembly accordingly.

Changes

Static Registry and Typed Metadata

Layer / File(s) Summary
Metaschema types and code generation foundation
internal/registry/metaschema/types.go, internal/registry/metastatic/gen.go, internal/registry/metastatic/stub.go, internal/registry/metastatic_validate_test.go, .gitignore, Makefile, .goreleaser.yml, scripts/build-pkg-pr-new.sh, scripts/fetch_meta.py
New metaschema package defines the Go type hierarchy for meta_data.json registry (Service, Resource, Method, Field, Affordance). Generator in gen.go reads the JSON and emits deterministic meta_data_gen.go with typed literals. Stub variable provides always-compiling baseline. Tests validate populated registry and zero-allocation deep reads. Build scripts and configuration updated to document the generation step.
Credential metadata resolution for brand-only config access
internal/credential/credential_provider.go, internal/credential/default_provider.go
New ResolveMeta(ctx) methods on CredentialProvider and DefaultAccountProvider enable keychain-free brand and strict-mode identity resolution. Caches metadata on first use and includes fallback chain from external providers to default account resolver, returning ok=false when unavailable.
Factory ConfigBrand initialization and integration
internal/cmdutil/factory.go, internal/cmdutil/factory_default.go, internal/cmdutil/testing.go
Factory now includes ConfigBrand lazy accessor that invokes Credential.ResolveMeta without app-secret decryption. NewDefault initializes brand-aware registry setup early. Test factory provides ConfigBrand returning config brand when available. ResolveStrictMode refactored to use metadata-provided identity support.
Commands and shortcuts using brand-only configuration
cmd/auth/login.go, shortcuts/register.go, shortcuts/register_brand_guard_test.go
NewCmdAuthLogin and shortcut registration switched from f.Config() to f.ConfigBrand(), enabling brand-aware command behaviors without full config resolution. Test helper updated to provide ConfigBrand returning the supplied brand.
Typed registry service lookup and caching
internal/registry/typed.go
New module implements thread-safe typed service registry with embedded baseline (metastatic.Registry) overlaid by remote cache. Provides TypedService(name) and TypedServices() accessors. Includes ServiceToMap/MethodToMap shims for legacy map-based consumers, MapToService conversion for tests, and loadCachedTyped to deserialize remote JSON cache with deterministic field ordering.
Registry loader using typed data sources
internal/registry/loader.go
Refactored to delegate service lookup to typed accessors (TypedService, TypedServices) instead of maintaining embedded/merged maps. Exposes EmbeddedSpec and EmbeddedServiceNames for baseline access. InitWithBrand uses typed cache helpers for overlay/refresh decisions.
Remote overlay tests updated for typed registry
internal/registry/remote.go, internal/registry/remote_test.go
doSyncFetch now calls loadCachedTyped() instead of overlayMergedServices. Tests replaced with typed variants (TestRemoteOverlayTyped, TestRemoteOverlayDoesNotPolluteFollowingInit) validating typed overlay isolation. Corrupted cache test uses typed loading. Embedded-services detection checks metastatic.Registry.Services.
Service command registration refactored to typed metadata
cmd/service/service.go, cmd/service/service_test.go
RegisterServiceCommandsWithContext and NewCmdServiceMethodWithContext now accept and iterate typed metaschema.Service/metaschema.Method objects. File-field detection refactored to typed path (detectFileFieldsTyped). Lazy conversion to maps at execution time preserves downstream runtime logic. Supported identities and risk metadata derived from typed AccessTokens and method fields.
Schema assembly: remove key-order tracking, use alphabetical ordering
internal/schema/assembler.go, internal/schema/assembler_test.go, internal/schema/types.go
Removed currentMethodOrder state and context tracking that preserved meta_data.json insertion order. orderedKeys now always returns alphabetically-sorted keys. Tests updated to validate alphabetical property ordering and affordance round-tripping through typed registry conversion. Documentation updated to reflect deterministic alphabetical behavior.
Command tree dump test for static artifact validation
cmd/command_tree_dump_test.go
New test serializes Cobra command tree deterministically: walks hierarchy, sorts annotations/flags, escapes special characters, writes to LARK_TREE_DUMP path. Runs with disabled remote meta and empty config dir for reproducibility.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • larksuite/cli#1048: Schema assembler changes removing key-order indexing and affordance handling are directly tied to deterministic MCP schema envelope generation from embedded metadata.
  • larksuite/cli#280: Registry initialization/reset test state (resetInit, embeddedVersion, remote test overlay behavior) shares test-isolation machinery with the main PR's registry refactor.

Suggested labels

feature, size/L, domain/base

Suggested reviewers

  • liangshuo-1
  • zhengzhijiej-tech
  • YangJunzhou-01

Poem

🐰 A rabbit hops through schemas bright,
From maps to types—what a sight!
No secrets touched, just brands so clear,
Typed metadata saves the day here!
Alphabets sorted, forever stable,
The registry's now truly able. 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 44.12% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main performance improvements: static typed command registry and keychain-free startup, matching the primary changes across all modified files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The PR description covers summary, changes, behavior change, and test plan with sufficient detail and structure.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/startup-perf

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added the size/XL Architecture-level or global-impact change label Jun 10, 2026
@codecov

codecov Bot commented Jun 10, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 52.98329% with 197 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.86%. Comparing base (8f5504c) to head (3b6aa7d).
⚠️ Report is 22 commits behind head on main.

Files with missing lines Patch % Lines
internal/registry/typed.go 52.69% 146 Missing and 3 partials ⚠️
internal/credential/credential_provider.go 0.00% 19 Missing ⚠️
internal/credential/default_provider.go 0.00% 8 Missing ⚠️
internal/registry/loader.go 50.00% 8 Missing ⚠️
internal/cmdutil/factory_default.go 0.00% 4 Missing and 1 partial ⚠️
cmd/service/service.go 90.00% 4 Missing ⚠️
internal/cmdutil/testing.go 20.00% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1387      +/-   ##
==========================================
+ Coverage   71.47%   71.86%   +0.38%     
==========================================
  Files         688      700      +12     
  Lines       65482    66406     +924     
==========================================
+ Hits        46806    47724     +918     
+ Misses      15031    14997      -34     
- Partials     3645     3685      +40     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions

Copy link
Copy Markdown

🚀 PR Preview Install Guide

🧰 CLI update

npm i -g https://pkg.pr.new/larksuite/cli/@larksuite/cli@3b6aa7dc6a74df7537aa20eed57b9e77d1c0bdc9

🧩 Skill update

npx skills add larksuite/cli#feat/startup-perf -y -g

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
cmd/command_tree_dump_test.go (1)

71-87: Handle LARK_TREE_DUMP output safely (vfs + validate.SafeOutputPath) in TestDumpCommandTree

  • cmd/command_tree_dump_test.go writes os.WriteFile(out, ...) where out := os.Getenv("LARK_TREE_DUMP"), with no validate.SafeOutputPath (and uses os.* instead of vfs.*).
  • This matches the existing “manual debug output” pattern in shortcuts/register_test.go (TestGenerateShortcutsJSON uses os.Getenv("SHORTCUTS_OUTPUT") + os.WriteFile without validate.SafeOutputPath).

Decide whether TestDumpCommandTree should follow the stricter safety approach used for --output paths elsewhere (e.g., validate.SafeOutputPath + vfs.*), or explicitly treat these env-controlled dump/tests as an exemption for now.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/command_tree_dump_test.go` around lines 71 - 87, The test
TestDumpCommandTree currently writes the environment-controlled path directly
with os.WriteFile, which bypasses path safety checks and the virtual FS; change
it to validate the output path with validate.SafeOutputPath(out) and write using
the project's vfs utilities (e.g., vfs.WriteFile or the repo's vfs API) instead
of os.WriteFile so the test uses the same safe-output and VFS semantics as other
--output code paths; update imports accordingly and keep the rest of
TestDumpCommandTree and dumpCommandTree unchanged.

Source: Coding guidelines

internal/registry/metastatic_validate_test.go (1)

85-90: ⚡ Quick win

Skip benchmark when static registry is not generated.

Unlike the tests, the benchmark doesn’t guard the bare-checkout case. Adding a skip keeps benchmark numbers meaningful.

Suggested patch
 func BenchmarkReadStaticRegistry(b *testing.B) {
+	if len(metastatic.Registry.Services) == 0 {
+		b.Skip("static registry empty; run `make fetch_meta` to generate it")
+	}
 	b.ReportAllocs()
 	for i := 0; i < b.N; i++ {
 		sinkInt = deepReadStatic()
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/registry/metastatic_validate_test.go` around lines 85 - 90,
BenchmarkReadStaticRegistry must skip when the static registry isn't generated;
add a pre-check at the top of BenchmarkReadStaticRegistry that detects whether
the generated static registry exists and calls b.Skip with a clear message if it
does not. Implement a small helper (e.g. staticRegistryGenerated or
ensureStaticRegistryGenerated) that does the existence check (file/marker or
other project-specific indicator) and returns a bool or calls b.Skip, then use
it before invoking deepReadStatic so the benchmark is only run when the static
registry is present.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/registry/metastatic/stub.go`:
- Line 12: Update the top-of-file stub comment in
internal/registry/metastatic/stub.go to reference the current generation command
used by the build flow: replace the outdated "make gen_meta" text with "make
fetch_meta" so contributors running the project on fresh checkouts use the
correct command.

In `@internal/registry/typed.go`:
- Around line 97-126: The function typedServiceNames has a race: it releases
typedMu.RLock() before iterating remoteOverrides, allowing setRemoteOverrides to
mutate it; fix by keeping the RLock for the entire read/snapshot phase — acquire
typedMu.RLock(), check typedNamesCache, compute seen by iterating
baselineServices() and remoteOverrides while still holding the RLock, build the
names slice (or copy map keys) under the RLock, then RUnlock(), acquire
typedMu.Lock(), re-check typedNamesCache (in case it was populated meanwhile)
and only then set typedNamesCache = names and Unlock(); refer to
typedServiceNames, typedMu, typedNamesCache, remoteOverrides and
setRemoteOverrides when making the change.

---

Nitpick comments:
In `@cmd/command_tree_dump_test.go`:
- Around line 71-87: The test TestDumpCommandTree currently writes the
environment-controlled path directly with os.WriteFile, which bypasses path
safety checks and the virtual FS; change it to validate the output path with
validate.SafeOutputPath(out) and write using the project's vfs utilities (e.g.,
vfs.WriteFile or the repo's vfs API) instead of os.WriteFile so the test uses
the same safe-output and VFS semantics as other --output code paths; update
imports accordingly and keep the rest of TestDumpCommandTree and dumpCommandTree
unchanged.

In `@internal/registry/metastatic_validate_test.go`:
- Around line 85-90: BenchmarkReadStaticRegistry must skip when the static
registry isn't generated; add a pre-check at the top of
BenchmarkReadStaticRegistry that detects whether the generated static registry
exists and calls b.Skip with a clear message if it does not. Implement a small
helper (e.g. staticRegistryGenerated or ensureStaticRegistryGenerated) that does
the existence check (file/marker or other project-specific indicator) and
returns a bool or calls b.Skip, then use it before invoking deepReadStatic so
the benchmark is only run when the static registry is present.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6236f1e2-6fd2-4282-9ff3-d4c4d605c5d6

📥 Commits

Reviewing files that changed from the base of the PR and between 8e667db and 3b6aa7d.

📒 Files selected for processing (29)
  • .gitignore
  • .goreleaser.yml
  • Makefile
  • cmd/auth/login.go
  • cmd/command_tree_dump_test.go
  • cmd/service/service.go
  • cmd/service/service_test.go
  • internal/cmdutil/factory.go
  • internal/cmdutil/factory_default.go
  • internal/cmdutil/testing.go
  • internal/credential/credential_provider.go
  • internal/credential/default_provider.go
  • internal/registry/loader.go
  • internal/registry/loader_embedded.go
  • internal/registry/meta_data_default.json
  • internal/registry/metaschema/types.go
  • internal/registry/metastatic/gen.go
  • internal/registry/metastatic/stub.go
  • internal/registry/metastatic_validate_test.go
  • internal/registry/remote.go
  • internal/registry/remote_test.go
  • internal/registry/typed.go
  • internal/schema/assembler.go
  • internal/schema/assembler_test.go
  • internal/schema/types.go
  • scripts/build-pkg-pr-new.sh
  • scripts/fetch_meta.py
  • shortcuts/register.go
  • shortcuts/register_brand_guard_test.go
💤 Files with no reviewable changes (2)
  • internal/registry/meta_data_default.json
  • internal/registry/loader_embedded.go

// value) so the package always compiles, and populated by meta_data_gen.go's
// init() when that generated file is present. On a fresh checkout the generated
// file is absent — it is gitignored and produced at build time by
// `make gen_meta` — so Registry stays empty. This keeps the "heavy spec is

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use the current generation command in the stub comment.

The comment references make gen_meta, but the project’s build flow uses make fetch_meta. This can mislead contributors on fresh checkouts.

Suggested patch
-// `make gen_meta` — so Registry stays empty. This keeps the "heavy spec is
+// `make fetch_meta` — so Registry stays empty. This keeps the "heavy spec is
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// `make gen_meta` — so Registry stays empty. This keeps the "heavy spec is
// `make fetch_meta` — so Registry stays empty. This keeps the "heavy spec is
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/registry/metastatic/stub.go` at line 12, Update the top-of-file stub
comment in internal/registry/metastatic/stub.go to reference the current
generation command used by the build flow: replace the outdated "make gen_meta"
text with "make fetch_meta" so contributors running the project on fresh
checkouts use the correct command.

Comment on lines +97 to +126
func typedServiceNames() []string {
typedMu.RLock()
if typedNamesCache != nil {
out := typedNamesCache
typedMu.RUnlock()
return out
}
typedMu.RUnlock()

seen := make(map[string]bool)
for _, s := range baselineServices() {
seen[s.Name] = true
}
typedMu.RLock()
for name := range remoteOverrides {
seen[name] = true
}
typedMu.RUnlock()

names := make([]string, 0, len(seen))
for n := range seen {
names = append(names, n)
}
sort.Strings(names)

typedMu.Lock()
typedNamesCache = names
typedMu.Unlock()
return names
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Race window between cache check and overlay iteration.

Between releasing RLock at line 104 and re-acquiring it at line 110, setRemoteOverrides can run on another goroutine, modifying remoteOverrides. This creates a window where:

  1. Cache is nil → release lock
  2. Another goroutine calls setRemoteOverrides, adding/modifying services
  3. This goroutine iterates remoteOverrides with partial/stale view

The result is a names list that may be inconsistent with the actual overlay state.

🔒 Suggested fix: hold RLock through the entire computation
 func typedServiceNames() []string {
 	typedMu.RLock()
 	if typedNamesCache != nil {
 		out := typedNamesCache
 		typedMu.RUnlock()
 		return out
 	}
-	typedMu.RUnlock()
 
 	seen := make(map[string]bool)
 	for _, s := range baselineServices() {
 		seen[s.Name] = true
 	}
-	typedMu.RLock()
 	for name := range remoteOverrides {
 		seen[name] = true
 	}
 	typedMu.RUnlock()
 
 	names := make([]string, 0, len(seen))
 	for n := range seen {
 		names = append(names, n)
 	}
 	sort.Strings(names)
 
 	typedMu.Lock()
+	// Double-check: another goroutine may have populated the cache
+	if typedNamesCache != nil {
+		typedMu.Unlock()
+		return typedNamesCache
+	}
 	typedNamesCache = names
 	typedMu.Unlock()
 	return names
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/registry/typed.go` around lines 97 - 126, The function
typedServiceNames has a race: it releases typedMu.RLock() before iterating
remoteOverrides, allowing setRemoteOverrides to mutate it; fix by keeping the
RLock for the entire read/snapshot phase — acquire typedMu.RLock(), check
typedNamesCache, compute seen by iterating baselineServices() and
remoteOverrides while still holding the RLock, build the names slice (or copy
map keys) under the RLock, then RUnlock(), acquire typedMu.Lock(), re-check
typedNamesCache (in case it was populated meanwhile) and only then set
typedNamesCache = names and Unlock(); refer to typedServiceNames, typedMu,
typedNamesCache, remoteOverrides and setRemoteOverrides when making the change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XL Architecture-level or global-impact change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant