LCORE-1426: BYOK Config refactoring#1843
Conversation
ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (33)
📜 Recent review details⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
🧰 Additional context used📓 Path-based instructions (8)tests/e2e/**/*.{py,feature}📄 CodeRabbit inference engine (AGENTS.md)
Files:
src/**/*.py📄 CodeRabbit inference engine (AGENTS.md)
Files:
src/**/configuration.py📄 CodeRabbit inference engine (AGENTS.md)
Files:
src/models/**/*.py📄 CodeRabbit inference engine (AGENTS.md)
Files:
tests/**/*.py📄 CodeRabbit inference engine (AGENTS.md)
Files:
tests/**/conftest.py📄 CodeRabbit inference engine (AGENTS.md)
Files:
src/app/**/*.py📄 CodeRabbit inference engine (AGENTS.md)
Files:
src/constants.py📄 CodeRabbit inference engine (AGENTS.md)
Files:
🧠 Learnings (21)📓 Common learnings📚 Learning: 2026-06-03T11:15:38.136ZApplied to files:
📚 Learning: 2026-05-20T08:09:30.641ZApplied to files:
📚 Learning: 2025-09-02T11:09:40.404ZApplied to files:
📚 Learning: 2025-09-02T11:15:02.411ZApplied to files:
📚 Learning: 2026-05-06T06:57:52.173ZApplied to files:
📚 Learning: 2026-05-20T08:09:36.724ZApplied to files:
📚 Learning: 2026-04-13T13:39:59.316ZApplied to files:
📚 Learning: 2025-08-18T10:58:14.951ZApplied to files:
📚 Learning: 2026-03-17T11:34:53.242ZApplied to files:
📚 Learning: 2026-01-12T10:58:40.230ZApplied to files:
📚 Learning: 2026-02-25T07:46:33.545ZApplied to files:
📚 Learning: 2025-12-18T10:21:09.038ZApplied to files:
📚 Learning: 2026-04-06T20:18:07.852ZApplied to files:
📚 Learning: 2026-05-20T08:09:43.391ZApplied to files:
📚 Learning: 2026-05-20T08:09:34.319ZApplied to files:
📚 Learning: 2026-05-06T06:57:52.173ZApplied to files:
📚 Learning: 2026-04-15T18:54:09.157ZApplied to files:
📚 Learning: 2026-02-23T14:56:59.186ZApplied to files:
📚 Learning: 2026-04-07T14:44:42.022ZApplied to files:
📚 Learning: 2026-04-20T15:09:48.726ZApplied to files:
🪛 GitHub Actions: OpenAPI (Spectral) / 0_spectral.txtdocs/openapi.json[error] 1-1: OpenAPI schema is out of date. 'diff -u docs/openapi.json /tmp/openapi-generated.json' failed; regenerate by running: 'uv run scripts/generate_openapi_schema.py docs/openapi.json'. 🪛 GitHub Actions: OpenAPI (Spectral) / spectraldocs/openapi.json[error] 1-1: CI failed because docs/openapi.json is out of date. Diff detected between docs/openapi.json and generated schema. Regenerate with: uv run python scripts/generate_openapi_schema.py docs/openapi.json 🪛 markdownlint-cli2 (0.22.1)docs/rag_guide.md[warning] 42-42: Heading levels should only increment by one level at a time (MD001, heading-increment) 🛑 Comments failed to post (3)
🔇 Additional comments (31)
WalkthroughRefactors the RAG configuration into a unified nested ChangesUnified RAG Configuration Refactor
Estimated code review effort:
✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
|
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/models/config.py (1)
1765-2053: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winComplete the new model docstrings to match the repo standard.
These new config models only have one-line summaries. Please expand
RagStore,RetrievalStrategyConfiguration,RetrievalConfiguration,ByokConfiguration, andRagConfigurationwith full Google-style class docstrings, especially anAttributessection.As per coding guidelines,
src/models/**/*.py: Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/models/config.py` around lines 1765 - 2053, Update the one-line class docstrings for RagStore, RetrievalStrategyConfiguration, RetrievalConfiguration, ByokConfiguration, and RagConfiguration to full Google-style docstrings: include a short summary, a longer description, and an Attributes section listing each model field (e.g., RagStore: rag_id, backend, embedding_model, embedding_dimension, vector_db_id, db_path, score_multiplier; RetrievalStrategyConfiguration: sources, max_chunks; RetrievalConfiguration: inline, tool; ByokConfiguration: max_chunks, stores; RagConfiguration: byok, okp, retrieval) with types and brief descriptions; ensure the docstrings follow the repo guideline (Google-style) and include the required sections (Parameters/Returns/Raises where applicable and an Attributes section for the class fields).src/utils/vector_search.py (1)
574-595:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPass
rag.okp.max_chunksinto the Solr query itself.
limitis only applied after the query returns. Because_build_query_params()still uses its defaultk, anyrag.okp.max_chunksvalue above that default can never be returned, so the new config is only partially honored.Suggested fix
- params = _build_query_params(solr) + params = _build_query_params(solr, k=limit)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/vector_search.py` around lines 574 - 595, The code reads configuration.rag.okp.max_chunks into limit but doesn't pass it into the Solr query, so increase the returned chunk count by injecting that limit into the query params: update the call site around _build_query_params(solr) to supply the configured limit (configuration.rag.okp.max_chunks / local variable limit) into the params (e.g., set params["k"] or the appropriate param name used by Solr) before calling client.vector_io.query(vector_store_id=vector_store_id, query=query, params=params), ensuring _build_query_params or the params dict reflects the desired max_chunks.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/openapi.json`:
- Around line 6762-6781: The checked-in OpenAPI artifact is stale: regenerate
the OpenAPI JSON from the current source models so the "rag" schema block (the
byok/okp/retrieval sections shown in the diff) and the other affected sections
(around the noted ranges) match the generated contract; run the project's
OpenAPI generation script/command used in CI, replace the existing openapi.json
with the regenerated output, and commit the updated artifact so the schema in
the repo matches the canonical models.
In `@src/llama_stack_configuration.py`:
- Around line 20-23: Add a module-level validation after
BACKEND_TO_LLAMA_STACK_PROVIDER that imports DEFAULT_RAG_BACKEND and checks it
exists as a key in BACKEND_TO_LLAMA_STACK_PROVIDER; if not, raise a clear
ValueError (or AssertionError) explaining the invalid default and listing
supported keys. Reference the symbols BACKEND_TO_LLAMA_STACK_PROVIDER and
DEFAULT_RAG_BACKEND and perform the check at import time so misconfiguration
surfaces early.
In `@src/models/config.py`:
- Around line 1775-1779: The backend Field currently accepts any string
(backend) and should be constrained at the schema boundary: replace the
unconstrained str with a stricter type or add a Pydantic validator that checks
the value against the canonical supported-backend set (e.g.
constants.SUPPORTED_RAG_BACKENDS or a new enum/typing.Literal list) and raise a
ValidationError if it is not present; keep the default as
constants.DEFAULT_RAG_BACKEND and ensure the model-level validator or Enum
update also updates the generated schema so invalid YAML fails fast.
- Around line 2029-2053: Add a Pydantic validation on RagConfiguration that
checks retrieval.inline.sources and retrieval.tool.sources against the declared
set of allowed IDs (collect byok.stores[].rag_id plus constants.OKP_RAG_ID); if
any source id is not in that set raise a clear ValueError listing the unknown
ids so configuration loading fails fast; implement this as a class validator
(e.g., `@root_validator` or `@field_validator`) inside RagConfiguration referencing
the retrieval, byok, and constants.OKP_RAG_ID symbols and include both inline
and tool checks in one pass.
In `@src/utils/vector_search.py`:
- Around line 472-484: The current code filters incoming vector_store_ids
against configuration.configuration.rag.retrieval.inline.sources before mapping
user-facing rag_ids to llama-stack ids, which causes raw llama-stack
vector_db_ids to be dropped; instead, first call resolve_vector_store_ids(...)
to translate incoming vector_store_ids (or the default
configuration.configuration.rag.retrieval.inline.sources when None) into
llama-stack ids using resolve_vector_store_ids, then compute rag_ids_to_query /
vector_store_ids_to_query by intersecting the resolved ids with
configuration.configuration.rag.retrieval.inline.sources so inline-enabled BYOK
stores passed as raw ids are preserved (update references: vector_store_ids,
rag_ids_to_query, resolve_vector_store_ids,
configuration.configuration.rag.retrieval.inline.sources).
---
Outside diff comments:
In `@src/models/config.py`:
- Around line 1765-2053: Update the one-line class docstrings for RagStore,
RetrievalStrategyConfiguration, RetrievalConfiguration, ByokConfiguration, and
RagConfiguration to full Google-style docstrings: include a short summary, a
longer description, and an Attributes section listing each model field (e.g.,
RagStore: rag_id, backend, embedding_model, embedding_dimension, vector_db_id,
db_path, score_multiplier; RetrievalStrategyConfiguration: sources, max_chunks;
RetrievalConfiguration: inline, tool; ByokConfiguration: max_chunks, stores;
RagConfiguration: byok, okp, retrieval) with types and brief descriptions;
ensure the docstrings follow the repo guideline (Google-style) and include the
required sections (Parameters/Returns/Raises where applicable and an Attributes
section for the class fields).
In `@src/utils/vector_search.py`:
- Around line 574-595: The code reads configuration.rag.okp.max_chunks into
limit but doesn't pass it into the Solr query, so increase the returned chunk
count by injecting that limit into the query params: update the call site around
_build_query_params(solr) to supply the configured limit
(configuration.rag.okp.max_chunks / local variable limit) into the params (e.g.,
set params["k"] or the appropriate param name used by Solr) before calling
client.vector_io.query(vector_store_id=vector_store_id, query=query,
params=params), ensuring _build_query_params or the params dict reflects the
desired max_chunks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 497ca0e6-3b47-447b-89c3-24a6a5d13a3f
📒 Files selected for processing (32)
docs/byok_guide.mddocs/openapi.jsondocs/rag_guide.mdexamples/lightspeed-stack-byok-okp-rag.yamlexamples/quota-limiter-configuration-sqlite.yamlsrc/app/endpoints/rags.pysrc/client.pysrc/configuration.pysrc/constants.pysrc/llama_stack_configuration.pysrc/models/api/responses/successful/configuration.pysrc/models/config.pysrc/utils/responses.pysrc/utils/vector_search.pytests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yamltests/e2e/configuration/library-mode/lightspeed-stack-inline-rag.yamltests/e2e/configuration/library-mode/lightspeed-stack.yamltests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yamltests/e2e/configuration/server-mode/lightspeed-stack-inline-rag.yamltests/e2e/configuration/server-mode/lightspeed-stack.yamltests/integration/endpoints/test_query_byok_integration.pytests/integration/endpoints/test_responses_byok_integration.pytests/integration/endpoints/test_streaming_query_byok_integration.pytests/unit/app/endpoints/test_rags.pytests/unit/models/config/test_byok_rag.pytests/unit/models/config/test_dump_configuration.pytests/unit/models/config/test_rag_configuration.pytests/unit/telemetry/conftest.pytests/unit/test_configuration.pytests/unit/test_llama_stack_configuration.pytests/unit/utils/test_responses.pytests/unit/utils/test_vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: build-pr
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: unit_tests (3.12)
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (7)
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules:from authentication import get_auth_dependency
Llama Stack imports: Usefrom llama_stack_client import AsyncLlamaStackClient
Checkconstants.pyfor shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Uselogger = get_logger(__name__)fromlog.pyfor module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Useasync deffor I/O operations and external API calls
Use standard log levels with clear purposes:debug()for diagnostic info,info()for program execution,warning()for unexpected events,error()for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes:Configuration,Error/Exception,Resolver,Interface
Abstract classes must use ABC with@abstractmethoddecorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes
Files:
src/client.pysrc/models/api/responses/successful/configuration.pysrc/app/endpoints/rags.pysrc/configuration.pysrc/constants.pysrc/llama_stack_configuration.pysrc/utils/responses.pysrc/utils/vector_search.pysrc/models/config.py
src/**/configuration.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/configuration.py: All config models must extendConfigurationBasewithextra="forbid"to reject unknown fields
Use@field_validatorand@model_validatorfor custom validation in Pydantic models
Files:
src/models/api/responses/successful/configuration.pysrc/configuration.py
src/models/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Pydantic models must use
@model_validatorand@field_validatorfor validation and complete type annotations for all attributes, avoidingAnytype
Files:
src/models/api/responses/successful/configuration.pysrc/models/config.py
src/app/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/app/**/*.py: FastAPI dependencies: Import fromfastapimodule forAPIRouter,HTTPException,Request,status,Depends
Use FastAPIHTTPExceptionwith appropriate status codes for API endpoints and handleAPIConnectionErrorfrom Llama Stack
Files:
src/app/endpoints/rags.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Usepytest.mark.asynciomarker for async tests
Files:
tests/unit/app/endpoints/test_rags.pytests/unit/models/config/test_rag_configuration.pytests/unit/telemetry/conftest.pytests/unit/models/config/test_byok_rag.pytests/unit/test_llama_stack_configuration.pytests/unit/utils/test_vector_search.pytests/integration/endpoints/test_streaming_query_byok_integration.pytests/unit/utils/test_responses.pytests/unit/models/config/test_dump_configuration.pytests/integration/endpoints/test_query_byok_integration.pytests/unit/test_configuration.pytests/integration/endpoints/test_responses_byok_integration.py
src/constants.py
📄 CodeRabbit inference engine (AGENTS.md)
Use
constants.pyfor shared constants with descriptive comments and type hints usingFinal[type]
Files:
src/constants.py
tests/**/conftest.py
📄 CodeRabbit inference engine (AGENTS.md)
Use
conftest.pyfor shared pytest fixtures andpytest-mockfor AsyncMock objects
Files:
tests/unit/telemetry/conftest.py
🧠 Learnings (13)
📚 Learning: 2026-05-20T08:09:30.641Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:30.641Z
Learning: In Llama-stack config YAMLs, when defining a Llama Guard safety shield entry, set `provider_shield_id` to the *guard model identifier* (e.g., `meta-llama/Llama-Guard-3-8B`). Do not use a chat/generative model id (e.g., `openai/gpt-4o-mini`): a chat-model id (or `native_override`) indicates only an override landed and does **not** mean the safety shield is actually gating queries. Ensure any E2E coverage for the related implementation (JIRA/E2E tests) exercises a real Llama Guard model to verify that the shield is effective.
Applied to files:
examples/quota-limiter-configuration-sqlite.yamltests/e2e/configuration/server-mode/lightspeed-stack.yamltests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yamltests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yamlexamples/lightspeed-stack-byok-okp-rag.yamltests/e2e/configuration/server-mode/lightspeed-stack-inline-rag.yamltests/e2e/configuration/library-mode/lightspeed-stack.yamltests/e2e/configuration/library-mode/lightspeed-stack-inline-rag.yaml
📚 Learning: 2026-05-20T08:09:36.724Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: src/client.py:104-108
Timestamp: 2026-05-20T08:09:36.724Z
Learning: In the lightspeed-stack repo, the synthesized `run.yaml` file handling in `src/client.py` (`_synthesize_library_config`) uses a fixed `/tmp` path intentionally in the PoC (PR `#1580`). The durable production requirements are tracked in spec doc R10 (docs/design/llama-stack-config-merge/llama-stack-config-merge.md): persistent known path overwritten each boot, file mode 0600 set via explicit create flag (not umask), and a `--synthesized-config-output` CLI flag for debugging. The PoC code is scheduled for removal pre-merge; the implementation JIRA "Unified llama_stack.config schema + synthesizer" inherits R10's requirements.
Applied to files:
tests/e2e/configuration/server-mode/lightspeed-stack.yamltests/e2e/configuration/library-mode/lightspeed-stack.yaml
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.
Applied to files:
tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yamltests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
📚 Learning: 2025-09-02T11:15:02.411Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/test_list.txt:2-3
Timestamp: 2025-09-02T11:15:02.411Z
Learning: In the lightspeed-stack e2e tests, the Authorized tag is intentionally omitted from noop authentication tests because they are designed to test against the default lightspeed-stack.yaml configuration rather than the specialized noop-with-token configuration.
Applied to files:
tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.
Applied to files:
src/models/api/responses/successful/configuration.pysrc/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.
Applied to files:
src/models/api/responses/successful/configuration.pysrc/models/config.py
📚 Learning: 2026-03-17T11:34:53.242Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1335
File: docs/config.md:411-411
Timestamp: 2026-03-17T11:34:53.242Z
Learning: In the lightspeed-stack project (`src/models/config.py`, `docs/config.md`), the internal Solr filter `is_chunk:true` (defined as `SOLR_CHUNK_FILTER_QUERY` in `src/constants.py`) is always injected by the system for OKP searches and is intentionally hidden from users. The `chunk_filter_query` field in `OkpConfiguration` is user-facing and additive-only, but the documentation must NOT mention the internal `is_chunk:true` behavior — this is a deliberate design decision by the maintainers.
Applied to files:
docs/rag_guide.mddocs/openapi.jsonsrc/utils/vector_search.pytests/unit/utils/test_vector_search.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.
Applied to files:
src/app/endpoints/rags.py
📚 Learning: 2026-05-06T06:57:52.173Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-05-06T06:57:52.173Z
Learning: Applies to src/constants.py : Use `constants.py` for shared constants with descriptive comments and type hints using `Final[type]`
Applied to files:
src/constants.py
📚 Learning: 2026-05-20T08:09:43.391Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: src/llama_stack_configuration.py:651-683
Timestamp: 2026-05-20T08:09:43.391Z
Learning: In `src/llama_stack_configuration.py`, the `apply_high_level_inference` function currently emits `provider_id: p_type` (underscore form, e.g. `sentence_transformers`) directly from the high-level type key, which collides with Llama Stack's hyphenated provider IDs (e.g. `sentence-transformers`). This is a known PoC divergence documented in the spike doc ("Findings discovered during PoC") and tracked in the implementation JIRA "Unified llama_stack.config schema + synthesizer". Decision S5 mandates that each backend-specific synthesizer translates LCORE's canonical type Literal vocabulary to the target backend's expected shape (hyphenated provider_id for Llama Stack; model-string prefixes for Pydantic AI). The PoC code will be removed before merge; the fix belongs in the implementation ticket.
Applied to files:
src/constants.pysrc/llama_stack_configuration.pytests/unit/test_llama_stack_configuration.py
📚 Learning: 2026-04-15T18:54:09.157Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:769-773
Timestamp: 2026-04-15T18:54:09.157Z
Learning: In lightspeed-core/lightspeed-stack (src/models/requests.py and related files), schema-level field size limits (e.g., max_length=65_536, max_length=32_768) are intentionally written as inline numeric literals, not extracted to constants.py. constants.py is reserved for configurable runtime defaults (e.g., DEFAULT_RH_IDENTITY_MAX_HEADER_SIZE, DEFAULT_MAX_FILE_UPLOAD_SIZE). Do not flag inline literals in field validators or Pydantic Field constraints as needing extraction to constants.py.
Applied to files:
src/constants.py
📚 Learning: 2025-12-18T10:21:09.038Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:09.038Z
Learning: In Llama Stack version 0.3.x, telemetry provider configuration is not supported under the `providers` section in run.yaml configuration files. Telemetry can be enabled with just `telemetry.enabled: true` without requiring an explicit provider block.
Applied to files:
src/llama_stack_configuration.py
📚 Learning: 2026-02-23T14:56:59.186Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.
Applied to files:
src/utils/responses.py
🪛 GitHub Actions: OpenAPI (Spectral) / 0_spectral.txt
docs/openapi.json
[error] 1-1: OpenAPI schema is out of date. 'diff -u docs/openapi.json /tmp/openapi-generated.json' failed, indicating the generated schema differs. Regenerate with: 'uv run python scripts/generate_openapi_schema.py docs/openapi.json'.
🪛 GitHub Actions: OpenAPI (Spectral) / spectral
docs/openapi.json
[error] 1-1: CI check failed: docs/openapi.json is out of date compared to the generated schema. Regenerate with: uv run scripts/generate_openapi_schema.py docs/openapi.json
🪛 markdownlint-cli2 (0.22.1)
docs/rag_guide.md
[warning] 42-42: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3
(MD001, heading-increment)
🔇 Additional comments (66)
src/models/api/responses/successful/configuration.py (1)
82-94: LGTM!tests/unit/models/config/test_byok_rag.py (4)
15-37: LGTM!
40-64: LGTM!
67-173: LGTM!
176-201: LGTM!tests/unit/models/config/test_rag_configuration.py (5)
20-36: LGTM!
38-61: LGTM!
63-83: LGTM!
85-176: LGTM!
178-202: LGTM!tests/unit/models/config/test_dump_configuration.py (3)
46-266: LGTM!
901-1135: LGTM!
389-2105: LGTM!src/configuration.py (4)
478-482: LGTM!
492-517: LGTM!
520-535: LGTM!
538-549: LGTM!tests/unit/telemetry/conftest.py (3)
10-36: LGTM!
295-316: LGTM!
391-412: LGTM!tests/unit/test_configuration.py (7)
1004-1015: LGTM!
1032-1043: LGTM!
1062-1075: LGTM!
1094-1116: LGTM!
1159-1172: LGTM!
1193-1213: LGTM!
1414-1449: LGTM!Also applies to: 1702-1743, 1903-1937, 2121-2170, 2335-2378, 2607-2656, 2852-2902, 3067-3117, 3281-3332, 3518-3559, 3748-3789
src/app/endpoints/rags.py (3)
27-27: LGTM!
110-129: LGTM!
168-170: LGTM!tests/unit/app/endpoints/test_rags.py (3)
272-296: LGTM!
383-388: LGTM!
391-395: LGTM!tests/unit/utils/test_vector_search.py (9)
521-522: LGTM!
606-609: LGTM!
670-671: LGTM!
690-691: LGTM!
723-731: LGTM!
755-764: LGTM!
862-873: LGTM!
917-927: LGTM!
540-542: ⚖️ Poor tradeoffTests’ mixed
configuration.rag.*vsrag.*access is intentionalThe test setup matches
src/utils/vector_search.py: it reads inline BYOK sources/stores fromconfiguration.configuration.rag.*, while it reads BYOKmax_chunksfromconfiguration.rag.byok.max_chunks, so the apparent inconsistency is expected.src/llama_stack_configuration.py (2)
344-363: LGTM!
602-613: LGTM!src/client.py (2)
94-94: LGTM!
97-101: LGTM!examples/lightspeed-stack-byok-okp-rag.yaml (1)
38-78: LGTM!examples/quota-limiter-configuration-sqlite.yaml (1)
36-48: LGTM!tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml (1)
30-43: LGTM!tests/e2e/configuration/library-mode/lightspeed-stack-inline-rag.yaml (1)
29-42: LGTM!tests/e2e/configuration/library-mode/lightspeed-stack.yaml (1)
23-36: LGTM!tests/unit/test_llama_stack_configuration.py (3)
258-318: LGTM!
538-549: LGTM!
573-614: LGTM!tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml (1)
36-49: LGTM!tests/e2e/configuration/server-mode/lightspeed-stack-inline-rag.yaml (1)
30-43: LGTM!tests/e2e/configuration/server-mode/lightspeed-stack.yaml (1)
24-37: LGTM!tests/integration/endpoints/test_query_byok_integration.py (3)
246-271: LGTM!
274-301: LGTM!
429-455: LGTM!Also applies to: 498-526, 578-618, 621-662, 768-787, 886-915, 971-999, 1074-1105, 1144-1144, 1164-1197, 1249-1249, 1272-1300
tests/integration/endpoints/test_responses_byok_integration.py (1)
103-154: LGTM!Also applies to: 157-196, 205-268, 277-350, 359-411, 420-501, 510-571, 575-658, 662-719
tests/integration/endpoints/test_streaming_query_byok_integration.py (1)
232-252: LGTM!Also applies to: 255-278, 287-320, 323-382, 386-456, 460-491, 495-533, 537-581, 590-623, 627-665, 674-738, 747-819, 828-907, 916-984, 988-1078, 1082-1147
docs/byok_guide.md (2)
81-119: LGTM!
193-193: ⚡ Quick winEmbedding model download command aligns with
rag-contenttooling (uv, notpdm).File:
docs/byok_guide.md(line 193)uv run python ./scripts/download_embeddings_model.py -l ./embeddings_model/ -r sentence-transformers/all-mpnet-base-v2The
rag-contentREADME also instructs usinguv run python ./scripts/download_embeddings_model.pywith the same-l/-rpattern, so thepdm→uvtool change is consistent.docs/rag_guide.md (2)
37-68: LGTM!
350-361: LGTM!
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
docs/rag_guide.md (1)
102-110:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFAISS example uses the removed
byok_rag/rag_typeschema.Same inconsistency as in
byok_guide.md: this example and the others at lines 277-288 (rag.inline/rag.tool+ top-levelokp:), 318-327, and 389-402, plus the prose at lines 92, 98, and 371, still document the legacy flat schema, while your chunk-volume table at 356-361 already uses the newrag.byok.max_chunks/rag.retrieval.*.max_chunkspaths. The enrichment insrc/llama_stack_configuration.pyonly readsrag.byok.stores,rag.retrieval.inline.sources,rag.retrieval.tool.sources, andrag.okp, andbackend: faissreplacesrag_type: inline::faiss. Please migrate these examples to the unified schema.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/rag_guide.md` around lines 102 - 110, Update the YAML examples and accompanying prose to the unified schema: replace the legacy top-level byok_rag / rag_type entries with rag.byok.stores entries (use backend: faiss instead of rag_type: inline::faiss), move per-source settings into rag.retrieval.inline.sources or rag.retrieval.tool.sources as appropriate, and use rag.byok.max_chunks and rag.retrieval.*.max_chunks for chunk-volume limits; also ensure rag.okp remains as the top-level OKP config referenced by the enrichment code. Locate the example blocks that currently show byok_rag/rag_type (and the prose referencing those keys) and convert them to the new keys (rag.byok.stores, rag.retrieval.inline.sources, rag.retrieval.tool.sources, rag.okp, backend: faiss) and update any comments so they reflect generated vector_db_id/db_path fields and the new max_chunks paths.src/llama_stack_configuration.py (1)
315-322: 🧹 Nitpick | 🔵 Trivial | 💤 Low valueDocstring is stale relative to the implementation.
The Returns section states each entry's
provider_idis"byok_<vector_db_id>"andprovider_typeis "set from the RAG item", but the code setsprovider_id = f"byok_{rag_id}"(line 344) and derivesprovider_typefromBACKEND_TO_LLAMA_STACK_PROVIDER[backend](lines 350-356).📝 Suggested wording fix
- Each appended entry has `provider_id` set to "byok_<vector_db_id>", - `provider_type` set from the RAG item, and a `config` with `persistence` - referencing the corresponding backend. + Each appended entry has `provider_id` set to "byok_<rag_id>", + `provider_type` resolved from the entry's `backend` via + `BACKEND_TO_LLAMA_STACK_PROVIDER`, and a `config` with `persistence` + referencing the corresponding storage backend.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/llama_stack_configuration.py` around lines 315 - 322, The Returns docstring in llama_stack_configuration.py is outdated: it claims provider_id is "byok_<vector_db_id>" and provider_type is "set from the RAG item", but the implementation builds provider_id using rag_id (provider_id = f"byok_{rag_id}") and derives provider_type via BACKEND_TO_LLAMA_STACK_PROVIDER[backend]; update the Returns description to state provider_id is "byok_<rag_id>" and that provider_type is resolved from BACKEND_TO_LLAMA_STACK_PROVIDER using the backend value so the docstring matches the logic in the function that appends entries from byok_rag.docs/byok_guide.md (1)
206-279:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUpdate BYOK guide examples to match the new nested
ragschema
docs/byok_guide.mdStep 4/5 still shows the legacy top-levelbyok_raglist withrag_typeand top-levelokp, but the code/tested config shape uses nestedrag.byok.storesand abackendfield (tests build vector_io providers from BYOK entries containingbackendand emitprovider_type == "inline::faiss"). Copying the current docs wouldn’t match the new paths used bygenerate_configuration(rag.byok.stores,rag.retrieval.inline.sources,rag.retrieval.tool.sources,rag.okp).
- Migrate all Step 4/Step 5 snippets, field reference table, and prose around lines ~202/261-262 to the new keys.
📝 Illustrative migration for the Step 4/Step 5 snippets
-byok_rag: - - rag_id: my-docs - rag_type: inline::faiss - embedding_model: sentence-transformers/all-mpnet-base-v2 - embedding_dimension: 768 - vector_db_id: vs_8c94967b-81cc-4028-a294-9cfac6fd9ae2 - db_path: /path/to/vector_db/faiss_store.db - score_multiplier: 1.0 +rag: + byok: + stores: + - rag_id: my-docs + backend: faiss + embedding_model: sentence-transformers/all-mpnet-base-v2 + embedding_dimension: 768 + vector_db_id: vs_8c94967b-81cc-4028-a294-9cfac6fd9ae2 + db_path: /path/to/vector_db/faiss_store.db + score_multiplier: 1.0 + retrieval: + inline: + sources: + - my-docs + - okp + tool: + sources: + - my-docs + - okp + okp: + offline: true🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/byok_guide.md` around lines 206 - 279, Update the BYOK docs to match the new nested schema: replace top-level byok_rag with rag.byok.stores entries that include backend and provider_type (e.g., "provider_type: inline::faiss") and update all examples and prose to use rag.retrieval.inline.sources and rag.retrieval.tool.sources for Step 5 and rag.okp for OKP settings; ensure the field reference table, YAML examples, and any mentions of rag_type/okp at top-level are migrated to the new keys so generate_configuration will consume the documented shape.src/models/config.py (1)
1960-1964:⚠️ Potential issue | 🟠 Major | ⚡ Quick winTool
max_chunksfalls back to the inline default on partial config.If a user sets
rag.retrieval.tool.sourcesbut omitsmax_chunks, Pydantic instantiatesRetrievalStrategyConfigurationfrom that partial dict and line 1961 appliesDEFAULT_INLINE_RAG_MAX_CHUNKS. Thetoolfield’sdefault_factoryonly runs when the wholetoolsection is absent, so configured tool RAG silently gets the wrong result limit.src/utils/responses.pythen usesconfiguration.rag.retrieval.tool.max_chunksdirectly when building thefile_searchtool, so this is observable at runtime.Proposed fix
class RetrievalConfiguration(ConfigurationBase): """Configuration for inline and tool retrieval strategies.""" + + `@model_validator`(mode="before") + `@classmethod` + def apply_strategy_defaults(cls, data: Any) -> Any: + """Apply per-strategy max_chunks defaults when nested config is partial.""" + if not isinstance(data, dict): + return data + + inline = data.get("inline") + if isinstance(inline, dict): + inline.setdefault("max_chunks", constants.DEFAULT_INLINE_RAG_MAX_CHUNKS) + + tool = data.get("tool") + if isinstance(tool, dict): + tool.setdefault("max_chunks", constants.DEFAULT_TOOL_RAG_MAX_CHUNKS) + + return data inline: RetrievalStrategyConfiguration = Field( default_factory=lambda: RetrievalStrategyConfiguration( max_chunks=constants.DEFAULT_INLINE_RAG_MAX_CHUNKS, ),Also applies to: 1978-1984
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/models/config.py` around lines 1960 - 1964, The field RetrievalStrategyConfiguration.max_chunks currently defaults to constants.DEFAULT_INLINE_RAG_MAX_CHUNKS which causes a partial tool config to silently inherit the inline default; change max_chunks to be optional (e.g. Optional[PositiveInt] with default None) so omission is distinguishable from an explicit value, do the same for the analogous fields reported around lines 1978-1984, and update the consumer in src/utils/responses.py (the file_search builder) to treat a None max_chunks as “unspecified” and explicitly fall back to the appropriate parent/global default only when the entire tool section is absent (rather than using DEFAULT_INLINE_RAG_MAX_CHUNKS on partial configs).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/constants.py`:
- Around line 171-176: SUPPORTED_RAG_BACKENDS is duplicated work and can drift
from BACKEND_TO_LLAMA_STACK_PROVIDER; change SUPPORTED_RAG_BACKENDS to be
derived from the single source BACKEND_TO_LLAMA_STACK_PROVIDER in
llama_stack_configuration.py (i.e., import BACKEND_TO_LLAMA_STACK_PROVIDER and
set SUPPORTED_RAG_BACKENDS = frozenset(BACKEND_TO_LLAMA_STACK_PROVIDER.keys())
or otherwise centralize the canonical mapping so both the model validator and
enrichment layer read the same data). Ensure DEFAULT_RAG_BACKEND remains valid
against the derived set.
---
Outside diff comments:
In `@docs/byok_guide.md`:
- Around line 206-279: Update the BYOK docs to match the new nested schema:
replace top-level byok_rag with rag.byok.stores entries that include backend and
provider_type (e.g., "provider_type: inline::faiss") and update all examples and
prose to use rag.retrieval.inline.sources and rag.retrieval.tool.sources for
Step 5 and rag.okp for OKP settings; ensure the field reference table, YAML
examples, and any mentions of rag_type/okp at top-level are migrated to the new
keys so generate_configuration will consume the documented shape.
In `@docs/rag_guide.md`:
- Around line 102-110: Update the YAML examples and accompanying prose to the
unified schema: replace the legacy top-level byok_rag / rag_type entries with
rag.byok.stores entries (use backend: faiss instead of rag_type: inline::faiss),
move per-source settings into rag.retrieval.inline.sources or
rag.retrieval.tool.sources as appropriate, and use rag.byok.max_chunks and
rag.retrieval.*.max_chunks for chunk-volume limits; also ensure rag.okp remains
as the top-level OKP config referenced by the enrichment code. Locate the
example blocks that currently show byok_rag/rag_type (and the prose referencing
those keys) and convert them to the new keys (rag.byok.stores,
rag.retrieval.inline.sources, rag.retrieval.tool.sources, rag.okp, backend:
faiss) and update any comments so they reflect generated vector_db_id/db_path
fields and the new max_chunks paths.
In `@src/llama_stack_configuration.py`:
- Around line 315-322: The Returns docstring in llama_stack_configuration.py is
outdated: it claims provider_id is "byok_<vector_db_id>" and provider_type is
"set from the RAG item", but the implementation builds provider_id using rag_id
(provider_id = f"byok_{rag_id}") and derives provider_type via
BACKEND_TO_LLAMA_STACK_PROVIDER[backend]; update the Returns description to
state provider_id is "byok_<rag_id>" and that provider_type is resolved from
BACKEND_TO_LLAMA_STACK_PROVIDER using the backend value so the docstring matches
the logic in the function that appends entries from byok_rag.
In `@src/models/config.py`:
- Around line 1960-1964: The field RetrievalStrategyConfiguration.max_chunks
currently defaults to constants.DEFAULT_INLINE_RAG_MAX_CHUNKS which causes a
partial tool config to silently inherit the inline default; change max_chunks to
be optional (e.g. Optional[PositiveInt] with default None) so omission is
distinguishable from an explicit value, do the same for the analogous fields
reported around lines 1978-1984, and update the consumer in
src/utils/responses.py (the file_search builder) to treat a None max_chunks as
“unspecified” and explicitly fall back to the appropriate parent/global default
only when the entire tool section is absent (rather than using
DEFAULT_INLINE_RAG_MAX_CHUNKS on partial configs).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 2edf3be1-2f33-48d5-872a-24b0829cb044
📒 Files selected for processing (9)
docs/byok_guide.mddocs/rag_guide.mdsrc/constants.pysrc/llama_stack_configuration.pysrc/models/config.pytests/e2e/features/inline_rag.featuretests/unit/models/config/test_byok_rag.pytests/unit/models/config/test_rag_configuration.pytests/unit/utils/test_responses.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: unit_tests (3.12)
- GitHub Check: build-pr
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (5)
tests/e2e/**/*.{py,feature}
📄 CodeRabbit inference engine (AGENTS.md)
Use behave (BDD) framework for end-to-end testing with Gherkin feature files
Files:
tests/e2e/features/inline_rag.feature
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Usepytest.mark.asynciomarker for async tests
Files:
tests/unit/models/config/test_rag_configuration.pytests/unit/models/config/test_byok_rag.pytests/unit/utils/test_responses.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules:from authentication import get_auth_dependency
Llama Stack imports: Usefrom llama_stack_client import AsyncLlamaStackClient
Checkconstants.pyfor shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Uselogger = get_logger(__name__)fromlog.pyfor module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Useasync deffor I/O operations and external API calls
Use standard log levels with clear purposes:debug()for diagnostic info,info()for program execution,warning()for unexpected events,error()for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes:Configuration,Error/Exception,Resolver,Interface
Abstract classes must use ABC with@abstractmethoddecorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes
Files:
src/constants.pysrc/llama_stack_configuration.pysrc/models/config.py
src/constants.py
📄 CodeRabbit inference engine (AGENTS.md)
Use
constants.pyfor shared constants with descriptive comments and type hints usingFinal[type]
Files:
src/constants.py
src/models/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Pydantic models must use
@model_validatorand@field_validatorfor validation and complete type annotations for all attributes, avoidingAnytype
Files:
src/models/config.py
🧠 Learnings (11)
📓 Common learnings
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1843
File: src/utils/vector_search.py:472-484
Timestamp: 2026-06-03T11:15:38.136Z
Learning: In `src/utils/vector_search.py`, the `_fetch_byok_rag` function's `vector_store_ids` parameter always contains user-facing `rag_id`s, never raw llama-stack `vector_db_id`s. The API contract enforces rag_ids only, so filtering `vector_store_ids` against `configuration.configuration.rag.retrieval.inline.sources` (also rag_ids) before calling `resolve_vector_store_ids` is intentional and correct.
📚 Learning: 2026-06-03T11:15:38.136Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1843
File: src/utils/vector_search.py:472-484
Timestamp: 2026-06-03T11:15:38.136Z
Learning: In `src/utils/vector_search.py`, the `_fetch_byok_rag` function's `vector_store_ids` parameter always contains user-facing `rag_id`s, never raw llama-stack `vector_db_id`s. The API contract enforces rag_ids only, so filtering `vector_store_ids` against `configuration.configuration.rag.retrieval.inline.sources` (also rag_ids) before calling `resolve_vector_store_ids` is intentional and correct.
Applied to files:
tests/unit/models/config/test_rag_configuration.pydocs/byok_guide.mddocs/rag_guide.mdsrc/constants.pysrc/llama_stack_configuration.pytests/unit/models/config/test_byok_rag.pytests/unit/utils/test_responses.pysrc/models/config.py
📚 Learning: 2025-12-18T10:21:09.038Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:09.038Z
Learning: In Llama Stack version 0.3.x, telemetry provider configuration is not supported under the `providers` section in run.yaml configuration files. Telemetry can be enabled with just `telemetry.enabled: true` without requiring an explicit provider block.
Applied to files:
docs/byok_guide.md
📚 Learning: 2025-08-18T10:58:14.951Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:47-47
Timestamp: 2025-08-18T10:58:14.951Z
Learning: psycopg2-binary is required by some llama-stack providers in the lightspeed-stack project, so it cannot be replaced with psycopg v3 or moved to optional dependencies without breaking llama-stack functionality.
Applied to files:
docs/byok_guide.mddocs/rag_guide.md
📚 Learning: 2026-05-20T08:09:36.724Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: src/client.py:104-108
Timestamp: 2026-05-20T08:09:36.724Z
Learning: In the lightspeed-stack repo, the synthesized `run.yaml` file handling in `src/client.py` (`_synthesize_library_config`) uses a fixed `/tmp` path intentionally in the PoC (PR `#1580`). The durable production requirements are tracked in spec doc R10 (docs/design/llama-stack-config-merge/llama-stack-config-merge.md): persistent known path overwritten each boot, file mode 0600 set via explicit create flag (not umask), and a `--synthesized-config-output` CLI flag for debugging. The PoC code is scheduled for removal pre-merge; the implementation JIRA "Unified llama_stack.config schema + synthesizer" inherits R10's requirements.
Applied to files:
docs/byok_guide.md
📚 Learning: 2026-05-06T06:57:52.173Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-05-06T06:57:52.173Z
Learning: Applies to src/constants.py : Use `constants.py` for shared constants with descriptive comments and type hints using `Final[type]`
Applied to files:
src/constants.py
📚 Learning: 2026-05-20T08:09:43.391Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: src/llama_stack_configuration.py:651-683
Timestamp: 2026-05-20T08:09:43.391Z
Learning: In `src/llama_stack_configuration.py`, the `apply_high_level_inference` function currently emits `provider_id: p_type` (underscore form, e.g. `sentence_transformers`) directly from the high-level type key, which collides with Llama Stack's hyphenated provider IDs (e.g. `sentence-transformers`). This is a known PoC divergence documented in the spike doc ("Findings discovered during PoC") and tracked in the implementation JIRA "Unified llama_stack.config schema + synthesizer". Decision S5 mandates that each backend-specific synthesizer translates LCORE's canonical type Literal vocabulary to the target backend's expected shape (hyphenated provider_id for Llama Stack; model-string prefixes for Pydantic AI). The PoC code will be removed before merge; the fix belongs in the implementation ticket.
Applied to files:
src/llama_stack_configuration.pysrc/models/config.py
📚 Learning: 2026-05-20T08:09:34.319Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:34.319Z
Learning: In the lightspeed-stack project (LCORE-836 unified config), the `provider_shield_id` for a Llama Guard safety shield entry must be a guard model identifier (e.g., `meta-llama/Llama-Guard-3-8B`), not a chat model id like `openai/gpt-4o-mini`. Using a chat model id there only means the native_override key landed — it does not mean the safety shield is actually gating queries. E2E tests for the implementation JIRA must exercise a real Llama Guard model to validate the shield.
Applied to files:
src/llama_stack_configuration.py
📚 Learning: 2026-04-15T18:54:09.157Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:769-773
Timestamp: 2026-04-15T18:54:09.157Z
Learning: In lightspeed-core/lightspeed-stack (src/models/requests.py and related files), schema-level field size limits (e.g., max_length=65_536, max_length=32_768) are intentionally written as inline numeric literals, not extracted to constants.py. constants.py is reserved for configurable runtime defaults (e.g., DEFAULT_RH_IDENTITY_MAX_HEADER_SIZE, DEFAULT_MAX_FILE_UPLOAD_SIZE). Do not flag inline literals in field validators or Pydantic Field constraints as needing extraction to constants.py.
Applied to files:
src/models/config.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.
Applied to files:
src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.
Applied to files:
src/models/config.py
🔇 Additional comments (2)
src/llama_stack_configuration.py (2)
25-29: LGTM!
607-619: LGTM!
| # BYOK RAG | ||
| # Default RAG type for bring-your-own-knowledge RAG configurations, that type | ||
| # needs to be supported by Llama Stack | ||
| DEFAULT_RAG_TYPE: Final[str] = "inline::faiss" | ||
| # Backends that have enrichment support in llama_stack_configuration.py | ||
| SUPPORTED_RAG_BACKENDS: Final[frozenset[str]] = frozenset({"faiss"}) | ||
|
|
||
| # Default RAG backend for bring-your-own-knowledge RAG configurations | ||
| DEFAULT_RAG_BACKEND: Final[str] = "faiss" |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | 💤 Low value
Dual source of truth for supported backends.
SUPPORTED_RAG_BACKENDS here and BACKEND_TO_LLAMA_STACK_PROVIDER in src/llama_stack_configuration.py independently enumerate the supported backends (both currently {"faiss"}). The comment even states this set mirrors "enrichment support in llama_stack_configuration.py". If one is updated (e.g. enabling pgvector) without the other, the model validator and the enrichment layer will disagree. Consider deriving the set from a single mapping to keep them in lockstep.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/constants.py` around lines 171 - 176, SUPPORTED_RAG_BACKENDS is
duplicated work and can drift from BACKEND_TO_LLAMA_STACK_PROVIDER; change
SUPPORTED_RAG_BACKENDS to be derived from the single source
BACKEND_TO_LLAMA_STACK_PROVIDER in llama_stack_configuration.py (i.e., import
BACKEND_TO_LLAMA_STACK_PROVIDER and set SUPPORTED_RAG_BACKENDS =
frozenset(BACKEND_TO_LLAMA_STACK_PROVIDER.keys()) or otherwise centralize the
canonical mapping so both the model validator and enrichment layer read the same
data). Ensure DEFAULT_RAG_BACKEND remains valid against the derived set.
| - referenced_documents: Documents referenced in BYOK RAG results | ||
| """ | ||
| limit = max_chunks if max_chunks is not None else constants.BYOK_RAG_MAX_CHUNKS | ||
| limit = max_chunks if max_chunks is not None else configuration.rag.byok.max_chunks |
There was a problem hiding this comment.
It's IMO confusing which value the limit is going to get in the end. I'd remove the default for the max_chunks parameter and always passed it from config, which is set to the default unless specified manually.
| solr: Structured Solr inline RAG request from the API (optional). | ||
| max_chunks: Maximum number of chunks to return. If None, uses | ||
| constants.OKP_RAG_MAX_CHUNKS. | ||
| rag.okp.max_chunks from configuration. |
There was a problem hiding this comment.
There's no max_chunks arg here.
There was a problem hiding this comment.
Addressed in new commit TY!
Restructure the RAG configuration to consolidate byok_rag, rag, and okp top-level sections into a single rag section with byok, okp, and retrieval sub-sections. Make chunk limits user-configurable instead of hardcoded constants. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use explicit default= keyword in Field() for PositiveInt fields so type checkers recognize the defaults. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…w diagram - Use explicit default= keyword in Field() for pyright/mypy compatibility - Fix byok.max_chunks description: caps total across all stores, not per-store - Add inline RAG chunk flow Mermaid diagram to RAG guide - Regenerate OpenAPI schema Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add BYOK content prioritization subsection (score_multiplier + reranker) - Add inline RAG chunk flow diagram to both guides - Replace hardcoded constants references with config paths - Add chunk limits table - Replace pdm run with uv run Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add SUPPORTED_BACKENDS dict to validate backend types during enrichment instead of hardcoding inline:: prefix - Restore inline comments in reranker auto-enable validator - Add backward compatibility note to RAG guide Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add SUPPORTED_RAG_BACKENDS constant and field validator on RagStore.backend to reject unsupported backends at config load time - Add model validator on RagConfiguration to reject retrieval source IDs not declared in byok.stores - Add assert that DEFAULT_RAG_BACKEND has a BACKEND_TO_LLAMA_STACK_PROVIDER mapping - Remove llama-stack references from docs, use run.yaml instead - Add backward compatibility note to RAG guide - Update tests for new validators Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace assert with runtime ValueError for bandit compliance - Fix backend="rag" → backend="faiss" in test helper - Add pylint disable for no-member on Pydantic model validator Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rag to _fetch_okp_rag Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2209ac9 to
472397b
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/app/endpoints/rags.py (1)
111-125: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winUse Google-style docstring sections in this function.
The function docstring is currently NumPy-style; please convert it to Google format (
Args,Returns,Raises) to match repo standards.As per coding guidelines: "Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/app/endpoints/rags.py` around lines 111 - 125, Update the function docstring that "Resolves a user-facing rag_id to the llama-stack vector_db_id" to use Google-style sections: replace NumPy-style "Parameters"/"Returns" with "Args:", "Returns:", and add a "Raises:" section (use "Raises: None" if it doesn't raise) to match repository docstring conventions; keep the existing short description and parameter details (rag_id, byok_rags) and ensure types and return description remain accurate.src/utils/vector_search.py (1)
569-589:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPass
rag.okp.max_chunksinto the Solr query.
limit = configuration.rag.okp.max_chunksonly affects the post-query slice right now._build_query_params(solr)still falls back toconstants.SOLR_VECTOR_SEARCH_DEFAULT_K, so the new config value never changes how many OKP chunks are actually fetched.Suggested fix
- params = _build_query_params(solr) + params = _build_query_params(solr, k=limit)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/vector_search.py` around lines 569 - 589, The Solr query currently ignores configuration.rag.okp.max_chunks because _build_query_params(solr) falls back to constants.SOLR_VECTOR_SEARCH_DEFAULT_K; pass the configured limit into the query parameters so the client.vector_io.query actually requests that many chunks. Either update _build_query_params to accept a limit (e.g., _build_query_params(solr, limit)) and use configuration.rag.okp.max_chunks when building params, or set params['k'] = limit (or the appropriate param key) after params = _build_query_params(solr) and before calling client.vector_io.query; reference the limit variable, _build_query_params, params, client.vector_io.query, and constants.SOLR_VECTOR_SEARCH_DEFAULT_K when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/byok_guide.md`:
- Around line 223-231: The table documents deprecated flat RAG keys (rag_id,
rag_type, embedding_model, embedding_dimension, vector_db_id, db_path,
score_multiplier); update these to the new nested schema using
rag.byok.stores[*] entries and the rag.backend field (replace rag_type →
rag.backend, move rag_id/vector_db_id/db_path/etc. under rag.byok.stores with
per-store keys, and adjust embedding_model/embedding_dimension/score_multiplier
to per-store fields). Ensure all examples and the related occurrences (also at
the later section mentioned) use rag.byok.stores and rag.backend rather than the
old top-level keys so the config matches the new contract.
In `@docs/rag_guide.md`:
- Line 42: Change the heading "Inline RAG chunk flow" from an H3 to H2 to fix
the markdown lint MD001; locate the heading line with the text "Inline RAG chunk
flow" in docs/rag_guide.md and update its markdown header marker to the H2 style
(i.e., use two hash symbols) so the document heading levels are sequential.
In `@src/models/config.py`:
- Around line 1996-2000: Add a post-validation check on the model that declares
the stores: list[RagStore] field to reject duplicate RagStore.rag_id values;
implement a Pydantic validator (e.g., `@validator`("stores") or a `@root_validator`
on that model) that collects rag_id from each RagStore, detects duplicates, and
raises a ValueError listing the duplicated rag_id(s) so validation fails if the
same rag_id appears more than once; reference the stores field and the
RagStore.rag_id attribute when locating where to add this check.
---
Outside diff comments:
In `@src/app/endpoints/rags.py`:
- Around line 111-125: Update the function docstring that "Resolves a
user-facing rag_id to the llama-stack vector_db_id" to use Google-style
sections: replace NumPy-style "Parameters"/"Returns" with "Args:", "Returns:",
and add a "Raises:" section (use "Raises: None" if it doesn't raise) to match
repository docstring conventions; keep the existing short description and
parameter details (rag_id, byok_rags) and ensure types and return description
remain accurate.
In `@src/utils/vector_search.py`:
- Around line 569-589: The Solr query currently ignores
configuration.rag.okp.max_chunks because _build_query_params(solr) falls back to
constants.SOLR_VECTOR_SEARCH_DEFAULT_K; pass the configured limit into the query
parameters so the client.vector_io.query actually requests that many chunks.
Either update _build_query_params to accept a limit (e.g.,
_build_query_params(solr, limit)) and use configuration.rag.okp.max_chunks when
building params, or set params['k'] = limit (or the appropriate param key) after
params = _build_query_params(solr) and before calling client.vector_io.query;
reference the limit variable, _build_query_params, params,
client.vector_io.query, and constants.SOLR_VECTOR_SEARCH_DEFAULT_K when making
the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7791912b-70fc-45af-b294-8bbec271970e
📒 Files selected for processing (33)
docs/byok_guide.mddocs/openapi.jsondocs/rag_guide.mdexamples/lightspeed-stack-byok-okp-rag.yamlexamples/quota-limiter-configuration-sqlite.yamlsrc/app/endpoints/rags.pysrc/client.pysrc/configuration.pysrc/constants.pysrc/llama_stack_configuration.pysrc/models/api/responses/successful/configuration.pysrc/models/config.pysrc/utils/responses.pysrc/utils/vector_search.pytests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yamltests/e2e/configuration/library-mode/lightspeed-stack-inline-rag.yamltests/e2e/configuration/library-mode/lightspeed-stack.yamltests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yamltests/e2e/configuration/server-mode/lightspeed-stack-inline-rag.yamltests/e2e/configuration/server-mode/lightspeed-stack.yamltests/e2e/features/inline_rag.featuretests/integration/endpoints/test_query_byok_integration.pytests/integration/endpoints/test_responses_byok_integration.pytests/integration/endpoints/test_streaming_query_byok_integration.pytests/unit/app/endpoints/test_rags.pytests/unit/models/config/test_byok_rag.pytests/unit/models/config/test_dump_configuration.pytests/unit/models/config/test_rag_configuration.pytests/unit/telemetry/conftest.pytests/unit/test_configuration.pytests/unit/test_llama_stack_configuration.pytests/unit/utils/test_responses.pytests/unit/utils/test_vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: build-pr
- GitHub Check: unit_tests (3.12)
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (8)
tests/e2e/**/*.{py,feature}
📄 CodeRabbit inference engine (AGENTS.md)
Use behave (BDD) framework for end-to-end testing with Gherkin feature files
Files:
tests/e2e/features/inline_rag.feature
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use absolute imports for internal modules:from authentication import get_auth_dependency
Llama Stack imports: Usefrom llama_stack_client import AsyncLlamaStackClient
Checkconstants.pyfor shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Uselogger = get_logger(__name__)fromlog.pyfor module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Useasync deffor I/O operations and external API calls
Use standard log levels with clear purposes:debug()for diagnostic info,info()for program execution,warning()for unexpected events,error()for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes:Configuration,Error/Exception,Resolver,Interface
Abstract classes must use ABC with@abstractmethoddecorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes
Files:
src/client.pysrc/models/api/responses/successful/configuration.pysrc/configuration.pysrc/app/endpoints/rags.pysrc/llama_stack_configuration.pysrc/constants.pysrc/utils/responses.pysrc/models/config.pysrc/utils/vector_search.py
src/**/configuration.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/configuration.py: All config models must extendConfigurationBasewithextra="forbid"to reject unknown fields
Use@field_validatorand@model_validatorfor custom validation in Pydantic models
Files:
src/models/api/responses/successful/configuration.pysrc/configuration.py
src/models/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Pydantic models must use
@model_validatorand@field_validatorfor validation and complete type annotations for all attributes, avoidingAnytype
Files:
src/models/api/responses/successful/configuration.pysrc/models/config.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Usepytest.mark.asynciomarker for async tests
Files:
tests/unit/telemetry/conftest.pytests/unit/models/config/test_byok_rag.pytests/unit/test_llama_stack_configuration.pytests/integration/endpoints/test_responses_byok_integration.pytests/unit/utils/test_vector_search.pytests/unit/app/endpoints/test_rags.pytests/unit/utils/test_responses.pytests/unit/models/config/test_rag_configuration.pytests/integration/endpoints/test_query_byok_integration.pytests/unit/test_configuration.pytests/integration/endpoints/test_streaming_query_byok_integration.pytests/unit/models/config/test_dump_configuration.py
tests/**/conftest.py
📄 CodeRabbit inference engine (AGENTS.md)
Use
conftest.pyfor shared pytest fixtures andpytest-mockfor AsyncMock objects
Files:
tests/unit/telemetry/conftest.py
src/app/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/app/**/*.py: FastAPI dependencies: Import fromfastapimodule forAPIRouter,HTTPException,Request,status,Depends
Use FastAPIHTTPExceptionwith appropriate status codes for API endpoints and handleAPIConnectionErrorfrom Llama Stack
Files:
src/app/endpoints/rags.py
src/constants.py
📄 CodeRabbit inference engine (AGENTS.md)
Use
constants.pyfor shared constants with descriptive comments and type hints usingFinal[type]
Files:
src/constants.py
🧠 Learnings (21)
📓 Common learnings
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1843
File: src/utils/vector_search.py:472-484
Timestamp: 2026-06-03T11:15:38.136Z
Learning: In `src/utils/vector_search.py`, the `_fetch_byok_rag` function's `vector_store_ids` parameter always contains user-facing `rag_id`s, never raw llama-stack `vector_db_id`s. The API contract enforces rag_ids only, so filtering `vector_store_ids` against `configuration.configuration.rag.retrieval.inline.sources` (also rag_ids) before calling `resolve_vector_store_ids` is intentional and correct.
📚 Learning: 2026-06-03T11:15:38.136Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1843
File: src/utils/vector_search.py:472-484
Timestamp: 2026-06-03T11:15:38.136Z
Learning: In `src/utils/vector_search.py`, the `_fetch_byok_rag` function's `vector_store_ids` parameter always contains user-facing `rag_id`s, never raw llama-stack `vector_db_id`s. The API contract enforces rag_ids only, so filtering `vector_store_ids` against `configuration.configuration.rag.retrieval.inline.sources` (also rag_ids) before calling `resolve_vector_store_ids` is intentional and correct.
Applied to files:
tests/e2e/configuration/server-mode/lightspeed-stack-inline-rag.yamltests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yamlexamples/quota-limiter-configuration-sqlite.yamltests/e2e/configuration/server-mode/lightspeed-stack.yamltests/e2e/configuration/library-mode/lightspeed-stack.yamltests/e2e/configuration/library-mode/lightspeed-stack-inline-rag.yamldocs/rag_guide.mdexamples/lightspeed-stack-byok-okp-rag.yamlsrc/models/api/responses/successful/configuration.pysrc/configuration.pydocs/byok_guide.mdsrc/app/endpoints/rags.pytests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yamlsrc/llama_stack_configuration.pysrc/constants.pysrc/utils/responses.pytests/unit/models/config/test_byok_rag.pytests/unit/test_llama_stack_configuration.pytests/integration/endpoints/test_responses_byok_integration.pytests/unit/utils/test_vector_search.pytests/unit/app/endpoints/test_rags.pytests/unit/utils/test_responses.pytests/unit/models/config/test_rag_configuration.pytests/integration/endpoints/test_query_byok_integration.pysrc/models/config.pytests/unit/test_configuration.pydocs/openapi.jsontests/integration/endpoints/test_streaming_query_byok_integration.pytests/unit/models/config/test_dump_configuration.pysrc/utils/vector_search.py
📚 Learning: 2026-05-20T08:09:30.641Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:30.641Z
Learning: In Llama-stack config YAMLs, when defining a Llama Guard safety shield entry, set `provider_shield_id` to the *guard model identifier* (e.g., `meta-llama/Llama-Guard-3-8B`). Do not use a chat/generative model id (e.g., `openai/gpt-4o-mini`): a chat-model id (or `native_override`) indicates only an override landed and does **not** mean the safety shield is actually gating queries. Ensure any E2E coverage for the related implementation (JIRA/E2E tests) exercises a real Llama Guard model to verify that the shield is effective.
Applied to files:
tests/e2e/configuration/server-mode/lightspeed-stack-inline-rag.yamltests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yamlexamples/quota-limiter-configuration-sqlite.yamltests/e2e/configuration/server-mode/lightspeed-stack.yamltests/e2e/configuration/library-mode/lightspeed-stack.yamltests/e2e/configuration/library-mode/lightspeed-stack-inline-rag.yamlexamples/lightspeed-stack-byok-okp-rag.yamltests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.
Applied to files:
tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yamltests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
📚 Learning: 2025-09-02T11:15:02.411Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/test_list.txt:2-3
Timestamp: 2025-09-02T11:15:02.411Z
Learning: In the lightspeed-stack e2e tests, the Authorized tag is intentionally omitted from noop authentication tests because they are designed to test against the default lightspeed-stack.yaml configuration rather than the specialized noop-with-token configuration.
Applied to files:
tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yamltests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
📚 Learning: 2026-05-06T06:57:52.173Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-05-06T06:57:52.173Z
Learning: Applies to src/**/*.py : Llama Stack imports: Use `from llama_stack_client import AsyncLlamaStackClient`
Applied to files:
src/client.py
📚 Learning: 2026-05-20T08:09:36.724Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: src/client.py:104-108
Timestamp: 2026-05-20T08:09:36.724Z
Learning: In the lightspeed-stack repo, the synthesized `run.yaml` file handling in `src/client.py` (`_synthesize_library_config`) uses a fixed `/tmp` path intentionally in the PoC (PR `#1580`). The durable production requirements are tracked in spec doc R10 (docs/design/llama-stack-config-merge/llama-stack-config-merge.md): persistent known path overwritten each boot, file mode 0600 set via explicit create flag (not umask), and a `--synthesized-config-output` CLI flag for debugging. The PoC code is scheduled for removal pre-merge; the implementation JIRA "Unified llama_stack.config schema + synthesizer" inherits R10's requirements.
Applied to files:
tests/e2e/configuration/server-mode/lightspeed-stack.yamltests/e2e/configuration/library-mode/lightspeed-stack.yamldocs/byok_guide.mdtests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
📚 Learning: 2026-04-13T13:39:59.316Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1490
File: tests/e2e/features/environment.py:206-211
Timestamp: 2026-04-13T13:39:59.316Z
Learning: In lightspeed-stack e2e tests (tests/e2e/features/), `context.feature_config` is intentionally set inside Background/step functions (scenario-scoped Behave layer). The `after_scenario` restore logic in `environment.py` only restores config when `context.scenario_lightspeed_override_active` is True, which is only set by `configure_service` when an actual config switch occurs. The module-level `_active_lightspeed_stack_config_basename` in `tests/e2e/features/steps/common.py` prevents re-applying the same config in subsequent scenarios (making `scenario_lightspeed_override_active` stay False). This means the ephemeral nature of step-set context attributes is intentional — the design ensures config restore happens exactly once per actual switch, not redundantly on every scenario.
Applied to files:
tests/e2e/configuration/server-mode/lightspeed-stack.yaml
📚 Learning: 2025-08-18T10:58:14.951Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:47-47
Timestamp: 2025-08-18T10:58:14.951Z
Learning: psycopg2-binary is required by some llama-stack providers in the lightspeed-stack project, so it cannot be replaced with psycopg v3 or moved to optional dependencies without breaking llama-stack functionality.
Applied to files:
docs/rag_guide.mddocs/byok_guide.md
📚 Learning: 2026-03-17T11:34:53.242Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1335
File: docs/config.md:411-411
Timestamp: 2026-03-17T11:34:53.242Z
Learning: In the lightspeed-stack project (`src/models/config.py`, `docs/config.md`), the internal Solr filter `is_chunk:true` (defined as `SOLR_CHUNK_FILTER_QUERY` in `src/constants.py`) is always injected by the system for OKP searches and is intentionally hidden from users. The `chunk_filter_query` field in `OkpConfiguration` is user-facing and additive-only, but the documentation must NOT mention the internal `is_chunk:true` behavior — this is a deliberate design decision by the maintainers.
Applied to files:
docs/rag_guide.mdtests/unit/utils/test_vector_search.pydocs/openapi.jsonsrc/utils/vector_search.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.
Applied to files:
src/models/api/responses/successful/configuration.pysrc/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.
Applied to files:
src/models/api/responses/successful/configuration.pysrc/models/config.py
📚 Learning: 2025-12-18T10:21:09.038Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:09.038Z
Learning: In Llama Stack version 0.3.x, telemetry provider configuration is not supported under the `providers` section in run.yaml configuration files. Telemetry can be enabled with just `telemetry.enabled: true` without requiring an explicit provider block.
Applied to files:
docs/byok_guide.md
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.
Applied to files:
src/app/endpoints/rags.py
📚 Learning: 2026-05-20T08:09:43.391Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: src/llama_stack_configuration.py:651-683
Timestamp: 2026-05-20T08:09:43.391Z
Learning: In `src/llama_stack_configuration.py`, the `apply_high_level_inference` function currently emits `provider_id: p_type` (underscore form, e.g. `sentence_transformers`) directly from the high-level type key, which collides with Llama Stack's hyphenated provider IDs (e.g. `sentence-transformers`). This is a known PoC divergence documented in the spike doc ("Findings discovered during PoC") and tracked in the implementation JIRA "Unified llama_stack.config schema + synthesizer". Decision S5 mandates that each backend-specific synthesizer translates LCORE's canonical type Literal vocabulary to the target backend's expected shape (hyphenated provider_id for Llama Stack; model-string prefixes for Pydantic AI). The PoC code will be removed before merge; the fix belongs in the implementation ticket.
Applied to files:
src/llama_stack_configuration.pysrc/constants.pytests/unit/test_llama_stack_configuration.pysrc/models/config.pysrc/utils/vector_search.py
📚 Learning: 2026-05-20T08:09:34.319Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:34.319Z
Learning: In the lightspeed-stack project (LCORE-836 unified config), the `provider_shield_id` for a Llama Guard safety shield entry must be a guard model identifier (e.g., `meta-llama/Llama-Guard-3-8B`), not a chat model id like `openai/gpt-4o-mini`. Using a chat model id there only means the native_override key landed — it does not mean the safety shield is actually gating queries. E2E tests for the implementation JIRA must exercise a real Llama Guard model to validate the shield.
Applied to files:
src/llama_stack_configuration.pysrc/utils/vector_search.py
📚 Learning: 2026-05-06T06:57:52.173Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-05-06T06:57:52.173Z
Learning: Applies to src/constants.py : Use `constants.py` for shared constants with descriptive comments and type hints using `Final[type]`
Applied to files:
src/constants.py
📚 Learning: 2026-04-15T18:54:09.157Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:769-773
Timestamp: 2026-04-15T18:54:09.157Z
Learning: In lightspeed-core/lightspeed-stack (src/models/requests.py and related files), schema-level field size limits (e.g., max_length=65_536, max_length=32_768) are intentionally written as inline numeric literals, not extracted to constants.py. constants.py is reserved for configurable runtime defaults (e.g., DEFAULT_RH_IDENTITY_MAX_HEADER_SIZE, DEFAULT_MAX_FILE_UPLOAD_SIZE). Do not flag inline literals in field validators or Pydantic Field constraints as needing extraction to constants.py.
Applied to files:
src/constants.pysrc/models/config.py
📚 Learning: 2026-02-23T14:56:59.186Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.
Applied to files:
src/utils/responses.py
📚 Learning: 2026-04-07T14:44:42.022Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1469
File: src/models/config.py:1928-1933
Timestamp: 2026-04-07T14:44:42.022Z
Learning: In lightspeed-core/lightspeed-stack, `allow_verbose_infer` (previously `customization.allow_verbose_infer`, now `rlsapi_v1.allow_verbose_infer`) is only used internally by the `rlsapi_v1` `/infer` endpoint and has a single known consumer (the PR author). Backward compatibility for this config field relocation is intentionally not required and should not be flagged in future reviews.
Applied to files:
src/utils/vector_search.py
📚 Learning: 2026-04-20T15:09:48.726Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1548
File: src/app/endpoints/rlsapi_v1.py:56-56
Timestamp: 2026-04-20T15:09:48.726Z
Learning: In `src/app/endpoints/rlsapi_v1.py`, the `_get_rh_identity_context = get_rh_identity_context` alias is a deliberate, temporary backward-compatibility shim introduced in PR `#1548` (part 1/3 of Splunk HEC telemetry work). It is planned for removal in part 3 once the responses endpoint is fully wired up and no tests/consumers reference the underscore-prefixed name. Do not flag this alias as unnecessary or dead code until part 3 is merged.
Applied to files:
src/utils/vector_search.py
🪛 GitHub Actions: OpenAPI (Spectral) / 0_spectral.txt
docs/openapi.json
[error] 1-1: OpenAPI schema is out of date. 'diff -u docs/openapi.json /tmp/openapi-generated.json' failed; regenerate by running: 'uv run scripts/generate_openapi_schema.py docs/openapi.json'.
🪛 GitHub Actions: OpenAPI (Spectral) / spectral
docs/openapi.json
[error] 1-1: CI failed because docs/openapi.json is out of date. Diff detected between docs/openapi.json and generated schema. Regenerate with: uv run python scripts/generate_openapi_schema.py docs/openapi.json
🪛 markdownlint-cli2 (0.22.1)
docs/rag_guide.md
[warning] 42-42: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3
(MD001, heading-increment)
🔇 Additional comments (31)
examples/lightspeed-stack-byok-okp-rag.yaml (1)
39-79: LGTM!examples/quota-limiter-configuration-sqlite.yaml (1)
36-48: LGTM!tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml (1)
31-43: LGTM!tests/e2e/configuration/library-mode/lightspeed-stack-inline-rag.yaml (1)
30-42: LGTM!tests/e2e/configuration/library-mode/lightspeed-stack.yaml (1)
24-36: LGTM!tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml (1)
37-49: LGTM!tests/e2e/configuration/server-mode/lightspeed-stack-inline-rag.yaml (1)
31-43: LGTM!tests/e2e/configuration/server-mode/lightspeed-stack.yaml (1)
25-37: LGTM!tests/e2e/features/inline_rag.feature (1)
25-25: LGTM!Also applies to: 37-37, 46-46, 58-58, 69-69
tests/unit/app/endpoints/test_rags.py (1)
272-293: LGTM!Also applies to: 386-387, 394-395
tests/integration/endpoints/test_query_byok_integration.py (1)
259-259: LGTM!Also applies to: 267-270, 289-300, 454-456, 525-527, 785-788, 914-916, 998-1000, 1095-1097, 1105-1105, 1144-1144, 1190-1192, 1197-1197, 1249-1249, 1292-1295, 1300-1300
tests/integration/endpoints/test_responses_byok_integration.py (1)
119-120: LGTM!Also applies to: 172-173, 222-223, 230-233, 294-304, 384-386, 446-448, 529-531, 537-537, 563-565, 600-602, 607-607, 647-649, 681-684, 689-689
tests/integration/endpoints/test_streaming_query_byok_integration.py (1)
241-241: LGTM!Also applies to: 249-251, 266-267, 274-277, 348-350, 412-414, 692-695, 775-777, 854-856, 936-938, 945-945, 976-978, 1014-1016, 1023-1023, 1067-1069, 1102-1105, 1112-1112
tests/unit/models/config/test_byok_rag.py (1)
9-214: LGTM!tests/unit/models/config/test_dump_configuration.py (1)
14-27: LGTM!Also applies to: 119-254, 478-628, 738-886, 948-1123, 1208-1341, 1509-1644, 1733-1868, 1957-2092
tests/unit/models/config/test_rag_configuration.py (1)
10-17: LGTM!Also applies to: 20-83, 91-97, 99-112, 114-126, 128-141, 143-177, 185-206, 217-233
tests/unit/telemetry/conftest.py (1)
15-32: LGTM!Also applies to: 295-316, 391-412
tests/unit/test_configuration.py (1)
1004-1210: LGTM!Also applies to: 1415-3790
tests/unit/test_llama_stack_configuration.py (1)
265-265: LGTM!Also applies to: 282-282, 312-312, 540-540, 576-589
tests/unit/utils/test_responses.py (1)
59-59: LGTM!Also applies to: 1709-1715, 1779-1785, 1803-1809, 1837-1843, 1864-1872, 1895-1900, 1917-1920, 1939-1944, 1968-1974, 3466-3466, 3491-3491, 3528-3528, 3587-3587
tests/unit/utils/test_vector_search.py (1)
25-25: LGTM!Also applies to: 521-523, 540-543, 580-583, 606-609, 624-630, 670-672, 690-692, 706-707, 716-717, 730-731, 748-749, 763-764, 778-779, 797-803, 821-827, 862-868, 917-923
src/constants.py (1)
172-176: LGTM!Also applies to: 193-197
src/models/config.py (1)
1765-1791: LGTM!Also applies to: 1950-1994, 2003-2084, 2294-2299, 2419-2447
src/models/api/responses/successful/configuration.py (1)
82-94: LGTM!docs/openapi.json (1)
6762-6781: Already covered: regenerate the checked-in OpenAPI artifact.CI is still reporting
docs/openapi.jsonas out of date relative to the generated schema, so these contract edits will keep drifting until the file is regenerated from the source models.Also applies to: 11828-11849, 12097-12100, 12183-12202, 14751-14762, 17761-17838, 18745-18784
src/app/endpoints/rags.py (1)
27-27: LGTM!Also applies to: 169-170
src/configuration.py (1)
478-482: LGTM!Also applies to: 504-517, 532-535, 549-549
src/llama_stack_configuration.py (1)
20-29: LGTM!Also applies to: 350-365, 608-619
src/client.py (1)
94-101: LGTM!src/utils/responses.py (1)
110-110: LGTM!Also applies to: 242-256, 626-676, 691-696, 1735-1737, 1830-1832
src/utils/vector_search.py (1)
253-253: LGTM!Also applies to: 461-481, 653-657
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/app/endpoints/rags.py (1)
111-125: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winUse Google-style docstring sections in this function.
The function docstring is currently NumPy-style; please convert it to Google format (
Args,Returns,Raises) to match repo standards.As per coding guidelines: "Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/app/endpoints/rags.py` around lines 111 - 125, Update the function docstring that "Resolves a user-facing rag_id to the llama-stack vector_db_id" to use Google-style sections: replace NumPy-style "Parameters"/"Returns" with "Args:", "Returns:", and add a "Raises:" section (use "Raises: None" if it doesn't raise) to match repository docstring conventions; keep the existing short description and parameter details (rag_id, byok_rags) and ensure types and return description remain accurate.src/utils/vector_search.py (1)
569-589:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPass
rag.okp.max_chunksinto the Solr query.
limit = configuration.rag.okp.max_chunksonly affects the post-query slice right now._build_query_params(solr)still falls back toconstants.SOLR_VECTOR_SEARCH_DEFAULT_K, so the new config value never changes how many OKP chunks are actually fetched.Suggested fix
- params = _build_query_params(solr) + params = _build_query_params(solr, k=limit)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/utils/vector_search.py` around lines 569 - 589, The Solr query currently ignores configuration.rag.okp.max_chunks because _build_query_params(solr) falls back to constants.SOLR_VECTOR_SEARCH_DEFAULT_K; pass the configured limit into the query parameters so the client.vector_io.query actually requests that many chunks. Either update _build_query_params to accept a limit (e.g., _build_query_params(solr, limit)) and use configuration.rag.okp.max_chunks when building params, or set params['k'] = limit (or the appropriate param key) after params = _build_query_params(solr) and before calling client.vector_io.query; reference the limit variable, _build_query_params, params, client.vector_io.query, and constants.SOLR_VECTOR_SEARCH_DEFAULT_K when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/byok_guide.md`:
- Around line 223-231: The table documents deprecated flat RAG keys (rag_id,
rag_type, embedding_model, embedding_dimension, vector_db_id, db_path,
score_multiplier); update these to the new nested schema using
rag.byok.stores[*] entries and the rag.backend field (replace rag_type →
rag.backend, move rag_id/vector_db_id/db_path/etc. under rag.byok.stores with
per-store keys, and adjust embedding_model/embedding_dimension/score_multiplier
to per-store fields). Ensure all examples and the related occurrences (also at
the later section mentioned) use rag.byok.stores and rag.backend rather than the
old top-level keys so the config matches the new contract.
In `@docs/rag_guide.md`:
- Line 42: Change the heading "Inline RAG chunk flow" from an H3 to H2 to fix
the markdown lint MD001; locate the heading line with the text "Inline RAG chunk
flow" in docs/rag_guide.md and update its markdown header marker to the H2 style
(i.e., use two hash symbols) so the document heading levels are sequential.
In `@src/models/config.py`:
- Around line 1996-2000: Add a post-validation check on the model that declares
the stores: list[RagStore] field to reject duplicate RagStore.rag_id values;
implement a Pydantic validator (e.g., `@validator`("stores") or a `@root_validator`
on that model) that collects rag_id from each RagStore, detects duplicates, and
raises a ValueError listing the duplicated rag_id(s) so validation fails if the
same rag_id appears more than once; reference the stores field and the
RagStore.rag_id attribute when locating where to add this check.
---
Outside diff comments:
In `@src/app/endpoints/rags.py`:
- Around line 111-125: Update the function docstring that "Resolves a
user-facing rag_id to the llama-stack vector_db_id" to use Google-style
sections: replace NumPy-style "Parameters"/"Returns" with "Args:", "Returns:",
and add a "Raises:" section (use "Raises: None" if it doesn't raise) to match
repository docstring conventions; keep the existing short description and
parameter details (rag_id, byok_rags) and ensure types and return description
remain accurate.
In `@src/utils/vector_search.py`:
- Around line 569-589: The Solr query currently ignores
configuration.rag.okp.max_chunks because _build_query_params(solr) falls back to
constants.SOLR_VECTOR_SEARCH_DEFAULT_K; pass the configured limit into the query
parameters so the client.vector_io.query actually requests that many chunks.
Either update _build_query_params to accept a limit (e.g.,
_build_query_params(solr, limit)) and use configuration.rag.okp.max_chunks when
building params, or set params['k'] = limit (or the appropriate param key) after
params = _build_query_params(solr) and before calling client.vector_io.query;
reference the limit variable, _build_query_params, params,
client.vector_io.query, and constants.SOLR_VECTOR_SEARCH_DEFAULT_K when making
the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7791912b-70fc-45af-b294-8bbec271970e
📒 Files selected for processing (33)
docs/byok_guide.mddocs/openapi.jsondocs/rag_guide.mdexamples/lightspeed-stack-byok-okp-rag.yamlexamples/quota-limiter-configuration-sqlite.yamlsrc/app/endpoints/rags.pysrc/client.pysrc/configuration.pysrc/constants.pysrc/llama_stack_configuration.pysrc/models/api/responses/successful/configuration.pysrc/models/config.pysrc/utils/responses.pysrc/utils/vector_search.pytests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yamltests/e2e/configuration/library-mode/lightspeed-stack-inline-rag.yamltests/e2e/configuration/library-mode/lightspeed-stack.yamltests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yamltests/e2e/configuration/server-mode/lightspeed-stack-inline-rag.yamltests/e2e/configuration/server-mode/lightspeed-stack.yamltests/e2e/features/inline_rag.featuretests/integration/endpoints/test_query_byok_integration.pytests/integration/endpoints/test_responses_byok_integration.pytests/integration/endpoints/test_streaming_query_byok_integration.pytests/unit/app/endpoints/test_rags.pytests/unit/models/config/test_byok_rag.pytests/unit/models/config/test_dump_configuration.pytests/unit/models/config/test_rag_configuration.pytests/unit/telemetry/conftest.pytests/unit/test_configuration.pytests/unit/test_llama_stack_configuration.pytests/unit/utils/test_responses.pytests/unit/utils/test_vector_search.py
📜 Review details
🔇 Additional comments (31)
examples/lightspeed-stack-byok-okp-rag.yaml (1)
39-79: LGTM!examples/quota-limiter-configuration-sqlite.yaml (1)
36-48: LGTM!tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml (1)
31-43: LGTM!tests/e2e/configuration/library-mode/lightspeed-stack-inline-rag.yaml (1)
30-42: LGTM!tests/e2e/configuration/library-mode/lightspeed-stack.yaml (1)
24-36: LGTM!tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml (1)
37-49: LGTM!tests/e2e/configuration/server-mode/lightspeed-stack-inline-rag.yaml (1)
31-43: LGTM!tests/e2e/configuration/server-mode/lightspeed-stack.yaml (1)
25-37: LGTM!tests/e2e/features/inline_rag.feature (1)
25-25: LGTM!Also applies to: 37-37, 46-46, 58-58, 69-69
tests/unit/app/endpoints/test_rags.py (1)
272-293: LGTM!Also applies to: 386-387, 394-395
tests/integration/endpoints/test_query_byok_integration.py (1)
259-259: LGTM!Also applies to: 267-270, 289-300, 454-456, 525-527, 785-788, 914-916, 998-1000, 1095-1097, 1105-1105, 1144-1144, 1190-1192, 1197-1197, 1249-1249, 1292-1295, 1300-1300
tests/integration/endpoints/test_responses_byok_integration.py (1)
119-120: LGTM!Also applies to: 172-173, 222-223, 230-233, 294-304, 384-386, 446-448, 529-531, 537-537, 563-565, 600-602, 607-607, 647-649, 681-684, 689-689
tests/integration/endpoints/test_streaming_query_byok_integration.py (1)
241-241: LGTM!Also applies to: 249-251, 266-267, 274-277, 348-350, 412-414, 692-695, 775-777, 854-856, 936-938, 945-945, 976-978, 1014-1016, 1023-1023, 1067-1069, 1102-1105, 1112-1112
tests/unit/models/config/test_byok_rag.py (1)
9-214: LGTM!tests/unit/models/config/test_dump_configuration.py (1)
14-27: LGTM!Also applies to: 119-254, 478-628, 738-886, 948-1123, 1208-1341, 1509-1644, 1733-1868, 1957-2092
tests/unit/models/config/test_rag_configuration.py (1)
10-17: LGTM!Also applies to: 20-83, 91-97, 99-112, 114-126, 128-141, 143-177, 185-206, 217-233
tests/unit/telemetry/conftest.py (1)
15-32: LGTM!Also applies to: 295-316, 391-412
tests/unit/test_configuration.py (1)
1004-1210: LGTM!Also applies to: 1415-3790
tests/unit/test_llama_stack_configuration.py (1)
265-265: LGTM!Also applies to: 282-282, 312-312, 540-540, 576-589
tests/unit/utils/test_responses.py (1)
59-59: LGTM!Also applies to: 1709-1715, 1779-1785, 1803-1809, 1837-1843, 1864-1872, 1895-1900, 1917-1920, 1939-1944, 1968-1974, 3466-3466, 3491-3491, 3528-3528, 3587-3587
tests/unit/utils/test_vector_search.py (1)
25-25: LGTM!Also applies to: 521-523, 540-543, 580-583, 606-609, 624-630, 670-672, 690-692, 706-707, 716-717, 730-731, 748-749, 763-764, 778-779, 797-803, 821-827, 862-868, 917-923
src/constants.py (1)
172-176: LGTM!Also applies to: 193-197
src/models/config.py (1)
1765-1791: LGTM!Also applies to: 1950-1994, 2003-2084, 2294-2299, 2419-2447
src/models/api/responses/successful/configuration.py (1)
82-94: LGTM!docs/openapi.json (1)
6762-6781: Already covered: regenerate the checked-in OpenAPI artifact.CI is still reporting
docs/openapi.jsonas out of date relative to the generated schema, so these contract edits will keep drifting until the file is regenerated from the source models.Also applies to: 11828-11849, 12097-12100, 12183-12202, 14751-14762, 17761-17838, 18745-18784
src/app/endpoints/rags.py (1)
27-27: LGTM!Also applies to: 169-170
src/configuration.py (1)
478-482: LGTM!Also applies to: 504-517, 532-535, 549-549
src/llama_stack_configuration.py (1)
20-29: LGTM!Also applies to: 350-365, 608-619
src/client.py (1)
94-101: LGTM!src/utils/responses.py (1)
110-110: LGTM!Also applies to: 242-256, 626-676, 691-696, 1735-1737, 1830-1832
src/utils/vector_search.py (1)
253-253: LGTM!Also applies to: 461-481, 653-657
🛑 Comments failed to post (3)
docs/byok_guide.md (1)
223-231:
⚠️ Potential issue | 🟠 Major | ⚡ Quick winUpdate these sections to the new nested RAG schema.
These changed sections still document deprecated keys (
byok_rag,rag_type). The new contract usesrag.byok.storesandbackend, so current snippets can lead users to invalid configuration.Suggested doc update
-**`byok_rag` field reference:** +**`rag.byok.stores` field reference:** -| `rag_type` | No | `inline::faiss` | Vector store provider type | +| `backend` | No | `faiss` | Vector store backend type | -A configuration combining a local FAISS store (via `byok_rag`) with a remote pgvector store ... +A configuration combining a local FAISS store (via `rag.byok.stores`) with a remote pgvector store ...Also applies to: 374-375
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/byok_guide.md` around lines 223 - 231, The table documents deprecated flat RAG keys (rag_id, rag_type, embedding_model, embedding_dimension, vector_db_id, db_path, score_multiplier); update these to the new nested schema using rag.byok.stores[*] entries and the rag.backend field (replace rag_type → rag.backend, move rag_id/vector_db_id/db_path/etc. under rag.byok.stores with per-store keys, and adjust embedding_model/embedding_dimension/score_multiplier to per-store fields). Ensure all examples and the related occurrences (also at the later section mentioned) use rag.byok.stores and rag.backend rather than the old top-level keys so the config matches the new contract.docs/rag_guide.md (1)
42-42:
⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix heading level jump to satisfy markdown linting.
This heading jumps from H1 to H3 and triggers MD001. Use H2 here.
Suggested fix
-### Inline RAG chunk flow +## Inline RAG chunk flow📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.## Inline RAG chunk flow🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 42-42: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3(MD001, heading-increment)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/rag_guide.md` at line 42, Change the heading "Inline RAG chunk flow" from an H3 to H2 to fix the markdown lint MD001; locate the heading line with the text "Inline RAG chunk flow" in docs/rag_guide.md and update its markdown header marker to the H2 style (i.e., use two hash symbols) so the document heading levels are sequential.src/models/config.py (1)
1996-2000:
⚠️ Potential issue | 🟠 Major | ⚡ Quick winReject duplicate BYOK
rag_idvalues.
retrieval.inline.sources/retrieval.tool.sourcesare now keyed bybyok.stores[].rag_id, butstoresstill allows the samerag_idto appear multiple times. That makes one configured source name ambiguous at runtime while still passingvalidate_retrieval_sources, because the validator collapses IDs into a set. Add a post-validator that fails whenrag_idis repeated.Proposed fix
class ByokConfiguration(ConfigurationBase): """BYOK (Bring Your Own Knowledge) configuration.""" @@ stores: list[RagStore] = Field( default_factory=list, title="BYOK RAG stores", description="List of BYOK RAG store configurations.", ) + + `@model_validator`(mode="after") + def validate_unique_rag_ids(self) -> Self: + """Reject duplicate BYOK store IDs.""" + seen: set[str] = set() + duplicates: set[str] = set() + for store in self.stores: + if store.rag_id in seen: + duplicates.add(store.rag_id) + seen.add(store.rag_id) + if duplicates: + raise ValueError( + f"byok.stores contains duplicate rag_id values: {sorted(duplicates)}" + ) + return selfBased on learnings,
src/utils/vector_search.pyonly receives user-facingrag_ids, sorag_idis the join key between configuration and runtime lookup.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/models/config.py` around lines 1996 - 2000, Add a post-validation check on the model that declares the stores: list[RagStore] field to reject duplicate RagStore.rag_id values; implement a Pydantic validator (e.g., `@validator`("stores") or a `@root_validator` on that model) that collects rag_id from each RagStore, detects duplicates, and raises a ValueError listing the duplicated rag_id(s) so validation fails if the same rag_id appears more than once; reference the stores field and the RagStore.rag_id attribute when locating where to add this check.
| } | ||
| """ | ||
|
|
||
| @flaky |
There was a problem hiding this comment.
the problem here is not with flakiness, adding this tag just hides the real issue .
this is in the llama-stack logs
INFO 2026-06-04 12:39:03,513 uvicorn.access:481 server: ::1:52598 - "GET /v1/health HTTP/1.1" 200
llama-stack | HTTP Error 429 thrown while requesting HEAD https://huggingface.co/sentence-transformers/all-mpnet-base-v2/resolve/main/./modules.json
llama-stack | WARNING 2026-06-04 12:39:11,340 huggingface_hub.utils._http:446 uncategorized: HTTP Error 429 thrown while requesting
llama-stack | HEAD https://huggingface.co/sentence-transformers/all-mpnet-base-v2/resolve/main/./modules.json
llama-stack | Retrying in 1s [Retry 1/5].
llama-stack | WARNING 2026-06-04 12:39:11,341 huggingface_hub.utils._http:486 uncategorized: Retrying in 1s [Retry 1/5].
llama-stack | HTTP Error 429 thrown while requesting HEAD https://huggingface.co/sentence-transformers/all-mpnet-base-v2/resolve/main/./modules.json
llama-stack | WARNING 2026-06-04 12:39:12,362 huggingface_hub.utils._http:446 uncategorized: HTTP Error 429 thrown while requesting
llama-stack | HEAD https://huggingface.co/sentence-transformers/all-mpnet-base-v2/resolve/main/./modules.json
llama-stack | Retrying in 2s [Retry 2/5].
|
/hold |
Summary
Unifies all RAG-related configuration under a single
ragsection inlightspeed-stack.yaml, replacing the three separate top-level sections (byok_rag,rag,okp). Hardcoded chunk limit constants are now user-configurable fields with sensible defaults.Key changes:
byok_raglist →rag.byok.stores(withrag.byok.max_chunks)rag.inline/rag.toollists →rag.retrieval.inline.sources/rag.retrieval.tool.sources(each withmax_chunks)okpsection →rag.okp(withmax_chunks)rag_typefield renamed tobackend(e.g.faissinstead ofinline::faiss)BACKEND_TO_LLAMA_STACK_PROVIDERmapping validates backends during enrichmentBYOK_RAG_MAX_CHUNKS,OKP_RAG_MAX_CHUNKS, etc.) replaced with config defaultsmax_chunksparameter from_fetch_byok_rag(was always passed the config value)_fetch_solr_rag→_fetch_okp_ragfor consistency with the OKP naming conventionType of change
Tools used to create PR
Related Tickets & Documents
Checklist before requesting a review
Testing
uv run make test-unit)uv run make verify)uv run make schema)🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
New Features
Documentation
Configuration Changes
backendfield for consistency.