Skip to content

LCORE-1426: BYOK Config refactoring#1843

Open
are-ces wants to merge 11 commits into
lightspeed-core:mainfrom
are-ces:lcore-1426-byok-config-refactoring
Open

LCORE-1426: BYOK Config refactoring#1843
are-ces wants to merge 11 commits into
lightspeed-core:mainfrom
are-ces:lcore-1426-byok-config-refactoring

Conversation

@are-ces
Copy link
Copy Markdown
Contributor

@are-ces are-ces commented Jun 3, 2026

Summary

Unifies all RAG-related configuration under a single rag section in lightspeed-stack.yaml, replacing the three separate top-level sections (byok_rag, rag, okp). Hardcoded chunk limit constants are now user-configurable fields with sensible defaults.

Key changes:

  • byok_rag list → rag.byok.stores (with rag.byok.max_chunks)
  • rag.inline / rag.tool lists → rag.retrieval.inline.sources / rag.retrieval.tool.sources (each with max_chunks)
  • okp section → rag.okp (with max_chunks)
  • rag_type field renamed to backend (e.g. faiss instead of inline::faiss)
  • BACKEND_TO_LLAMA_STACK_PROVIDER mapping validates backends during enrichment
  • Chunk limit constants (BYOK_RAG_MAX_CHUNKS, OKP_RAG_MAX_CHUNKS, etc.) replaced with config defaults
  • Removed redundant max_chunks parameter from _fetch_byok_rag (was always passed the config value)
  • Renamed _fetch_solr_rag_fetch_okp_rag for consistency with the OKP naming convention
  • Updated BYOK and RAG guides with new config structure, chunk flow diagram, and prioritization docs

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

  • Assisted-by: Claude Code (Claude Opus 4.6)
  • Generated by: Claude Opus 4.6

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • All unit tests updated and passing (uv run make test-unit)
  • All integration test fixtures updated to new config shape
  • Linters and type checkers pass (uv run make verify)
  • OpenAPI schema regenerated (uv run make schema)
  • YAML example configs validated against new Pydantic models

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Added nested RAG configuration structure for improved organization of BYOK stores, OKP settings, and retrieval strategies (inline and tool).
    • Introduced configurable chunk limits per RAG strategy and source type.
  • Documentation

    • Enhanced BYOK content prioritization guidance with flow diagrams.
    • Updated RAG configuration guide with new nested structure and chunk flow documentation.
    • Clarified pgvector configuration requirements.
  • Configuration Changes

    • RAG BYOK stores now use backend field for consistency.
    • Updated OpenAPI specification to reflect unified RAG configuration model.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 3, 2026

Review Change Stack

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7791912b-70fc-45af-b294-8bbec271970e

📥 Commits

Reviewing files that changed from the base of the PR and between 2209ac9 and 472397b.

📒 Files selected for processing (33)
  • docs/byok_guide.md
  • docs/openapi.json
  • docs/rag_guide.md
  • examples/lightspeed-stack-byok-okp-rag.yaml
  • examples/quota-limiter-configuration-sqlite.yaml
  • src/app/endpoints/rags.py
  • src/client.py
  • src/configuration.py
  • src/constants.py
  • src/llama_stack_configuration.py
  • src/models/api/responses/successful/configuration.py
  • src/models/config.py
  • src/utils/responses.py
  • src/utils/vector_search.py
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-inline-rag.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-inline-rag.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack.yaml
  • tests/e2e/features/inline_rag.feature
  • tests/integration/endpoints/test_query_byok_integration.py
  • tests/integration/endpoints/test_responses_byok_integration.py
  • tests/integration/endpoints/test_streaming_query_byok_integration.py
  • tests/unit/app/endpoints/test_rags.py
  • tests/unit/models/config/test_byok_rag.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/models/config/test_rag_configuration.py
  • tests/unit/telemetry/conftest.py
  • tests/unit/test_configuration.py
  • tests/unit/test_llama_stack_configuration.py
  • tests/unit/utils/test_responses.py
  • tests/unit/utils/test_vector_search.py
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: build-pr
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (8)
tests/e2e/**/*.{py,feature}

📄 CodeRabbit inference engine (AGENTS.md)

Use behave (BDD) framework for end-to-end testing with Gherkin feature files

Files:

  • tests/e2e/features/inline_rag.feature
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

  • src/client.py
  • src/models/api/responses/successful/configuration.py
  • src/configuration.py
  • src/app/endpoints/rags.py
  • src/llama_stack_configuration.py
  • src/constants.py
  • src/utils/responses.py
  • src/models/config.py
  • src/utils/vector_search.py
src/**/configuration.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/configuration.py: All config models must extend ConfigurationBase with extra="forbid" to reject unknown fields
Use @field_validator and @model_validator for custom validation in Pydantic models

Files:

  • src/models/api/responses/successful/configuration.py
  • src/configuration.py
src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models must use @model_validator and @field_validator for validation and complete type annotations for all attributes, avoiding Any type

Files:

  • src/models/api/responses/successful/configuration.py
  • src/models/config.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

  • tests/unit/telemetry/conftest.py
  • tests/unit/models/config/test_byok_rag.py
  • tests/unit/test_llama_stack_configuration.py
  • tests/integration/endpoints/test_responses_byok_integration.py
  • tests/unit/utils/test_vector_search.py
  • tests/unit/app/endpoints/test_rags.py
  • tests/unit/utils/test_responses.py
  • tests/unit/models/config/test_rag_configuration.py
  • tests/integration/endpoints/test_query_byok_integration.py
  • tests/unit/test_configuration.py
  • tests/integration/endpoints/test_streaming_query_byok_integration.py
  • tests/unit/models/config/test_dump_configuration.py
tests/**/conftest.py

📄 CodeRabbit inference engine (AGENTS.md)

Use conftest.py for shared pytest fixtures and pytest-mock for AsyncMock objects

Files:

  • tests/unit/telemetry/conftest.py
src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/app/**/*.py: FastAPI dependencies: Import from fastapi module for APIRouter, HTTPException, Request, status, Depends
Use FastAPI HTTPException with appropriate status codes for API endpoints and handle APIConnectionError from Llama Stack

Files:

  • src/app/endpoints/rags.py
src/constants.py

📄 CodeRabbit inference engine (AGENTS.md)

Use constants.py for shared constants with descriptive comments and type hints using Final[type]

Files:

  • src/constants.py
🧠 Learnings (21)
📓 Common learnings
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1843
File: src/utils/vector_search.py:472-484
Timestamp: 2026-06-03T11:15:38.136Z
Learning: In `src/utils/vector_search.py`, the `_fetch_byok_rag` function's `vector_store_ids` parameter always contains user-facing `rag_id`s, never raw llama-stack `vector_db_id`s. The API contract enforces rag_ids only, so filtering `vector_store_ids` against `configuration.configuration.rag.retrieval.inline.sources` (also rag_ids) before calling `resolve_vector_store_ids` is intentional and correct.
📚 Learning: 2026-06-03T11:15:38.136Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1843
File: src/utils/vector_search.py:472-484
Timestamp: 2026-06-03T11:15:38.136Z
Learning: In `src/utils/vector_search.py`, the `_fetch_byok_rag` function's `vector_store_ids` parameter always contains user-facing `rag_id`s, never raw llama-stack `vector_db_id`s. The API contract enforces rag_ids only, so filtering `vector_store_ids` against `configuration.configuration.rag.retrieval.inline.sources` (also rag_ids) before calling `resolve_vector_store_ids` is intentional and correct.

Applied to files:

  • tests/e2e/configuration/server-mode/lightspeed-stack-inline-rag.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
  • examples/quota-limiter-configuration-sqlite.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-inline-rag.yaml
  • docs/rag_guide.md
  • examples/lightspeed-stack-byok-okp-rag.yaml
  • src/models/api/responses/successful/configuration.py
  • src/configuration.py
  • docs/byok_guide.md
  • src/app/endpoints/rags.py
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
  • src/llama_stack_configuration.py
  • src/constants.py
  • src/utils/responses.py
  • tests/unit/models/config/test_byok_rag.py
  • tests/unit/test_llama_stack_configuration.py
  • tests/integration/endpoints/test_responses_byok_integration.py
  • tests/unit/utils/test_vector_search.py
  • tests/unit/app/endpoints/test_rags.py
  • tests/unit/utils/test_responses.py
  • tests/unit/models/config/test_rag_configuration.py
  • tests/integration/endpoints/test_query_byok_integration.py
  • src/models/config.py
  • tests/unit/test_configuration.py
  • docs/openapi.json
  • tests/integration/endpoints/test_streaming_query_byok_integration.py
  • tests/unit/models/config/test_dump_configuration.py
  • src/utils/vector_search.py
📚 Learning: 2026-05-20T08:09:30.641Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:30.641Z
Learning: In Llama-stack config YAMLs, when defining a Llama Guard safety shield entry, set `provider_shield_id` to the *guard model identifier* (e.g., `meta-llama/Llama-Guard-3-8B`). Do not use a chat/generative model id (e.g., `openai/gpt-4o-mini`): a chat-model id (or `native_override`) indicates only an override landed and does **not** mean the safety shield is actually gating queries. Ensure any E2E coverage for the related implementation (JIRA/E2E tests) exercises a real Llama Guard model to verify that the shield is effective.

Applied to files:

  • tests/e2e/configuration/server-mode/lightspeed-stack-inline-rag.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
  • examples/quota-limiter-configuration-sqlite.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-inline-rag.yaml
  • examples/lightspeed-stack-byok-okp-rag.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.

Applied to files:

  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
📚 Learning: 2025-09-02T11:15:02.411Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/test_list.txt:2-3
Timestamp: 2025-09-02T11:15:02.411Z
Learning: In the lightspeed-stack e2e tests, the Authorized tag is intentionally omitted from noop authentication tests because they are designed to test against the default lightspeed-stack.yaml configuration rather than the specialized noop-with-token configuration.

Applied to files:

  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
📚 Learning: 2026-05-06T06:57:52.173Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-05-06T06:57:52.173Z
Learning: Applies to src/**/*.py : Llama Stack imports: Use `from llama_stack_client import AsyncLlamaStackClient`

Applied to files:

  • src/client.py
📚 Learning: 2026-05-20T08:09:36.724Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: src/client.py:104-108
Timestamp: 2026-05-20T08:09:36.724Z
Learning: In the lightspeed-stack repo, the synthesized `run.yaml` file handling in `src/client.py` (`_synthesize_library_config`) uses a fixed `/tmp` path intentionally in the PoC (PR `#1580`). The durable production requirements are tracked in spec doc R10 (docs/design/llama-stack-config-merge/llama-stack-config-merge.md): persistent known path overwritten each boot, file mode 0600 set via explicit create flag (not umask), and a `--synthesized-config-output` CLI flag for debugging. The PoC code is scheduled for removal pre-merge; the implementation JIRA "Unified llama_stack.config schema + synthesizer" inherits R10's requirements.

Applied to files:

  • tests/e2e/configuration/server-mode/lightspeed-stack.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack.yaml
  • docs/byok_guide.md
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
📚 Learning: 2026-04-13T13:39:59.316Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1490
File: tests/e2e/features/environment.py:206-211
Timestamp: 2026-04-13T13:39:59.316Z
Learning: In lightspeed-stack e2e tests (tests/e2e/features/), `context.feature_config` is intentionally set inside Background/step functions (scenario-scoped Behave layer). The `after_scenario` restore logic in `environment.py` only restores config when `context.scenario_lightspeed_override_active` is True, which is only set by `configure_service` when an actual config switch occurs. The module-level `_active_lightspeed_stack_config_basename` in `tests/e2e/features/steps/common.py` prevents re-applying the same config in subsequent scenarios (making `scenario_lightspeed_override_active` stay False). This means the ephemeral nature of step-set context attributes is intentional — the design ensures config restore happens exactly once per actual switch, not redundantly on every scenario.

Applied to files:

  • tests/e2e/configuration/server-mode/lightspeed-stack.yaml
📚 Learning: 2025-08-18T10:58:14.951Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:47-47
Timestamp: 2025-08-18T10:58:14.951Z
Learning: psycopg2-binary is required by some llama-stack providers in the lightspeed-stack project, so it cannot be replaced with psycopg v3 or moved to optional dependencies without breaking llama-stack functionality.

Applied to files:

  • docs/rag_guide.md
  • docs/byok_guide.md
📚 Learning: 2026-03-17T11:34:53.242Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1335
File: docs/config.md:411-411
Timestamp: 2026-03-17T11:34:53.242Z
Learning: In the lightspeed-stack project (`src/models/config.py`, `docs/config.md`), the internal Solr filter `is_chunk:true` (defined as `SOLR_CHUNK_FILTER_QUERY` in `src/constants.py`) is always injected by the system for OKP searches and is intentionally hidden from users. The `chunk_filter_query` field in `OkpConfiguration` is user-facing and additive-only, but the documentation must NOT mention the internal `is_chunk:true` behavior — this is a deliberate design decision by the maintainers.

Applied to files:

  • docs/rag_guide.md
  • tests/unit/utils/test_vector_search.py
  • docs/openapi.json
  • src/utils/vector_search.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/api/responses/successful/configuration.py
  • src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/api/responses/successful/configuration.py
  • src/models/config.py
📚 Learning: 2025-12-18T10:21:09.038Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:09.038Z
Learning: In Llama Stack version 0.3.x, telemetry provider configuration is not supported under the `providers` section in run.yaml configuration files. Telemetry can be enabled with just `telemetry.enabled: true` without requiring an explicit provider block.

Applied to files:

  • docs/byok_guide.md
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/rags.py
📚 Learning: 2026-05-20T08:09:43.391Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: src/llama_stack_configuration.py:651-683
Timestamp: 2026-05-20T08:09:43.391Z
Learning: In `src/llama_stack_configuration.py`, the `apply_high_level_inference` function currently emits `provider_id: p_type` (underscore form, e.g. `sentence_transformers`) directly from the high-level type key, which collides with Llama Stack's hyphenated provider IDs (e.g. `sentence-transformers`). This is a known PoC divergence documented in the spike doc ("Findings discovered during PoC") and tracked in the implementation JIRA "Unified llama_stack.config schema + synthesizer". Decision S5 mandates that each backend-specific synthesizer translates LCORE's canonical type Literal vocabulary to the target backend's expected shape (hyphenated provider_id for Llama Stack; model-string prefixes for Pydantic AI). The PoC code will be removed before merge; the fix belongs in the implementation ticket.

Applied to files:

  • src/llama_stack_configuration.py
  • src/constants.py
  • tests/unit/test_llama_stack_configuration.py
  • src/models/config.py
  • src/utils/vector_search.py
📚 Learning: 2026-05-20T08:09:34.319Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:34.319Z
Learning: In the lightspeed-stack project (LCORE-836 unified config), the `provider_shield_id` for a Llama Guard safety shield entry must be a guard model identifier (e.g., `meta-llama/Llama-Guard-3-8B`), not a chat model id like `openai/gpt-4o-mini`. Using a chat model id there only means the native_override key landed — it does not mean the safety shield is actually gating queries. E2E tests for the implementation JIRA must exercise a real Llama Guard model to validate the shield.

Applied to files:

  • src/llama_stack_configuration.py
  • src/utils/vector_search.py
📚 Learning: 2026-05-06T06:57:52.173Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-05-06T06:57:52.173Z
Learning: Applies to src/constants.py : Use `constants.py` for shared constants with descriptive comments and type hints using `Final[type]`

Applied to files:

  • src/constants.py
📚 Learning: 2026-04-15T18:54:09.157Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:769-773
Timestamp: 2026-04-15T18:54:09.157Z
Learning: In lightspeed-core/lightspeed-stack (src/models/requests.py and related files), schema-level field size limits (e.g., max_length=65_536, max_length=32_768) are intentionally written as inline numeric literals, not extracted to constants.py. constants.py is reserved for configurable runtime defaults (e.g., DEFAULT_RH_IDENTITY_MAX_HEADER_SIZE, DEFAULT_MAX_FILE_UPLOAD_SIZE). Do not flag inline literals in field validators or Pydantic Field constraints as needing extraction to constants.py.

Applied to files:

  • src/constants.py
  • src/models/config.py
📚 Learning: 2026-02-23T14:56:59.186Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.

Applied to files:

  • src/utils/responses.py
📚 Learning: 2026-04-07T14:44:42.022Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1469
File: src/models/config.py:1928-1933
Timestamp: 2026-04-07T14:44:42.022Z
Learning: In lightspeed-core/lightspeed-stack, `allow_verbose_infer` (previously `customization.allow_verbose_infer`, now `rlsapi_v1.allow_verbose_infer`) is only used internally by the `rlsapi_v1` `/infer` endpoint and has a single known consumer (the PR author). Backward compatibility for this config field relocation is intentionally not required and should not be flagged in future reviews.

Applied to files:

  • src/utils/vector_search.py
📚 Learning: 2026-04-20T15:09:48.726Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1548
File: src/app/endpoints/rlsapi_v1.py:56-56
Timestamp: 2026-04-20T15:09:48.726Z
Learning: In `src/app/endpoints/rlsapi_v1.py`, the `_get_rh_identity_context = get_rh_identity_context` alias is a deliberate, temporary backward-compatibility shim introduced in PR `#1548` (part 1/3 of Splunk HEC telemetry work). It is planned for removal in part 3 once the responses endpoint is fully wired up and no tests/consumers reference the underscore-prefixed name. Do not flag this alias as unnecessary or dead code until part 3 is merged.

Applied to files:

  • src/utils/vector_search.py
🪛 GitHub Actions: OpenAPI (Spectral) / 0_spectral.txt
docs/openapi.json

[error] 1-1: OpenAPI schema is out of date. 'diff -u docs/openapi.json /tmp/openapi-generated.json' failed; regenerate by running: 'uv run scripts/generate_openapi_schema.py docs/openapi.json'.

🪛 GitHub Actions: OpenAPI (Spectral) / spectral
docs/openapi.json

[error] 1-1: CI failed because docs/openapi.json is out of date. Diff detected between docs/openapi.json and generated schema. Regenerate with: uv run python scripts/generate_openapi_schema.py docs/openapi.json

🪛 markdownlint-cli2 (0.22.1)
docs/rag_guide.md

[warning] 42-42: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)

🛑 Comments failed to post (3)
docs/byok_guide.md (1)

223-231: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Update these sections to the new nested RAG schema.

These changed sections still document deprecated keys (byok_rag, rag_type). The new contract uses rag.byok.stores and backend, so current snippets can lead users to invalid configuration.

Suggested doc update
-**`byok_rag` field reference:**
+**`rag.byok.stores` field reference:**

-| `rag_type`            | No       | `inline::faiss`                           | Vector store provider type                                                                |
+| `backend`             | No       | `faiss`                                   | Vector store backend type                                                                 |

-A configuration combining a local FAISS store (via `byok_rag`) with a remote pgvector store ...
+A configuration combining a local FAISS store (via `rag.byok.stores`) with a remote pgvector store ...

Also applies to: 374-375

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/byok_guide.md` around lines 223 - 231, The table documents deprecated
flat RAG keys (rag_id, rag_type, embedding_model, embedding_dimension,
vector_db_id, db_path, score_multiplier); update these to the new nested schema
using rag.byok.stores[*] entries and the rag.backend field (replace rag_type →
rag.backend, move rag_id/vector_db_id/db_path/etc. under rag.byok.stores with
per-store keys, and adjust embedding_model/embedding_dimension/score_multiplier
to per-store fields). Ensure all examples and the related occurrences (also at
the later section mentioned) use rag.byok.stores and rag.backend rather than the
old top-level keys so the config matches the new contract.
docs/rag_guide.md (1)

42-42: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix heading level jump to satisfy markdown linting.

This heading jumps from H1 to H3 and triggers MD001. Use H2 here.

Suggested fix
-### Inline RAG chunk flow
+## Inline RAG chunk flow
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

## Inline RAG chunk flow
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 42-42: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/rag_guide.md` at line 42, Change the heading "Inline RAG chunk flow"
from an H3 to H2 to fix the markdown lint MD001; locate the heading line with
the text "Inline RAG chunk flow" in docs/rag_guide.md and update its markdown
header marker to the H2 style (i.e., use two hash symbols) so the document
heading levels are sequential.
src/models/config.py (1)

1996-2000: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject duplicate BYOK rag_id values.

retrieval.inline.sources / retrieval.tool.sources are now keyed by byok.stores[].rag_id, but stores still allows the same rag_id to appear multiple times. That makes one configured source name ambiguous at runtime while still passing validate_retrieval_sources, because the validator collapses IDs into a set. Add a post-validator that fails when rag_id is repeated.

Proposed fix
 class ByokConfiguration(ConfigurationBase):
     """BYOK (Bring Your Own Knowledge) configuration."""
@@
     stores: list[RagStore] = Field(
         default_factory=list,
         title="BYOK RAG stores",
         description="List of BYOK RAG store configurations.",
     )
+
+    `@model_validator`(mode="after")
+    def validate_unique_rag_ids(self) -> Self:
+        """Reject duplicate BYOK store IDs."""
+        seen: set[str] = set()
+        duplicates: set[str] = set()
+        for store in self.stores:
+            if store.rag_id in seen:
+                duplicates.add(store.rag_id)
+            seen.add(store.rag_id)
+        if duplicates:
+            raise ValueError(
+                f"byok.stores contains duplicate rag_id values: {sorted(duplicates)}"
+            )
+        return self

Based on learnings, src/utils/vector_search.py only receives user-facing rag_ids, so rag_id is the join key between configuration and runtime lookup.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/models/config.py` around lines 1996 - 2000, Add a post-validation check
on the model that declares the stores: list[RagStore] field to reject duplicate
RagStore.rag_id values; implement a Pydantic validator (e.g.,
`@validator`("stores") or a `@root_validator` on that model) that collects rag_id
from each RagStore, detects duplicates, and raises a ValueError listing the
duplicated rag_id(s) so validation fails if the same rag_id appears more than
once; reference the stores field and the RagStore.rag_id attribute when locating
where to add this check.
🔇 Additional comments (31)
examples/lightspeed-stack-byok-okp-rag.yaml (1)

39-79: LGTM!

examples/quota-limiter-configuration-sqlite.yaml (1)

36-48: LGTM!

tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml (1)

31-43: LGTM!

tests/e2e/configuration/library-mode/lightspeed-stack-inline-rag.yaml (1)

30-42: LGTM!

tests/e2e/configuration/library-mode/lightspeed-stack.yaml (1)

24-36: LGTM!

tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml (1)

37-49: LGTM!

tests/e2e/configuration/server-mode/lightspeed-stack-inline-rag.yaml (1)

31-43: LGTM!

tests/e2e/configuration/server-mode/lightspeed-stack.yaml (1)

25-37: LGTM!

tests/e2e/features/inline_rag.feature (1)

25-25: LGTM!

Also applies to: 37-37, 46-46, 58-58, 69-69

tests/unit/app/endpoints/test_rags.py (1)

272-293: LGTM!

Also applies to: 386-387, 394-395

tests/integration/endpoints/test_query_byok_integration.py (1)

259-259: LGTM!

Also applies to: 267-270, 289-300, 454-456, 525-527, 785-788, 914-916, 998-1000, 1095-1097, 1105-1105, 1144-1144, 1190-1192, 1197-1197, 1249-1249, 1292-1295, 1300-1300

tests/integration/endpoints/test_responses_byok_integration.py (1)

119-120: LGTM!

Also applies to: 172-173, 222-223, 230-233, 294-304, 384-386, 446-448, 529-531, 537-537, 563-565, 600-602, 607-607, 647-649, 681-684, 689-689

tests/integration/endpoints/test_streaming_query_byok_integration.py (1)

241-241: LGTM!

Also applies to: 249-251, 266-267, 274-277, 348-350, 412-414, 692-695, 775-777, 854-856, 936-938, 945-945, 976-978, 1014-1016, 1023-1023, 1067-1069, 1102-1105, 1112-1112

tests/unit/models/config/test_byok_rag.py (1)

9-214: LGTM!

tests/unit/models/config/test_dump_configuration.py (1)

14-27: LGTM!

Also applies to: 119-254, 478-628, 738-886, 948-1123, 1208-1341, 1509-1644, 1733-1868, 1957-2092

tests/unit/models/config/test_rag_configuration.py (1)

10-17: LGTM!

Also applies to: 20-83, 91-97, 99-112, 114-126, 128-141, 143-177, 185-206, 217-233

tests/unit/telemetry/conftest.py (1)

15-32: LGTM!

Also applies to: 295-316, 391-412

tests/unit/test_configuration.py (1)

1004-1210: LGTM!

Also applies to: 1415-3790

tests/unit/test_llama_stack_configuration.py (1)

265-265: LGTM!

Also applies to: 282-282, 312-312, 540-540, 576-589

tests/unit/utils/test_responses.py (1)

59-59: LGTM!

Also applies to: 1709-1715, 1779-1785, 1803-1809, 1837-1843, 1864-1872, 1895-1900, 1917-1920, 1939-1944, 1968-1974, 3466-3466, 3491-3491, 3528-3528, 3587-3587

tests/unit/utils/test_vector_search.py (1)

25-25: LGTM!

Also applies to: 521-523, 540-543, 580-583, 606-609, 624-630, 670-672, 690-692, 706-707, 716-717, 730-731, 748-749, 763-764, 778-779, 797-803, 821-827, 862-868, 917-923

src/constants.py (1)

172-176: LGTM!

Also applies to: 193-197

src/models/config.py (1)

1765-1791: LGTM!

Also applies to: 1950-1994, 2003-2084, 2294-2299, 2419-2447

src/models/api/responses/successful/configuration.py (1)

82-94: LGTM!

docs/openapi.json (1)

6762-6781: Already covered: regenerate the checked-in OpenAPI artifact.

CI is still reporting docs/openapi.json as out of date relative to the generated schema, so these contract edits will keep drifting until the file is regenerated from the source models.

Also applies to: 11828-11849, 12097-12100, 12183-12202, 14751-14762, 17761-17838, 18745-18784

src/app/endpoints/rags.py (1)

27-27: LGTM!

Also applies to: 169-170

src/configuration.py (1)

478-482: LGTM!

Also applies to: 504-517, 532-535, 549-549

src/llama_stack_configuration.py (1)

20-29: LGTM!

Also applies to: 350-365, 608-619

src/client.py (1)

94-101: LGTM!

src/utils/responses.py (1)

110-110: LGTM!

Also applies to: 242-256, 626-676, 691-696, 1735-1737, 1830-1832

src/utils/vector_search.py (1)

253-253: LGTM!

Also applies to: 461-481, 653-657


Walkthrough

Refactors the RAG configuration into a unified nested rag object (with byok, okp, retrieval), renames ByokRagRagStore (adds backend), adds retrieval strategy models, and rewires configuration access, enrichment, runtime vector-search limits, examples, tests, and docs to the new schema.

Changes

Unified RAG Configuration Refactor

Layer / File(s) Summary
Schema, models, OpenAPI, constants
src/models/config.py, docs/openapi.json, src/constants.py
ByokRagRagStore with backend; added RetrievalStrategyConfiguration, RetrievalConfiguration, ByokConfiguration; RagConfiguration now nests byok, okp, retrieval; OpenAPI components and defaults updated; chunk constants renamed to DEFAULT_*_RAG_MAX_CHUNKS.
Llama-stack provider & generation wiring
src/llama_stack_configuration.py
Added BACKEND_TO_LLAMA_STACK_PROVIDER, import-time backend validation; BYOK provider construction uses per-store backend and sets persistence namespace vector_io::<backend>; generate_configuration reads config.rag.byok.stores.
Configuration access and client enrichment
src/configuration.py, src/client.py
AppConfig accessors (okp, rag_id_mapping, score_multiplier_mapping, inline_solr_enabled) now read nested configuration.rag.*; client enrichment reads config.rag.byok.stores and constructs retrieval inputs from config.rag.retrieval.
Runtime vector-search, responses, endpoints
src/utils/vector_search.py, src/utils/responses.py, src/app/endpoints/rags.py
Runtime reads chunk limits and sources from configuration.rag.* (inline/byok/okp/max_chunks); BYOK selection uses rag.retrieval.inline.sources; vector-store ID translation resolves against rag.byok.stores; helper renamed _fetch_okp_rag; BYOK types in signatures updated.
Examples, tests, and docs
examples/*.yaml, tests/*, docs/byok_guide.md, docs/rag_guide.md
All example/test YAMLs migrated from byok_rag to rag.byok.stores (adding backend where applicable); tests/fixtures updated for new schema and DEFAULT_*_RAG_MAX_CHUNKS; docs add BYOK prioritization and inline RAG flow diagrams and update pgvector/run.yaml guidance.

Estimated code review effort:
🎯 4 (Complex) | ⏱️ ~60 minutes

  • Suggested reviewers:
    • tisnik
    • syedriko
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/models/config.py (1)

1765-2053: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Complete the new model docstrings to match the repo standard.

These new config models only have one-line summaries. Please expand RagStore, RetrievalStrategyConfiguration, RetrievalConfiguration, ByokConfiguration, and RagConfiguration with full Google-style class docstrings, especially an Attributes section.

As per coding guidelines, src/models/**/*.py: Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/models/config.py` around lines 1765 - 2053, Update the one-line class
docstrings for RagStore, RetrievalStrategyConfiguration, RetrievalConfiguration,
ByokConfiguration, and RagConfiguration to full Google-style docstrings: include
a short summary, a longer description, and an Attributes section listing each
model field (e.g., RagStore: rag_id, backend, embedding_model,
embedding_dimension, vector_db_id, db_path, score_multiplier;
RetrievalStrategyConfiguration: sources, max_chunks; RetrievalConfiguration:
inline, tool; ByokConfiguration: max_chunks, stores; RagConfiguration: byok,
okp, retrieval) with types and brief descriptions; ensure the docstrings follow
the repo guideline (Google-style) and include the required sections
(Parameters/Returns/Raises where applicable and an Attributes section for the
class fields).
src/utils/vector_search.py (1)

574-595: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pass rag.okp.max_chunks into the Solr query itself.

limit is only applied after the query returns. Because _build_query_params() still uses its default k, any rag.okp.max_chunks value above that default can never be returned, so the new config is only partially honored.

Suggested fix
-            params = _build_query_params(solr)
+            params = _build_query_params(solr, k=limit)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/utils/vector_search.py` around lines 574 - 595, The code reads
configuration.rag.okp.max_chunks into limit but doesn't pass it into the Solr
query, so increase the returned chunk count by injecting that limit into the
query params: update the call site around _build_query_params(solr) to supply
the configured limit (configuration.rag.okp.max_chunks / local variable limit)
into the params (e.g., set params["k"] or the appropriate param name used by
Solr) before calling client.vector_io.query(vector_store_id=vector_store_id,
query=query, params=params), ensuring _build_query_params or the params dict
reflects the desired max_chunks.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/openapi.json`:
- Around line 6762-6781: The checked-in OpenAPI artifact is stale: regenerate
the OpenAPI JSON from the current source models so the "rag" schema block (the
byok/okp/retrieval sections shown in the diff) and the other affected sections
(around the noted ranges) match the generated contract; run the project's
OpenAPI generation script/command used in CI, replace the existing openapi.json
with the regenerated output, and commit the updated artifact so the schema in
the repo matches the canonical models.

In `@src/llama_stack_configuration.py`:
- Around line 20-23: Add a module-level validation after
BACKEND_TO_LLAMA_STACK_PROVIDER that imports DEFAULT_RAG_BACKEND and checks it
exists as a key in BACKEND_TO_LLAMA_STACK_PROVIDER; if not, raise a clear
ValueError (or AssertionError) explaining the invalid default and listing
supported keys. Reference the symbols BACKEND_TO_LLAMA_STACK_PROVIDER and
DEFAULT_RAG_BACKEND and perform the check at import time so misconfiguration
surfaces early.

In `@src/models/config.py`:
- Around line 1775-1779: The backend Field currently accepts any string
(backend) and should be constrained at the schema boundary: replace the
unconstrained str with a stricter type or add a Pydantic validator that checks
the value against the canonical supported-backend set (e.g.
constants.SUPPORTED_RAG_BACKENDS or a new enum/typing.Literal list) and raise a
ValidationError if it is not present; keep the default as
constants.DEFAULT_RAG_BACKEND and ensure the model-level validator or Enum
update also updates the generated schema so invalid YAML fails fast.
- Around line 2029-2053: Add a Pydantic validation on RagConfiguration that
checks retrieval.inline.sources and retrieval.tool.sources against the declared
set of allowed IDs (collect byok.stores[].rag_id plus constants.OKP_RAG_ID); if
any source id is not in that set raise a clear ValueError listing the unknown
ids so configuration loading fails fast; implement this as a class validator
(e.g., `@root_validator` or `@field_validator`) inside RagConfiguration referencing
the retrieval, byok, and constants.OKP_RAG_ID symbols and include both inline
and tool checks in one pass.

In `@src/utils/vector_search.py`:
- Around line 472-484: The current code filters incoming vector_store_ids
against configuration.configuration.rag.retrieval.inline.sources before mapping
user-facing rag_ids to llama-stack ids, which causes raw llama-stack
vector_db_ids to be dropped; instead, first call resolve_vector_store_ids(...)
to translate incoming vector_store_ids (or the default
configuration.configuration.rag.retrieval.inline.sources when None) into
llama-stack ids using resolve_vector_store_ids, then compute rag_ids_to_query /
vector_store_ids_to_query by intersecting the resolved ids with
configuration.configuration.rag.retrieval.inline.sources so inline-enabled BYOK
stores passed as raw ids are preserved (update references: vector_store_ids,
rag_ids_to_query, resolve_vector_store_ids,
configuration.configuration.rag.retrieval.inline.sources).

---

Outside diff comments:
In `@src/models/config.py`:
- Around line 1765-2053: Update the one-line class docstrings for RagStore,
RetrievalStrategyConfiguration, RetrievalConfiguration, ByokConfiguration, and
RagConfiguration to full Google-style docstrings: include a short summary, a
longer description, and an Attributes section listing each model field (e.g.,
RagStore: rag_id, backend, embedding_model, embedding_dimension, vector_db_id,
db_path, score_multiplier; RetrievalStrategyConfiguration: sources, max_chunks;
RetrievalConfiguration: inline, tool; ByokConfiguration: max_chunks, stores;
RagConfiguration: byok, okp, retrieval) with types and brief descriptions;
ensure the docstrings follow the repo guideline (Google-style) and include the
required sections (Parameters/Returns/Raises where applicable and an Attributes
section for the class fields).

In `@src/utils/vector_search.py`:
- Around line 574-595: The code reads configuration.rag.okp.max_chunks into
limit but doesn't pass it into the Solr query, so increase the returned chunk
count by injecting that limit into the query params: update the call site around
_build_query_params(solr) to supply the configured limit
(configuration.rag.okp.max_chunks / local variable limit) into the params (e.g.,
set params["k"] or the appropriate param name used by Solr) before calling
client.vector_io.query(vector_store_id=vector_store_id, query=query,
params=params), ensuring _build_query_params or the params dict reflects the
desired max_chunks.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 497ca0e6-3b47-447b-89c3-24a6a5d13a3f

📥 Commits

Reviewing files that changed from the base of the PR and between 388ea35 and 9df96fe.

📒 Files selected for processing (32)
  • docs/byok_guide.md
  • docs/openapi.json
  • docs/rag_guide.md
  • examples/lightspeed-stack-byok-okp-rag.yaml
  • examples/quota-limiter-configuration-sqlite.yaml
  • src/app/endpoints/rags.py
  • src/client.py
  • src/configuration.py
  • src/constants.py
  • src/llama_stack_configuration.py
  • src/models/api/responses/successful/configuration.py
  • src/models/config.py
  • src/utils/responses.py
  • src/utils/vector_search.py
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-inline-rag.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-inline-rag.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack.yaml
  • tests/integration/endpoints/test_query_byok_integration.py
  • tests/integration/endpoints/test_responses_byok_integration.py
  • tests/integration/endpoints/test_streaming_query_byok_integration.py
  • tests/unit/app/endpoints/test_rags.py
  • tests/unit/models/config/test_byok_rag.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/models/config/test_rag_configuration.py
  • tests/unit/telemetry/conftest.py
  • tests/unit/test_configuration.py
  • tests/unit/test_llama_stack_configuration.py
  • tests/unit/utils/test_responses.py
  • tests/unit/utils/test_vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: build-pr
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (7)
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

  • src/client.py
  • src/models/api/responses/successful/configuration.py
  • src/app/endpoints/rags.py
  • src/configuration.py
  • src/constants.py
  • src/llama_stack_configuration.py
  • src/utils/responses.py
  • src/utils/vector_search.py
  • src/models/config.py
src/**/configuration.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/configuration.py: All config models must extend ConfigurationBase with extra="forbid" to reject unknown fields
Use @field_validator and @model_validator for custom validation in Pydantic models

Files:

  • src/models/api/responses/successful/configuration.py
  • src/configuration.py
src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models must use @model_validator and @field_validator for validation and complete type annotations for all attributes, avoiding Any type

Files:

  • src/models/api/responses/successful/configuration.py
  • src/models/config.py
src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/app/**/*.py: FastAPI dependencies: Import from fastapi module for APIRouter, HTTPException, Request, status, Depends
Use FastAPI HTTPException with appropriate status codes for API endpoints and handle APIConnectionError from Llama Stack

Files:

  • src/app/endpoints/rags.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

  • tests/unit/app/endpoints/test_rags.py
  • tests/unit/models/config/test_rag_configuration.py
  • tests/unit/telemetry/conftest.py
  • tests/unit/models/config/test_byok_rag.py
  • tests/unit/test_llama_stack_configuration.py
  • tests/unit/utils/test_vector_search.py
  • tests/integration/endpoints/test_streaming_query_byok_integration.py
  • tests/unit/utils/test_responses.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/integration/endpoints/test_query_byok_integration.py
  • tests/unit/test_configuration.py
  • tests/integration/endpoints/test_responses_byok_integration.py
src/constants.py

📄 CodeRabbit inference engine (AGENTS.md)

Use constants.py for shared constants with descriptive comments and type hints using Final[type]

Files:

  • src/constants.py
tests/**/conftest.py

📄 CodeRabbit inference engine (AGENTS.md)

Use conftest.py for shared pytest fixtures and pytest-mock for AsyncMock objects

Files:

  • tests/unit/telemetry/conftest.py
🧠 Learnings (13)
📚 Learning: 2026-05-20T08:09:30.641Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:30.641Z
Learning: In Llama-stack config YAMLs, when defining a Llama Guard safety shield entry, set `provider_shield_id` to the *guard model identifier* (e.g., `meta-llama/Llama-Guard-3-8B`). Do not use a chat/generative model id (e.g., `openai/gpt-4o-mini`): a chat-model id (or `native_override`) indicates only an override landed and does **not** mean the safety shield is actually gating queries. Ensure any E2E coverage for the related implementation (JIRA/E2E tests) exercises a real Llama Guard model to verify that the shield is effective.

Applied to files:

  • examples/quota-limiter-configuration-sqlite.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
  • examples/lightspeed-stack-byok-okp-rag.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-inline-rag.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-inline-rag.yaml
📚 Learning: 2026-05-20T08:09:36.724Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: src/client.py:104-108
Timestamp: 2026-05-20T08:09:36.724Z
Learning: In the lightspeed-stack repo, the synthesized `run.yaml` file handling in `src/client.py` (`_synthesize_library_config`) uses a fixed `/tmp` path intentionally in the PoC (PR `#1580`). The durable production requirements are tracked in spec doc R10 (docs/design/llama-stack-config-merge/llama-stack-config-merge.md): persistent known path overwritten each boot, file mode 0600 set via explicit create flag (not umask), and a `--synthesized-config-output` CLI flag for debugging. The PoC code is scheduled for removal pre-merge; the implementation JIRA "Unified llama_stack.config schema + synthesizer" inherits R10's requirements.

Applied to files:

  • tests/e2e/configuration/server-mode/lightspeed-stack.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack.yaml
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.

Applied to files:

  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
📚 Learning: 2025-09-02T11:15:02.411Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/test_list.txt:2-3
Timestamp: 2025-09-02T11:15:02.411Z
Learning: In the lightspeed-stack e2e tests, the Authorized tag is intentionally omitted from noop authentication tests because they are designed to test against the default lightspeed-stack.yaml configuration rather than the specialized noop-with-token configuration.

Applied to files:

  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/api/responses/successful/configuration.py
  • src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/api/responses/successful/configuration.py
  • src/models/config.py
📚 Learning: 2026-03-17T11:34:53.242Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1335
File: docs/config.md:411-411
Timestamp: 2026-03-17T11:34:53.242Z
Learning: In the lightspeed-stack project (`src/models/config.py`, `docs/config.md`), the internal Solr filter `is_chunk:true` (defined as `SOLR_CHUNK_FILTER_QUERY` in `src/constants.py`) is always injected by the system for OKP searches and is intentionally hidden from users. The `chunk_filter_query` field in `OkpConfiguration` is user-facing and additive-only, but the documentation must NOT mention the internal `is_chunk:true` behavior — this is a deliberate design decision by the maintainers.

Applied to files:

  • docs/rag_guide.md
  • docs/openapi.json
  • src/utils/vector_search.py
  • tests/unit/utils/test_vector_search.py
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/rags.py
📚 Learning: 2026-05-06T06:57:52.173Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-05-06T06:57:52.173Z
Learning: Applies to src/constants.py : Use `constants.py` for shared constants with descriptive comments and type hints using `Final[type]`

Applied to files:

  • src/constants.py
📚 Learning: 2026-05-20T08:09:43.391Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: src/llama_stack_configuration.py:651-683
Timestamp: 2026-05-20T08:09:43.391Z
Learning: In `src/llama_stack_configuration.py`, the `apply_high_level_inference` function currently emits `provider_id: p_type` (underscore form, e.g. `sentence_transformers`) directly from the high-level type key, which collides with Llama Stack's hyphenated provider IDs (e.g. `sentence-transformers`). This is a known PoC divergence documented in the spike doc ("Findings discovered during PoC") and tracked in the implementation JIRA "Unified llama_stack.config schema + synthesizer". Decision S5 mandates that each backend-specific synthesizer translates LCORE's canonical type Literal vocabulary to the target backend's expected shape (hyphenated provider_id for Llama Stack; model-string prefixes for Pydantic AI). The PoC code will be removed before merge; the fix belongs in the implementation ticket.

Applied to files:

  • src/constants.py
  • src/llama_stack_configuration.py
  • tests/unit/test_llama_stack_configuration.py
📚 Learning: 2026-04-15T18:54:09.157Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:769-773
Timestamp: 2026-04-15T18:54:09.157Z
Learning: In lightspeed-core/lightspeed-stack (src/models/requests.py and related files), schema-level field size limits (e.g., max_length=65_536, max_length=32_768) are intentionally written as inline numeric literals, not extracted to constants.py. constants.py is reserved for configurable runtime defaults (e.g., DEFAULT_RH_IDENTITY_MAX_HEADER_SIZE, DEFAULT_MAX_FILE_UPLOAD_SIZE). Do not flag inline literals in field validators or Pydantic Field constraints as needing extraction to constants.py.

Applied to files:

  • src/constants.py
📚 Learning: 2025-12-18T10:21:09.038Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:09.038Z
Learning: In Llama Stack version 0.3.x, telemetry provider configuration is not supported under the `providers` section in run.yaml configuration files. Telemetry can be enabled with just `telemetry.enabled: true` without requiring an explicit provider block.

Applied to files:

  • src/llama_stack_configuration.py
📚 Learning: 2026-02-23T14:56:59.186Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.

Applied to files:

  • src/utils/responses.py
🪛 GitHub Actions: OpenAPI (Spectral) / 0_spectral.txt
docs/openapi.json

[error] 1-1: OpenAPI schema is out of date. 'diff -u docs/openapi.json /tmp/openapi-generated.json' failed, indicating the generated schema differs. Regenerate with: 'uv run python scripts/generate_openapi_schema.py docs/openapi.json'.

🪛 GitHub Actions: OpenAPI (Spectral) / spectral
docs/openapi.json

[error] 1-1: CI check failed: docs/openapi.json is out of date compared to the generated schema. Regenerate with: uv run scripts/generate_openapi_schema.py docs/openapi.json

🪛 markdownlint-cli2 (0.22.1)
docs/rag_guide.md

[warning] 42-42: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)

🔇 Additional comments (66)
src/models/api/responses/successful/configuration.py (1)

82-94: LGTM!

tests/unit/models/config/test_byok_rag.py (4)

15-37: LGTM!


40-64: LGTM!


67-173: LGTM!


176-201: LGTM!

tests/unit/models/config/test_rag_configuration.py (5)

20-36: LGTM!


38-61: LGTM!


63-83: LGTM!


85-176: LGTM!


178-202: LGTM!

tests/unit/models/config/test_dump_configuration.py (3)

46-266: LGTM!


901-1135: LGTM!


389-2105: LGTM!

src/configuration.py (4)

478-482: LGTM!


492-517: LGTM!


520-535: LGTM!


538-549: LGTM!

tests/unit/telemetry/conftest.py (3)

10-36: LGTM!


295-316: LGTM!


391-412: LGTM!

tests/unit/test_configuration.py (7)

1004-1015: LGTM!


1032-1043: LGTM!


1062-1075: LGTM!


1094-1116: LGTM!


1159-1172: LGTM!


1193-1213: LGTM!


1414-1449: LGTM!

Also applies to: 1702-1743, 1903-1937, 2121-2170, 2335-2378, 2607-2656, 2852-2902, 3067-3117, 3281-3332, 3518-3559, 3748-3789

src/app/endpoints/rags.py (3)

27-27: LGTM!


110-129: LGTM!


168-170: LGTM!

tests/unit/app/endpoints/test_rags.py (3)

272-296: LGTM!


383-388: LGTM!


391-395: LGTM!

tests/unit/utils/test_vector_search.py (9)

521-522: LGTM!


606-609: LGTM!


670-671: LGTM!


690-691: LGTM!


723-731: LGTM!


755-764: LGTM!


862-873: LGTM!


917-927: LGTM!


540-542: ⚖️ Poor tradeoff

Tests’ mixed configuration.rag.* vs rag.* access is intentional

The test setup matches src/utils/vector_search.py: it reads inline BYOK sources/stores from configuration.configuration.rag.*, while it reads BYOK max_chunks from configuration.rag.byok.max_chunks, so the apparent inconsistency is expected.

src/llama_stack_configuration.py (2)

344-363: LGTM!


602-613: LGTM!

src/client.py (2)

94-94: LGTM!


97-101: LGTM!

examples/lightspeed-stack-byok-okp-rag.yaml (1)

38-78: LGTM!

examples/quota-limiter-configuration-sqlite.yaml (1)

36-48: LGTM!

tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml (1)

30-43: LGTM!

tests/e2e/configuration/library-mode/lightspeed-stack-inline-rag.yaml (1)

29-42: LGTM!

tests/e2e/configuration/library-mode/lightspeed-stack.yaml (1)

23-36: LGTM!

tests/unit/test_llama_stack_configuration.py (3)

258-318: LGTM!


538-549: LGTM!


573-614: LGTM!

tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml (1)

36-49: LGTM!

tests/e2e/configuration/server-mode/lightspeed-stack-inline-rag.yaml (1)

30-43: LGTM!

tests/e2e/configuration/server-mode/lightspeed-stack.yaml (1)

24-37: LGTM!

tests/integration/endpoints/test_query_byok_integration.py (3)

246-271: LGTM!


274-301: LGTM!


429-455: LGTM!

Also applies to: 498-526, 578-618, 621-662, 768-787, 886-915, 971-999, 1074-1105, 1144-1144, 1164-1197, 1249-1249, 1272-1300

tests/integration/endpoints/test_responses_byok_integration.py (1)

103-154: LGTM!

Also applies to: 157-196, 205-268, 277-350, 359-411, 420-501, 510-571, 575-658, 662-719

tests/integration/endpoints/test_streaming_query_byok_integration.py (1)

232-252: LGTM!

Also applies to: 255-278, 287-320, 323-382, 386-456, 460-491, 495-533, 537-581, 590-623, 627-665, 674-738, 747-819, 828-907, 916-984, 988-1078, 1082-1147

docs/byok_guide.md (2)

81-119: LGTM!


193-193: ⚡ Quick win

Embedding model download command aligns with rag-content tooling (uv, not pdm).

File: docs/byok_guide.md (line 193)

uv run python ./scripts/download_embeddings_model.py -l ./embeddings_model/ -r sentence-transformers/all-mpnet-base-v2

The rag-content README also instructs using uv run python ./scripts/download_embeddings_model.py with the same -l/-r pattern, so the pdmuv tool change is consistent.

docs/rag_guide.md (2)

37-68: LGTM!


350-361: LGTM!

Comment thread docs/openapi.json
Comment thread src/llama_stack_configuration.py
Comment thread src/models/config.py
Comment thread src/models/config.py
Comment thread src/utils/vector_search.py
Comment thread src/llama_stack_configuration.py Fixed
@are-ces are-ces requested a review from syedriko June 3, 2026 11:26
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
docs/rag_guide.md (1)

102-110: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

FAISS example uses the removed byok_rag/rag_type schema.

Same inconsistency as in byok_guide.md: this example and the others at lines 277-288 (rag.inline/rag.tool + top-level okp:), 318-327, and 389-402, plus the prose at lines 92, 98, and 371, still document the legacy flat schema, while your chunk-volume table at 356-361 already uses the new rag.byok.max_chunks / rag.retrieval.*.max_chunks paths. The enrichment in src/llama_stack_configuration.py only reads rag.byok.stores, rag.retrieval.inline.sources, rag.retrieval.tool.sources, and rag.okp, and backend: faiss replaces rag_type: inline::faiss. Please migrate these examples to the unified schema.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/rag_guide.md` around lines 102 - 110, Update the YAML examples and
accompanying prose to the unified schema: replace the legacy top-level byok_rag
/ rag_type entries with rag.byok.stores entries (use backend: faiss instead of
rag_type: inline::faiss), move per-source settings into
rag.retrieval.inline.sources or rag.retrieval.tool.sources as appropriate, and
use rag.byok.max_chunks and rag.retrieval.*.max_chunks for chunk-volume limits;
also ensure rag.okp remains as the top-level OKP config referenced by the
enrichment code. Locate the example blocks that currently show byok_rag/rag_type
(and the prose referencing those keys) and convert them to the new keys
(rag.byok.stores, rag.retrieval.inline.sources, rag.retrieval.tool.sources,
rag.okp, backend: faiss) and update any comments so they reflect generated
vector_db_id/db_path fields and the new max_chunks paths.
src/llama_stack_configuration.py (1)

315-322: 🧹 Nitpick | 🔵 Trivial | 💤 Low value

Docstring is stale relative to the implementation.

The Returns section states each entry's provider_id is "byok_<vector_db_id>" and provider_type is "set from the RAG item", but the code sets provider_id = f"byok_{rag_id}" (line 344) and derives provider_type from BACKEND_TO_LLAMA_STACK_PROVIDER[backend] (lines 350-356).

📝 Suggested wording fix
-        Each appended entry has `provider_id` set to "byok_<vector_db_id>",
-        `provider_type` set from the RAG item, and a `config` with `persistence`
-        referencing the corresponding backend.
+        Each appended entry has `provider_id` set to "byok_<rag_id>",
+        `provider_type` resolved from the entry's `backend` via
+        `BACKEND_TO_LLAMA_STACK_PROVIDER`, and a `config` with `persistence`
+        referencing the corresponding storage backend.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/llama_stack_configuration.py` around lines 315 - 322, The Returns
docstring in llama_stack_configuration.py is outdated: it claims provider_id is
"byok_<vector_db_id>" and provider_type is "set from the RAG item", but the
implementation builds provider_id using rag_id (provider_id = f"byok_{rag_id}")
and derives provider_type via BACKEND_TO_LLAMA_STACK_PROVIDER[backend]; update
the Returns description to state provider_id is "byok_<rag_id>" and that
provider_type is resolved from BACKEND_TO_LLAMA_STACK_PROVIDER using the backend
value so the docstring matches the logic in the function that appends entries
from byok_rag.
docs/byok_guide.md (1)

206-279: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Update BYOK guide examples to match the new nested rag schema
docs/byok_guide.md Step 4/5 still shows the legacy top-level byok_rag list with rag_type and top-level okp, but the code/tested config shape uses nested rag.byok.stores and a backend field (tests build vector_io providers from BYOK entries containing backend and emit provider_type == "inline::faiss"). Copying the current docs wouldn’t match the new paths used by generate_configuration (rag.byok.stores, rag.retrieval.inline.sources, rag.retrieval.tool.sources, rag.okp).

  • Migrate all Step 4/Step 5 snippets, field reference table, and prose around lines ~202/261-262 to the new keys.
📝 Illustrative migration for the Step 4/Step 5 snippets
-byok_rag:
-  - rag_id: my-docs
-    rag_type: inline::faiss
-    embedding_model: sentence-transformers/all-mpnet-base-v2
-    embedding_dimension: 768
-    vector_db_id: vs_8c94967b-81cc-4028-a294-9cfac6fd9ae2
-    db_path: /path/to/vector_db/faiss_store.db
-    score_multiplier: 1.0
+rag:
+  byok:
+    stores:
+      - rag_id: my-docs
+        backend: faiss
+        embedding_model: sentence-transformers/all-mpnet-base-v2
+        embedding_dimension: 768
+        vector_db_id: vs_8c94967b-81cc-4028-a294-9cfac6fd9ae2
+        db_path: /path/to/vector_db/faiss_store.db
+        score_multiplier: 1.0
+  retrieval:
+    inline:
+      sources:
+        - my-docs
+        - okp
+    tool:
+      sources:
+        - my-docs
+        - okp
+  okp:
+    offline: true
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/byok_guide.md` around lines 206 - 279, Update the BYOK docs to match the
new nested schema: replace top-level byok_rag with rag.byok.stores entries that
include backend and provider_type (e.g., "provider_type: inline::faiss") and
update all examples and prose to use rag.retrieval.inline.sources and
rag.retrieval.tool.sources for Step 5 and rag.okp for OKP settings; ensure the
field reference table, YAML examples, and any mentions of rag_type/okp at
top-level are migrated to the new keys so generate_configuration will consume
the documented shape.
src/models/config.py (1)

1960-1964: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Tool max_chunks falls back to the inline default on partial config.

If a user sets rag.retrieval.tool.sources but omits max_chunks, Pydantic instantiates RetrievalStrategyConfiguration from that partial dict and line 1961 applies DEFAULT_INLINE_RAG_MAX_CHUNKS. The tool field’s default_factory only runs when the whole tool section is absent, so configured tool RAG silently gets the wrong result limit. src/utils/responses.py then uses configuration.rag.retrieval.tool.max_chunks directly when building the file_search tool, so this is observable at runtime.

Proposed fix
 class RetrievalConfiguration(ConfigurationBase):
     """Configuration for inline and tool retrieval strategies."""
+
+    `@model_validator`(mode="before")
+    `@classmethod`
+    def apply_strategy_defaults(cls, data: Any) -> Any:
+        """Apply per-strategy max_chunks defaults when nested config is partial."""
+        if not isinstance(data, dict):
+            return data
+
+        inline = data.get("inline")
+        if isinstance(inline, dict):
+            inline.setdefault("max_chunks", constants.DEFAULT_INLINE_RAG_MAX_CHUNKS)
+
+        tool = data.get("tool")
+        if isinstance(tool, dict):
+            tool.setdefault("max_chunks", constants.DEFAULT_TOOL_RAG_MAX_CHUNKS)
+
+        return data
 
     inline: RetrievalStrategyConfiguration = Field(
         default_factory=lambda: RetrievalStrategyConfiguration(
             max_chunks=constants.DEFAULT_INLINE_RAG_MAX_CHUNKS,
         ),

Also applies to: 1978-1984

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/models/config.py` around lines 1960 - 1964, The field
RetrievalStrategyConfiguration.max_chunks currently defaults to
constants.DEFAULT_INLINE_RAG_MAX_CHUNKS which causes a partial tool config to
silently inherit the inline default; change max_chunks to be optional (e.g.
Optional[PositiveInt] with default None) so omission is distinguishable from an
explicit value, do the same for the analogous fields reported around lines
1978-1984, and update the consumer in src/utils/responses.py (the file_search
builder) to treat a None max_chunks as “unspecified” and explicitly fall back to
the appropriate parent/global default only when the entire tool section is
absent (rather than using DEFAULT_INLINE_RAG_MAX_CHUNKS on partial configs).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/constants.py`:
- Around line 171-176: SUPPORTED_RAG_BACKENDS is duplicated work and can drift
from BACKEND_TO_LLAMA_STACK_PROVIDER; change SUPPORTED_RAG_BACKENDS to be
derived from the single source BACKEND_TO_LLAMA_STACK_PROVIDER in
llama_stack_configuration.py (i.e., import BACKEND_TO_LLAMA_STACK_PROVIDER and
set SUPPORTED_RAG_BACKENDS = frozenset(BACKEND_TO_LLAMA_STACK_PROVIDER.keys())
or otherwise centralize the canonical mapping so both the model validator and
enrichment layer read the same data). Ensure DEFAULT_RAG_BACKEND remains valid
against the derived set.

---

Outside diff comments:
In `@docs/byok_guide.md`:
- Around line 206-279: Update the BYOK docs to match the new nested schema:
replace top-level byok_rag with rag.byok.stores entries that include backend and
provider_type (e.g., "provider_type: inline::faiss") and update all examples and
prose to use rag.retrieval.inline.sources and rag.retrieval.tool.sources for
Step 5 and rag.okp for OKP settings; ensure the field reference table, YAML
examples, and any mentions of rag_type/okp at top-level are migrated to the new
keys so generate_configuration will consume the documented shape.

In `@docs/rag_guide.md`:
- Around line 102-110: Update the YAML examples and accompanying prose to the
unified schema: replace the legacy top-level byok_rag / rag_type entries with
rag.byok.stores entries (use backend: faiss instead of rag_type: inline::faiss),
move per-source settings into rag.retrieval.inline.sources or
rag.retrieval.tool.sources as appropriate, and use rag.byok.max_chunks and
rag.retrieval.*.max_chunks for chunk-volume limits; also ensure rag.okp remains
as the top-level OKP config referenced by the enrichment code. Locate the
example blocks that currently show byok_rag/rag_type (and the prose referencing
those keys) and convert them to the new keys (rag.byok.stores,
rag.retrieval.inline.sources, rag.retrieval.tool.sources, rag.okp, backend:
faiss) and update any comments so they reflect generated vector_db_id/db_path
fields and the new max_chunks paths.

In `@src/llama_stack_configuration.py`:
- Around line 315-322: The Returns docstring in llama_stack_configuration.py is
outdated: it claims provider_id is "byok_<vector_db_id>" and provider_type is
"set from the RAG item", but the implementation builds provider_id using rag_id
(provider_id = f"byok_{rag_id}") and derives provider_type via
BACKEND_TO_LLAMA_STACK_PROVIDER[backend]; update the Returns description to
state provider_id is "byok_<rag_id>" and that provider_type is resolved from
BACKEND_TO_LLAMA_STACK_PROVIDER using the backend value so the docstring matches
the logic in the function that appends entries from byok_rag.

In `@src/models/config.py`:
- Around line 1960-1964: The field RetrievalStrategyConfiguration.max_chunks
currently defaults to constants.DEFAULT_INLINE_RAG_MAX_CHUNKS which causes a
partial tool config to silently inherit the inline default; change max_chunks to
be optional (e.g. Optional[PositiveInt] with default None) so omission is
distinguishable from an explicit value, do the same for the analogous fields
reported around lines 1978-1984, and update the consumer in
src/utils/responses.py (the file_search builder) to treat a None max_chunks as
“unspecified” and explicitly fall back to the appropriate parent/global default
only when the entire tool section is absent (rather than using
DEFAULT_INLINE_RAG_MAX_CHUNKS on partial configs).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2edf3be1-2f33-48d5-872a-24b0829cb044

📥 Commits

Reviewing files that changed from the base of the PR and between 9df96fe and 2209ac9.

📒 Files selected for processing (9)
  • docs/byok_guide.md
  • docs/rag_guide.md
  • src/constants.py
  • src/llama_stack_configuration.py
  • src/models/config.py
  • tests/e2e/features/inline_rag.feature
  • tests/unit/models/config/test_byok_rag.py
  • tests/unit/models/config/test_rag_configuration.py
  • tests/unit/utils/test_responses.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: build-pr
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (5)
tests/e2e/**/*.{py,feature}

📄 CodeRabbit inference engine (AGENTS.md)

Use behave (BDD) framework for end-to-end testing with Gherkin feature files

Files:

  • tests/e2e/features/inline_rag.feature
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

  • tests/unit/models/config/test_rag_configuration.py
  • tests/unit/models/config/test_byok_rag.py
  • tests/unit/utils/test_responses.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

  • src/constants.py
  • src/llama_stack_configuration.py
  • src/models/config.py
src/constants.py

📄 CodeRabbit inference engine (AGENTS.md)

Use constants.py for shared constants with descriptive comments and type hints using Final[type]

Files:

  • src/constants.py
src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models must use @model_validator and @field_validator for validation and complete type annotations for all attributes, avoiding Any type

Files:

  • src/models/config.py
🧠 Learnings (11)
📓 Common learnings
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1843
File: src/utils/vector_search.py:472-484
Timestamp: 2026-06-03T11:15:38.136Z
Learning: In `src/utils/vector_search.py`, the `_fetch_byok_rag` function's `vector_store_ids` parameter always contains user-facing `rag_id`s, never raw llama-stack `vector_db_id`s. The API contract enforces rag_ids only, so filtering `vector_store_ids` against `configuration.configuration.rag.retrieval.inline.sources` (also rag_ids) before calling `resolve_vector_store_ids` is intentional and correct.
📚 Learning: 2026-06-03T11:15:38.136Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1843
File: src/utils/vector_search.py:472-484
Timestamp: 2026-06-03T11:15:38.136Z
Learning: In `src/utils/vector_search.py`, the `_fetch_byok_rag` function's `vector_store_ids` parameter always contains user-facing `rag_id`s, never raw llama-stack `vector_db_id`s. The API contract enforces rag_ids only, so filtering `vector_store_ids` against `configuration.configuration.rag.retrieval.inline.sources` (also rag_ids) before calling `resolve_vector_store_ids` is intentional and correct.

Applied to files:

  • tests/unit/models/config/test_rag_configuration.py
  • docs/byok_guide.md
  • docs/rag_guide.md
  • src/constants.py
  • src/llama_stack_configuration.py
  • tests/unit/models/config/test_byok_rag.py
  • tests/unit/utils/test_responses.py
  • src/models/config.py
📚 Learning: 2025-12-18T10:21:09.038Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:09.038Z
Learning: In Llama Stack version 0.3.x, telemetry provider configuration is not supported under the `providers` section in run.yaml configuration files. Telemetry can be enabled with just `telemetry.enabled: true` without requiring an explicit provider block.

Applied to files:

  • docs/byok_guide.md
📚 Learning: 2025-08-18T10:58:14.951Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:47-47
Timestamp: 2025-08-18T10:58:14.951Z
Learning: psycopg2-binary is required by some llama-stack providers in the lightspeed-stack project, so it cannot be replaced with psycopg v3 or moved to optional dependencies without breaking llama-stack functionality.

Applied to files:

  • docs/byok_guide.md
  • docs/rag_guide.md
📚 Learning: 2026-05-20T08:09:36.724Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: src/client.py:104-108
Timestamp: 2026-05-20T08:09:36.724Z
Learning: In the lightspeed-stack repo, the synthesized `run.yaml` file handling in `src/client.py` (`_synthesize_library_config`) uses a fixed `/tmp` path intentionally in the PoC (PR `#1580`). The durable production requirements are tracked in spec doc R10 (docs/design/llama-stack-config-merge/llama-stack-config-merge.md): persistent known path overwritten each boot, file mode 0600 set via explicit create flag (not umask), and a `--synthesized-config-output` CLI flag for debugging. The PoC code is scheduled for removal pre-merge; the implementation JIRA "Unified llama_stack.config schema + synthesizer" inherits R10's requirements.

Applied to files:

  • docs/byok_guide.md
📚 Learning: 2026-05-06T06:57:52.173Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-05-06T06:57:52.173Z
Learning: Applies to src/constants.py : Use `constants.py` for shared constants with descriptive comments and type hints using `Final[type]`

Applied to files:

  • src/constants.py
📚 Learning: 2026-05-20T08:09:43.391Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: src/llama_stack_configuration.py:651-683
Timestamp: 2026-05-20T08:09:43.391Z
Learning: In `src/llama_stack_configuration.py`, the `apply_high_level_inference` function currently emits `provider_id: p_type` (underscore form, e.g. `sentence_transformers`) directly from the high-level type key, which collides with Llama Stack's hyphenated provider IDs (e.g. `sentence-transformers`). This is a known PoC divergence documented in the spike doc ("Findings discovered during PoC") and tracked in the implementation JIRA "Unified llama_stack.config schema + synthesizer". Decision S5 mandates that each backend-specific synthesizer translates LCORE's canonical type Literal vocabulary to the target backend's expected shape (hyphenated provider_id for Llama Stack; model-string prefixes for Pydantic AI). The PoC code will be removed before merge; the fix belongs in the implementation ticket.

Applied to files:

  • src/llama_stack_configuration.py
  • src/models/config.py
📚 Learning: 2026-05-20T08:09:34.319Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:34.319Z
Learning: In the lightspeed-stack project (LCORE-836 unified config), the `provider_shield_id` for a Llama Guard safety shield entry must be a guard model identifier (e.g., `meta-llama/Llama-Guard-3-8B`), not a chat model id like `openai/gpt-4o-mini`. Using a chat model id there only means the native_override key landed — it does not mean the safety shield is actually gating queries. E2E tests for the implementation JIRA must exercise a real Llama Guard model to validate the shield.

Applied to files:

  • src/llama_stack_configuration.py
📚 Learning: 2026-04-15T18:54:09.157Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:769-773
Timestamp: 2026-04-15T18:54:09.157Z
Learning: In lightspeed-core/lightspeed-stack (src/models/requests.py and related files), schema-level field size limits (e.g., max_length=65_536, max_length=32_768) are intentionally written as inline numeric literals, not extracted to constants.py. constants.py is reserved for configurable runtime defaults (e.g., DEFAULT_RH_IDENTITY_MAX_HEADER_SIZE, DEFAULT_MAX_FILE_UPLOAD_SIZE). Do not flag inline literals in field validators or Pydantic Field constraints as needing extraction to constants.py.

Applied to files:

  • src/models/config.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/config.py
🔇 Additional comments (2)
src/llama_stack_configuration.py (2)

25-29: LGTM!


607-619: LGTM!

Comment thread src/constants.py
Comment on lines 171 to +176
# BYOK RAG
# Default RAG type for bring-your-own-knowledge RAG configurations, that type
# needs to be supported by Llama Stack
DEFAULT_RAG_TYPE: Final[str] = "inline::faiss"
# Backends that have enrichment support in llama_stack_configuration.py
SUPPORTED_RAG_BACKENDS: Final[frozenset[str]] = frozenset({"faiss"})

# Default RAG backend for bring-your-own-knowledge RAG configurations
DEFAULT_RAG_BACKEND: Final[str] = "faiss"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | 💤 Low value

Dual source of truth for supported backends.

SUPPORTED_RAG_BACKENDS here and BACKEND_TO_LLAMA_STACK_PROVIDER in src/llama_stack_configuration.py independently enumerate the supported backends (both currently {"faiss"}). The comment even states this set mirrors "enrichment support in llama_stack_configuration.py". If one is updated (e.g. enabling pgvector) without the other, the model validator and the enrichment layer will disagree. Consider deriving the set from a single mapping to keep them in lockstep.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/constants.py` around lines 171 - 176, SUPPORTED_RAG_BACKENDS is
duplicated work and can drift from BACKEND_TO_LLAMA_STACK_PROVIDER; change
SUPPORTED_RAG_BACKENDS to be derived from the single source
BACKEND_TO_LLAMA_STACK_PROVIDER in llama_stack_configuration.py (i.e., import
BACKEND_TO_LLAMA_STACK_PROVIDER and set SUPPORTED_RAG_BACKENDS =
frozenset(BACKEND_TO_LLAMA_STACK_PROVIDER.keys()) or otherwise centralize the
canonical mapping so both the model validator and enrichment layer read the same
data). Ensure DEFAULT_RAG_BACKEND remains valid against the derived set.

Comment thread src/utils/vector_search.py Outdated
- referenced_documents: Documents referenced in BYOK RAG results
"""
limit = max_chunks if max_chunks is not None else constants.BYOK_RAG_MAX_CHUNKS
limit = max_chunks if max_chunks is not None else configuration.rag.byok.max_chunks
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's IMO confusing which value the limit is going to get in the end. I'd remove the default for the max_chunks parameter and always passed it from config, which is set to the default unless specified manually.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense

Comment thread src/utils/vector_search.py Outdated
solr: Structured Solr inline RAG request from the API (optional).
max_chunks: Maximum number of chunks to return. If None, uses
constants.OKP_RAG_MAX_CHUNKS.
rag.okp.max_chunks from configuration.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no max_chunks arg here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in new commit TY!

are-ces and others added 11 commits June 4, 2026 14:02
Restructure the RAG configuration to consolidate byok_rag, rag, and okp
top-level sections into a single rag section with byok, okp, and
retrieval sub-sections. Make chunk limits user-configurable instead of
hardcoded constants.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use explicit default= keyword in Field() for PositiveInt fields so
type checkers recognize the defaults.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…w diagram

- Use explicit default= keyword in Field() for pyright/mypy compatibility
- Fix byok.max_chunks description: caps total across all stores, not per-store
- Add inline RAG chunk flow Mermaid diagram to RAG guide
- Regenerate OpenAPI schema

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add BYOK content prioritization subsection (score_multiplier + reranker)
- Add inline RAG chunk flow diagram to both guides
- Replace hardcoded constants references with config paths
- Add chunk limits table
- Replace pdm run with uv run

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add SUPPORTED_BACKENDS dict to validate backend types during
  enrichment instead of hardcoding inline:: prefix
- Restore inline comments in reranker auto-enable validator
- Add backward compatibility note to RAG guide

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add SUPPORTED_RAG_BACKENDS constant and field validator on
  RagStore.backend to reject unsupported backends at config load time
- Add model validator on RagConfiguration to reject retrieval source
  IDs not declared in byok.stores
- Add assert that DEFAULT_RAG_BACKEND has a BACKEND_TO_LLAMA_STACK_PROVIDER
  mapping
- Remove llama-stack references from docs, use run.yaml instead
- Add backward compatibility note to RAG guide
- Update tests for new validators

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace assert with runtime ValueError for bandit compliance
- Fix backend="rag" → backend="faiss" in test helper
- Add pylint disable for no-member on Pydantic model validator

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rag to _fetch_okp_rag

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@are-ces are-ces force-pushed the lcore-1426-byok-config-refactoring branch from 2209ac9 to 472397b Compare June 4, 2026 12:03
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/app/endpoints/rags.py (1)

111-125: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Use Google-style docstring sections in this function.

The function docstring is currently NumPy-style; please convert it to Google format (Args, Returns, Raises) to match repo standards.

As per coding guidelines: "Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/app/endpoints/rags.py` around lines 111 - 125, Update the function
docstring that "Resolves a user-facing rag_id to the llama-stack vector_db_id"
to use Google-style sections: replace NumPy-style "Parameters"/"Returns" with
"Args:", "Returns:", and add a "Raises:" section (use "Raises: None" if it
doesn't raise) to match repository docstring conventions; keep the existing
short description and parameter details (rag_id, byok_rags) and ensure types and
return description remain accurate.
src/utils/vector_search.py (1)

569-589: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pass rag.okp.max_chunks into the Solr query.

limit = configuration.rag.okp.max_chunks only affects the post-query slice right now. _build_query_params(solr) still falls back to constants.SOLR_VECTOR_SEARCH_DEFAULT_K, so the new config value never changes how many OKP chunks are actually fetched.

Suggested fix
-            params = _build_query_params(solr)
+            params = _build_query_params(solr, k=limit)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/utils/vector_search.py` around lines 569 - 589, The Solr query currently
ignores configuration.rag.okp.max_chunks because _build_query_params(solr) falls
back to constants.SOLR_VECTOR_SEARCH_DEFAULT_K; pass the configured limit into
the query parameters so the client.vector_io.query actually requests that many
chunks. Either update _build_query_params to accept a limit (e.g.,
_build_query_params(solr, limit)) and use configuration.rag.okp.max_chunks when
building params, or set params['k'] = limit (or the appropriate param key) after
params = _build_query_params(solr) and before calling client.vector_io.query;
reference the limit variable, _build_query_params, params,
client.vector_io.query, and constants.SOLR_VECTOR_SEARCH_DEFAULT_K when making
the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/byok_guide.md`:
- Around line 223-231: The table documents deprecated flat RAG keys (rag_id,
rag_type, embedding_model, embedding_dimension, vector_db_id, db_path,
score_multiplier); update these to the new nested schema using
rag.byok.stores[*] entries and the rag.backend field (replace rag_type →
rag.backend, move rag_id/vector_db_id/db_path/etc. under rag.byok.stores with
per-store keys, and adjust embedding_model/embedding_dimension/score_multiplier
to per-store fields). Ensure all examples and the related occurrences (also at
the later section mentioned) use rag.byok.stores and rag.backend rather than the
old top-level keys so the config matches the new contract.

In `@docs/rag_guide.md`:
- Line 42: Change the heading "Inline RAG chunk flow" from an H3 to H2 to fix
the markdown lint MD001; locate the heading line with the text "Inline RAG chunk
flow" in docs/rag_guide.md and update its markdown header marker to the H2 style
(i.e., use two hash symbols) so the document heading levels are sequential.

In `@src/models/config.py`:
- Around line 1996-2000: Add a post-validation check on the model that declares
the stores: list[RagStore] field to reject duplicate RagStore.rag_id values;
implement a Pydantic validator (e.g., `@validator`("stores") or a `@root_validator`
on that model) that collects rag_id from each RagStore, detects duplicates, and
raises a ValueError listing the duplicated rag_id(s) so validation fails if the
same rag_id appears more than once; reference the stores field and the
RagStore.rag_id attribute when locating where to add this check.

---

Outside diff comments:
In `@src/app/endpoints/rags.py`:
- Around line 111-125: Update the function docstring that "Resolves a
user-facing rag_id to the llama-stack vector_db_id" to use Google-style
sections: replace NumPy-style "Parameters"/"Returns" with "Args:", "Returns:",
and add a "Raises:" section (use "Raises: None" if it doesn't raise) to match
repository docstring conventions; keep the existing short description and
parameter details (rag_id, byok_rags) and ensure types and return description
remain accurate.

In `@src/utils/vector_search.py`:
- Around line 569-589: The Solr query currently ignores
configuration.rag.okp.max_chunks because _build_query_params(solr) falls back to
constants.SOLR_VECTOR_SEARCH_DEFAULT_K; pass the configured limit into the query
parameters so the client.vector_io.query actually requests that many chunks.
Either update _build_query_params to accept a limit (e.g.,
_build_query_params(solr, limit)) and use configuration.rag.okp.max_chunks when
building params, or set params['k'] = limit (or the appropriate param key) after
params = _build_query_params(solr) and before calling client.vector_io.query;
reference the limit variable, _build_query_params, params,
client.vector_io.query, and constants.SOLR_VECTOR_SEARCH_DEFAULT_K when making
the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7791912b-70fc-45af-b294-8bbec271970e

📥 Commits

Reviewing files that changed from the base of the PR and between 2209ac9 and 472397b.

📒 Files selected for processing (33)
  • docs/byok_guide.md
  • docs/openapi.json
  • docs/rag_guide.md
  • examples/lightspeed-stack-byok-okp-rag.yaml
  • examples/quota-limiter-configuration-sqlite.yaml
  • src/app/endpoints/rags.py
  • src/client.py
  • src/configuration.py
  • src/constants.py
  • src/llama_stack_configuration.py
  • src/models/api/responses/successful/configuration.py
  • src/models/config.py
  • src/utils/responses.py
  • src/utils/vector_search.py
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-inline-rag.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-inline-rag.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack.yaml
  • tests/e2e/features/inline_rag.feature
  • tests/integration/endpoints/test_query_byok_integration.py
  • tests/integration/endpoints/test_responses_byok_integration.py
  • tests/integration/endpoints/test_streaming_query_byok_integration.py
  • tests/unit/app/endpoints/test_rags.py
  • tests/unit/models/config/test_byok_rag.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/models/config/test_rag_configuration.py
  • tests/unit/telemetry/conftest.py
  • tests/unit/test_configuration.py
  • tests/unit/test_llama_stack_configuration.py
  • tests/unit/utils/test_responses.py
  • tests/unit/utils/test_vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: build-pr
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (8)
tests/e2e/**/*.{py,feature}

📄 CodeRabbit inference engine (AGENTS.md)

Use behave (BDD) framework for end-to-end testing with Gherkin feature files

Files:

  • tests/e2e/features/inline_rag.feature
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

  • src/client.py
  • src/models/api/responses/successful/configuration.py
  • src/configuration.py
  • src/app/endpoints/rags.py
  • src/llama_stack_configuration.py
  • src/constants.py
  • src/utils/responses.py
  • src/models/config.py
  • src/utils/vector_search.py
src/**/configuration.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/configuration.py: All config models must extend ConfigurationBase with extra="forbid" to reject unknown fields
Use @field_validator and @model_validator for custom validation in Pydantic models

Files:

  • src/models/api/responses/successful/configuration.py
  • src/configuration.py
src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models must use @model_validator and @field_validator for validation and complete type annotations for all attributes, avoiding Any type

Files:

  • src/models/api/responses/successful/configuration.py
  • src/models/config.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

  • tests/unit/telemetry/conftest.py
  • tests/unit/models/config/test_byok_rag.py
  • tests/unit/test_llama_stack_configuration.py
  • tests/integration/endpoints/test_responses_byok_integration.py
  • tests/unit/utils/test_vector_search.py
  • tests/unit/app/endpoints/test_rags.py
  • tests/unit/utils/test_responses.py
  • tests/unit/models/config/test_rag_configuration.py
  • tests/integration/endpoints/test_query_byok_integration.py
  • tests/unit/test_configuration.py
  • tests/integration/endpoints/test_streaming_query_byok_integration.py
  • tests/unit/models/config/test_dump_configuration.py
tests/**/conftest.py

📄 CodeRabbit inference engine (AGENTS.md)

Use conftest.py for shared pytest fixtures and pytest-mock for AsyncMock objects

Files:

  • tests/unit/telemetry/conftest.py
src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/app/**/*.py: FastAPI dependencies: Import from fastapi module for APIRouter, HTTPException, Request, status, Depends
Use FastAPI HTTPException with appropriate status codes for API endpoints and handle APIConnectionError from Llama Stack

Files:

  • src/app/endpoints/rags.py
src/constants.py

📄 CodeRabbit inference engine (AGENTS.md)

Use constants.py for shared constants with descriptive comments and type hints using Final[type]

Files:

  • src/constants.py
🧠 Learnings (21)
📓 Common learnings
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1843
File: src/utils/vector_search.py:472-484
Timestamp: 2026-06-03T11:15:38.136Z
Learning: In `src/utils/vector_search.py`, the `_fetch_byok_rag` function's `vector_store_ids` parameter always contains user-facing `rag_id`s, never raw llama-stack `vector_db_id`s. The API contract enforces rag_ids only, so filtering `vector_store_ids` against `configuration.configuration.rag.retrieval.inline.sources` (also rag_ids) before calling `resolve_vector_store_ids` is intentional and correct.
📚 Learning: 2026-06-03T11:15:38.136Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1843
File: src/utils/vector_search.py:472-484
Timestamp: 2026-06-03T11:15:38.136Z
Learning: In `src/utils/vector_search.py`, the `_fetch_byok_rag` function's `vector_store_ids` parameter always contains user-facing `rag_id`s, never raw llama-stack `vector_db_id`s. The API contract enforces rag_ids only, so filtering `vector_store_ids` against `configuration.configuration.rag.retrieval.inline.sources` (also rag_ids) before calling `resolve_vector_store_ids` is intentional and correct.

Applied to files:

  • tests/e2e/configuration/server-mode/lightspeed-stack-inline-rag.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
  • examples/quota-limiter-configuration-sqlite.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-inline-rag.yaml
  • docs/rag_guide.md
  • examples/lightspeed-stack-byok-okp-rag.yaml
  • src/models/api/responses/successful/configuration.py
  • src/configuration.py
  • docs/byok_guide.md
  • src/app/endpoints/rags.py
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
  • src/llama_stack_configuration.py
  • src/constants.py
  • src/utils/responses.py
  • tests/unit/models/config/test_byok_rag.py
  • tests/unit/test_llama_stack_configuration.py
  • tests/integration/endpoints/test_responses_byok_integration.py
  • tests/unit/utils/test_vector_search.py
  • tests/unit/app/endpoints/test_rags.py
  • tests/unit/utils/test_responses.py
  • tests/unit/models/config/test_rag_configuration.py
  • tests/integration/endpoints/test_query_byok_integration.py
  • src/models/config.py
  • tests/unit/test_configuration.py
  • docs/openapi.json
  • tests/integration/endpoints/test_streaming_query_byok_integration.py
  • tests/unit/models/config/test_dump_configuration.py
  • src/utils/vector_search.py
📚 Learning: 2026-05-20T08:09:30.641Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:30.641Z
Learning: In Llama-stack config YAMLs, when defining a Llama Guard safety shield entry, set `provider_shield_id` to the *guard model identifier* (e.g., `meta-llama/Llama-Guard-3-8B`). Do not use a chat/generative model id (e.g., `openai/gpt-4o-mini`): a chat-model id (or `native_override`) indicates only an override landed and does **not** mean the safety shield is actually gating queries. Ensure any E2E coverage for the related implementation (JIRA/E2E tests) exercises a real Llama Guard model to verify that the shield is effective.

Applied to files:

  • tests/e2e/configuration/server-mode/lightspeed-stack-inline-rag.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
  • examples/quota-limiter-configuration-sqlite.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-inline-rag.yaml
  • examples/lightspeed-stack-byok-okp-rag.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.

Applied to files:

  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
📚 Learning: 2025-09-02T11:15:02.411Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/test_list.txt:2-3
Timestamp: 2025-09-02T11:15:02.411Z
Learning: In the lightspeed-stack e2e tests, the Authorized tag is intentionally omitted from noop authentication tests because they are designed to test against the default lightspeed-stack.yaml configuration rather than the specialized noop-with-token configuration.

Applied to files:

  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
📚 Learning: 2026-05-06T06:57:52.173Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-05-06T06:57:52.173Z
Learning: Applies to src/**/*.py : Llama Stack imports: Use `from llama_stack_client import AsyncLlamaStackClient`

Applied to files:

  • src/client.py
📚 Learning: 2026-05-20T08:09:36.724Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: src/client.py:104-108
Timestamp: 2026-05-20T08:09:36.724Z
Learning: In the lightspeed-stack repo, the synthesized `run.yaml` file handling in `src/client.py` (`_synthesize_library_config`) uses a fixed `/tmp` path intentionally in the PoC (PR `#1580`). The durable production requirements are tracked in spec doc R10 (docs/design/llama-stack-config-merge/llama-stack-config-merge.md): persistent known path overwritten each boot, file mode 0600 set via explicit create flag (not umask), and a `--synthesized-config-output` CLI flag for debugging. The PoC code is scheduled for removal pre-merge; the implementation JIRA "Unified llama_stack.config schema + synthesizer" inherits R10's requirements.

Applied to files:

  • tests/e2e/configuration/server-mode/lightspeed-stack.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack.yaml
  • docs/byok_guide.md
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
📚 Learning: 2026-04-13T13:39:59.316Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1490
File: tests/e2e/features/environment.py:206-211
Timestamp: 2026-04-13T13:39:59.316Z
Learning: In lightspeed-stack e2e tests (tests/e2e/features/), `context.feature_config` is intentionally set inside Background/step functions (scenario-scoped Behave layer). The `after_scenario` restore logic in `environment.py` only restores config when `context.scenario_lightspeed_override_active` is True, which is only set by `configure_service` when an actual config switch occurs. The module-level `_active_lightspeed_stack_config_basename` in `tests/e2e/features/steps/common.py` prevents re-applying the same config in subsequent scenarios (making `scenario_lightspeed_override_active` stay False). This means the ephemeral nature of step-set context attributes is intentional — the design ensures config restore happens exactly once per actual switch, not redundantly on every scenario.

Applied to files:

  • tests/e2e/configuration/server-mode/lightspeed-stack.yaml
📚 Learning: 2025-08-18T10:58:14.951Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:47-47
Timestamp: 2025-08-18T10:58:14.951Z
Learning: psycopg2-binary is required by some llama-stack providers in the lightspeed-stack project, so it cannot be replaced with psycopg v3 or moved to optional dependencies without breaking llama-stack functionality.

Applied to files:

  • docs/rag_guide.md
  • docs/byok_guide.md
📚 Learning: 2026-03-17T11:34:53.242Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1335
File: docs/config.md:411-411
Timestamp: 2026-03-17T11:34:53.242Z
Learning: In the lightspeed-stack project (`src/models/config.py`, `docs/config.md`), the internal Solr filter `is_chunk:true` (defined as `SOLR_CHUNK_FILTER_QUERY` in `src/constants.py`) is always injected by the system for OKP searches and is intentionally hidden from users. The `chunk_filter_query` field in `OkpConfiguration` is user-facing and additive-only, but the documentation must NOT mention the internal `is_chunk:true` behavior — this is a deliberate design decision by the maintainers.

Applied to files:

  • docs/rag_guide.md
  • tests/unit/utils/test_vector_search.py
  • docs/openapi.json
  • src/utils/vector_search.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/api/responses/successful/configuration.py
  • src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/api/responses/successful/configuration.py
  • src/models/config.py
📚 Learning: 2025-12-18T10:21:09.038Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:09.038Z
Learning: In Llama Stack version 0.3.x, telemetry provider configuration is not supported under the `providers` section in run.yaml configuration files. Telemetry can be enabled with just `telemetry.enabled: true` without requiring an explicit provider block.

Applied to files:

  • docs/byok_guide.md
📚 Learning: 2026-04-06T20:18:07.852Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1463
File: src/app/endpoints/rlsapi_v1.py:266-271
Timestamp: 2026-04-06T20:18:07.852Z
Learning: In the lightspeed-stack codebase, within `src/app/endpoints/` inference/MCP endpoints, treat `tools: Optional[list[Any]]` in MCP tool definitions as an intentional, consistent typing pattern (used across `query`, `responses`, `streaming_query`, `rlsapi_v1`). Do not raise or suggest this as a typing issue during code review; changing it in isolation could break endpoint typing consistency across the codebase.

Applied to files:

  • src/app/endpoints/rags.py
📚 Learning: 2026-05-20T08:09:43.391Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: src/llama_stack_configuration.py:651-683
Timestamp: 2026-05-20T08:09:43.391Z
Learning: In `src/llama_stack_configuration.py`, the `apply_high_level_inference` function currently emits `provider_id: p_type` (underscore form, e.g. `sentence_transformers`) directly from the high-level type key, which collides with Llama Stack's hyphenated provider IDs (e.g. `sentence-transformers`). This is a known PoC divergence documented in the spike doc ("Findings discovered during PoC") and tracked in the implementation JIRA "Unified llama_stack.config schema + synthesizer". Decision S5 mandates that each backend-specific synthesizer translates LCORE's canonical type Literal vocabulary to the target backend's expected shape (hyphenated provider_id for Llama Stack; model-string prefixes for Pydantic AI). The PoC code will be removed before merge; the fix belongs in the implementation ticket.

Applied to files:

  • src/llama_stack_configuration.py
  • src/constants.py
  • tests/unit/test_llama_stack_configuration.py
  • src/models/config.py
  • src/utils/vector_search.py
📚 Learning: 2026-05-20T08:09:34.319Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:34.319Z
Learning: In the lightspeed-stack project (LCORE-836 unified config), the `provider_shield_id` for a Llama Guard safety shield entry must be a guard model identifier (e.g., `meta-llama/Llama-Guard-3-8B`), not a chat model id like `openai/gpt-4o-mini`. Using a chat model id there only means the native_override key landed — it does not mean the safety shield is actually gating queries. E2E tests for the implementation JIRA must exercise a real Llama Guard model to validate the shield.

Applied to files:

  • src/llama_stack_configuration.py
  • src/utils/vector_search.py
📚 Learning: 2026-05-06T06:57:52.173Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-05-06T06:57:52.173Z
Learning: Applies to src/constants.py : Use `constants.py` for shared constants with descriptive comments and type hints using `Final[type]`

Applied to files:

  • src/constants.py
📚 Learning: 2026-04-15T18:54:09.157Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:769-773
Timestamp: 2026-04-15T18:54:09.157Z
Learning: In lightspeed-core/lightspeed-stack (src/models/requests.py and related files), schema-level field size limits (e.g., max_length=65_536, max_length=32_768) are intentionally written as inline numeric literals, not extracted to constants.py. constants.py is reserved for configurable runtime defaults (e.g., DEFAULT_RH_IDENTITY_MAX_HEADER_SIZE, DEFAULT_MAX_FILE_UPLOAD_SIZE). Do not flag inline literals in field validators or Pydantic Field constraints as needing extraction to constants.py.

Applied to files:

  • src/constants.py
  • src/models/config.py
📚 Learning: 2026-02-23T14:56:59.186Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.

Applied to files:

  • src/utils/responses.py
📚 Learning: 2026-04-07T14:44:42.022Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1469
File: src/models/config.py:1928-1933
Timestamp: 2026-04-07T14:44:42.022Z
Learning: In lightspeed-core/lightspeed-stack, `allow_verbose_infer` (previously `customization.allow_verbose_infer`, now `rlsapi_v1.allow_verbose_infer`) is only used internally by the `rlsapi_v1` `/infer` endpoint and has a single known consumer (the PR author). Backward compatibility for this config field relocation is intentionally not required and should not be flagged in future reviews.

Applied to files:

  • src/utils/vector_search.py
📚 Learning: 2026-04-20T15:09:48.726Z
Learnt from: major
Repo: lightspeed-core/lightspeed-stack PR: 1548
File: src/app/endpoints/rlsapi_v1.py:56-56
Timestamp: 2026-04-20T15:09:48.726Z
Learning: In `src/app/endpoints/rlsapi_v1.py`, the `_get_rh_identity_context = get_rh_identity_context` alias is a deliberate, temporary backward-compatibility shim introduced in PR `#1548` (part 1/3 of Splunk HEC telemetry work). It is planned for removal in part 3 once the responses endpoint is fully wired up and no tests/consumers reference the underscore-prefixed name. Do not flag this alias as unnecessary or dead code until part 3 is merged.

Applied to files:

  • src/utils/vector_search.py
🪛 GitHub Actions: OpenAPI (Spectral) / 0_spectral.txt
docs/openapi.json

[error] 1-1: OpenAPI schema is out of date. 'diff -u docs/openapi.json /tmp/openapi-generated.json' failed; regenerate by running: 'uv run scripts/generate_openapi_schema.py docs/openapi.json'.

🪛 GitHub Actions: OpenAPI (Spectral) / spectral
docs/openapi.json

[error] 1-1: CI failed because docs/openapi.json is out of date. Diff detected between docs/openapi.json and generated schema. Regenerate with: uv run python scripts/generate_openapi_schema.py docs/openapi.json

🪛 markdownlint-cli2 (0.22.1)
docs/rag_guide.md

[warning] 42-42: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)

🔇 Additional comments (31)
examples/lightspeed-stack-byok-okp-rag.yaml (1)

39-79: LGTM!

examples/quota-limiter-configuration-sqlite.yaml (1)

36-48: LGTM!

tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml (1)

31-43: LGTM!

tests/e2e/configuration/library-mode/lightspeed-stack-inline-rag.yaml (1)

30-42: LGTM!

tests/e2e/configuration/library-mode/lightspeed-stack.yaml (1)

24-36: LGTM!

tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml (1)

37-49: LGTM!

tests/e2e/configuration/server-mode/lightspeed-stack-inline-rag.yaml (1)

31-43: LGTM!

tests/e2e/configuration/server-mode/lightspeed-stack.yaml (1)

25-37: LGTM!

tests/e2e/features/inline_rag.feature (1)

25-25: LGTM!

Also applies to: 37-37, 46-46, 58-58, 69-69

tests/unit/app/endpoints/test_rags.py (1)

272-293: LGTM!

Also applies to: 386-387, 394-395

tests/integration/endpoints/test_query_byok_integration.py (1)

259-259: LGTM!

Also applies to: 267-270, 289-300, 454-456, 525-527, 785-788, 914-916, 998-1000, 1095-1097, 1105-1105, 1144-1144, 1190-1192, 1197-1197, 1249-1249, 1292-1295, 1300-1300

tests/integration/endpoints/test_responses_byok_integration.py (1)

119-120: LGTM!

Also applies to: 172-173, 222-223, 230-233, 294-304, 384-386, 446-448, 529-531, 537-537, 563-565, 600-602, 607-607, 647-649, 681-684, 689-689

tests/integration/endpoints/test_streaming_query_byok_integration.py (1)

241-241: LGTM!

Also applies to: 249-251, 266-267, 274-277, 348-350, 412-414, 692-695, 775-777, 854-856, 936-938, 945-945, 976-978, 1014-1016, 1023-1023, 1067-1069, 1102-1105, 1112-1112

tests/unit/models/config/test_byok_rag.py (1)

9-214: LGTM!

tests/unit/models/config/test_dump_configuration.py (1)

14-27: LGTM!

Also applies to: 119-254, 478-628, 738-886, 948-1123, 1208-1341, 1509-1644, 1733-1868, 1957-2092

tests/unit/models/config/test_rag_configuration.py (1)

10-17: LGTM!

Also applies to: 20-83, 91-97, 99-112, 114-126, 128-141, 143-177, 185-206, 217-233

tests/unit/telemetry/conftest.py (1)

15-32: LGTM!

Also applies to: 295-316, 391-412

tests/unit/test_configuration.py (1)

1004-1210: LGTM!

Also applies to: 1415-3790

tests/unit/test_llama_stack_configuration.py (1)

265-265: LGTM!

Also applies to: 282-282, 312-312, 540-540, 576-589

tests/unit/utils/test_responses.py (1)

59-59: LGTM!

Also applies to: 1709-1715, 1779-1785, 1803-1809, 1837-1843, 1864-1872, 1895-1900, 1917-1920, 1939-1944, 1968-1974, 3466-3466, 3491-3491, 3528-3528, 3587-3587

tests/unit/utils/test_vector_search.py (1)

25-25: LGTM!

Also applies to: 521-523, 540-543, 580-583, 606-609, 624-630, 670-672, 690-692, 706-707, 716-717, 730-731, 748-749, 763-764, 778-779, 797-803, 821-827, 862-868, 917-923

src/constants.py (1)

172-176: LGTM!

Also applies to: 193-197

src/models/config.py (1)

1765-1791: LGTM!

Also applies to: 1950-1994, 2003-2084, 2294-2299, 2419-2447

src/models/api/responses/successful/configuration.py (1)

82-94: LGTM!

docs/openapi.json (1)

6762-6781: Already covered: regenerate the checked-in OpenAPI artifact.

CI is still reporting docs/openapi.json as out of date relative to the generated schema, so these contract edits will keep drifting until the file is regenerated from the source models.

Also applies to: 11828-11849, 12097-12100, 12183-12202, 14751-14762, 17761-17838, 18745-18784

src/app/endpoints/rags.py (1)

27-27: LGTM!

Also applies to: 169-170

src/configuration.py (1)

478-482: LGTM!

Also applies to: 504-517, 532-535, 549-549

src/llama_stack_configuration.py (1)

20-29: LGTM!

Also applies to: 350-365, 608-619

src/client.py (1)

94-101: LGTM!

src/utils/responses.py (1)

110-110: LGTM!

Also applies to: 242-256, 626-676, 691-696, 1735-1737, 1830-1832

src/utils/vector_search.py (1)

253-253: LGTM!

Also applies to: 461-481, 653-657

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/app/endpoints/rags.py (1)

111-125: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Use Google-style docstring sections in this function.

The function docstring is currently NumPy-style; please convert it to Google format (Args, Returns, Raises) to match repo standards.

As per coding guidelines: "Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/app/endpoints/rags.py` around lines 111 - 125, Update the function
docstring that "Resolves a user-facing rag_id to the llama-stack vector_db_id"
to use Google-style sections: replace NumPy-style "Parameters"/"Returns" with
"Args:", "Returns:", and add a "Raises:" section (use "Raises: None" if it
doesn't raise) to match repository docstring conventions; keep the existing
short description and parameter details (rag_id, byok_rags) and ensure types and
return description remain accurate.
src/utils/vector_search.py (1)

569-589: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Pass rag.okp.max_chunks into the Solr query.

limit = configuration.rag.okp.max_chunks only affects the post-query slice right now. _build_query_params(solr) still falls back to constants.SOLR_VECTOR_SEARCH_DEFAULT_K, so the new config value never changes how many OKP chunks are actually fetched.

Suggested fix
-            params = _build_query_params(solr)
+            params = _build_query_params(solr, k=limit)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/utils/vector_search.py` around lines 569 - 589, The Solr query currently
ignores configuration.rag.okp.max_chunks because _build_query_params(solr) falls
back to constants.SOLR_VECTOR_SEARCH_DEFAULT_K; pass the configured limit into
the query parameters so the client.vector_io.query actually requests that many
chunks. Either update _build_query_params to accept a limit (e.g.,
_build_query_params(solr, limit)) and use configuration.rag.okp.max_chunks when
building params, or set params['k'] = limit (or the appropriate param key) after
params = _build_query_params(solr) and before calling client.vector_io.query;
reference the limit variable, _build_query_params, params,
client.vector_io.query, and constants.SOLR_VECTOR_SEARCH_DEFAULT_K when making
the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/byok_guide.md`:
- Around line 223-231: The table documents deprecated flat RAG keys (rag_id,
rag_type, embedding_model, embedding_dimension, vector_db_id, db_path,
score_multiplier); update these to the new nested schema using
rag.byok.stores[*] entries and the rag.backend field (replace rag_type →
rag.backend, move rag_id/vector_db_id/db_path/etc. under rag.byok.stores with
per-store keys, and adjust embedding_model/embedding_dimension/score_multiplier
to per-store fields). Ensure all examples and the related occurrences (also at
the later section mentioned) use rag.byok.stores and rag.backend rather than the
old top-level keys so the config matches the new contract.

In `@docs/rag_guide.md`:
- Line 42: Change the heading "Inline RAG chunk flow" from an H3 to H2 to fix
the markdown lint MD001; locate the heading line with the text "Inline RAG chunk
flow" in docs/rag_guide.md and update its markdown header marker to the H2 style
(i.e., use two hash symbols) so the document heading levels are sequential.

In `@src/models/config.py`:
- Around line 1996-2000: Add a post-validation check on the model that declares
the stores: list[RagStore] field to reject duplicate RagStore.rag_id values;
implement a Pydantic validator (e.g., `@validator`("stores") or a `@root_validator`
on that model) that collects rag_id from each RagStore, detects duplicates, and
raises a ValueError listing the duplicated rag_id(s) so validation fails if the
same rag_id appears more than once; reference the stores field and the
RagStore.rag_id attribute when locating where to add this check.

---

Outside diff comments:
In `@src/app/endpoints/rags.py`:
- Around line 111-125: Update the function docstring that "Resolves a
user-facing rag_id to the llama-stack vector_db_id" to use Google-style
sections: replace NumPy-style "Parameters"/"Returns" with "Args:", "Returns:",
and add a "Raises:" section (use "Raises: None" if it doesn't raise) to match
repository docstring conventions; keep the existing short description and
parameter details (rag_id, byok_rags) and ensure types and return description
remain accurate.

In `@src/utils/vector_search.py`:
- Around line 569-589: The Solr query currently ignores
configuration.rag.okp.max_chunks because _build_query_params(solr) falls back to
constants.SOLR_VECTOR_SEARCH_DEFAULT_K; pass the configured limit into the query
parameters so the client.vector_io.query actually requests that many chunks.
Either update _build_query_params to accept a limit (e.g.,
_build_query_params(solr, limit)) and use configuration.rag.okp.max_chunks when
building params, or set params['k'] = limit (or the appropriate param key) after
params = _build_query_params(solr) and before calling client.vector_io.query;
reference the limit variable, _build_query_params, params,
client.vector_io.query, and constants.SOLR_VECTOR_SEARCH_DEFAULT_K when making
the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7791912b-70fc-45af-b294-8bbec271970e

📥 Commits

Reviewing files that changed from the base of the PR and between 2209ac9 and 472397b.

📒 Files selected for processing (33)
  • docs/byok_guide.md
  • docs/openapi.json
  • docs/rag_guide.md
  • examples/lightspeed-stack-byok-okp-rag.yaml
  • examples/quota-limiter-configuration-sqlite.yaml
  • src/app/endpoints/rags.py
  • src/client.py
  • src/configuration.py
  • src/constants.py
  • src/llama_stack_configuration.py
  • src/models/api/responses/successful/configuration.py
  • src/models/config.py
  • src/utils/responses.py
  • src/utils/vector_search.py
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-inline-rag.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-inline-rag.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack.yaml
  • tests/e2e/features/inline_rag.feature
  • tests/integration/endpoints/test_query_byok_integration.py
  • tests/integration/endpoints/test_responses_byok_integration.py
  • tests/integration/endpoints/test_streaming_query_byok_integration.py
  • tests/unit/app/endpoints/test_rags.py
  • tests/unit/models/config/test_byok_rag.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/models/config/test_rag_configuration.py
  • tests/unit/telemetry/conftest.py
  • tests/unit/test_configuration.py
  • tests/unit/test_llama_stack_configuration.py
  • tests/unit/utils/test_responses.py
  • tests/unit/utils/test_vector_search.py
📜 Review details
🔇 Additional comments (31)
examples/lightspeed-stack-byok-okp-rag.yaml (1)

39-79: LGTM!

examples/quota-limiter-configuration-sqlite.yaml (1)

36-48: LGTM!

tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml (1)

31-43: LGTM!

tests/e2e/configuration/library-mode/lightspeed-stack-inline-rag.yaml (1)

30-42: LGTM!

tests/e2e/configuration/library-mode/lightspeed-stack.yaml (1)

24-36: LGTM!

tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml (1)

37-49: LGTM!

tests/e2e/configuration/server-mode/lightspeed-stack-inline-rag.yaml (1)

31-43: LGTM!

tests/e2e/configuration/server-mode/lightspeed-stack.yaml (1)

25-37: LGTM!

tests/e2e/features/inline_rag.feature (1)

25-25: LGTM!

Also applies to: 37-37, 46-46, 58-58, 69-69

tests/unit/app/endpoints/test_rags.py (1)

272-293: LGTM!

Also applies to: 386-387, 394-395

tests/integration/endpoints/test_query_byok_integration.py (1)

259-259: LGTM!

Also applies to: 267-270, 289-300, 454-456, 525-527, 785-788, 914-916, 998-1000, 1095-1097, 1105-1105, 1144-1144, 1190-1192, 1197-1197, 1249-1249, 1292-1295, 1300-1300

tests/integration/endpoints/test_responses_byok_integration.py (1)

119-120: LGTM!

Also applies to: 172-173, 222-223, 230-233, 294-304, 384-386, 446-448, 529-531, 537-537, 563-565, 600-602, 607-607, 647-649, 681-684, 689-689

tests/integration/endpoints/test_streaming_query_byok_integration.py (1)

241-241: LGTM!

Also applies to: 249-251, 266-267, 274-277, 348-350, 412-414, 692-695, 775-777, 854-856, 936-938, 945-945, 976-978, 1014-1016, 1023-1023, 1067-1069, 1102-1105, 1112-1112

tests/unit/models/config/test_byok_rag.py (1)

9-214: LGTM!

tests/unit/models/config/test_dump_configuration.py (1)

14-27: LGTM!

Also applies to: 119-254, 478-628, 738-886, 948-1123, 1208-1341, 1509-1644, 1733-1868, 1957-2092

tests/unit/models/config/test_rag_configuration.py (1)

10-17: LGTM!

Also applies to: 20-83, 91-97, 99-112, 114-126, 128-141, 143-177, 185-206, 217-233

tests/unit/telemetry/conftest.py (1)

15-32: LGTM!

Also applies to: 295-316, 391-412

tests/unit/test_configuration.py (1)

1004-1210: LGTM!

Also applies to: 1415-3790

tests/unit/test_llama_stack_configuration.py (1)

265-265: LGTM!

Also applies to: 282-282, 312-312, 540-540, 576-589

tests/unit/utils/test_responses.py (1)

59-59: LGTM!

Also applies to: 1709-1715, 1779-1785, 1803-1809, 1837-1843, 1864-1872, 1895-1900, 1917-1920, 1939-1944, 1968-1974, 3466-3466, 3491-3491, 3528-3528, 3587-3587

tests/unit/utils/test_vector_search.py (1)

25-25: LGTM!

Also applies to: 521-523, 540-543, 580-583, 606-609, 624-630, 670-672, 690-692, 706-707, 716-717, 730-731, 748-749, 763-764, 778-779, 797-803, 821-827, 862-868, 917-923

src/constants.py (1)

172-176: LGTM!

Also applies to: 193-197

src/models/config.py (1)

1765-1791: LGTM!

Also applies to: 1950-1994, 2003-2084, 2294-2299, 2419-2447

src/models/api/responses/successful/configuration.py (1)

82-94: LGTM!

docs/openapi.json (1)

6762-6781: Already covered: regenerate the checked-in OpenAPI artifact.

CI is still reporting docs/openapi.json as out of date relative to the generated schema, so these contract edits will keep drifting until the file is regenerated from the source models.

Also applies to: 11828-11849, 12097-12100, 12183-12202, 14751-14762, 17761-17838, 18745-18784

src/app/endpoints/rags.py (1)

27-27: LGTM!

Also applies to: 169-170

src/configuration.py (1)

478-482: LGTM!

Also applies to: 504-517, 532-535, 549-549

src/llama_stack_configuration.py (1)

20-29: LGTM!

Also applies to: 350-365, 608-619

src/client.py (1)

94-101: LGTM!

src/utils/responses.py (1)

110-110: LGTM!

Also applies to: 242-256, 626-676, 691-696, 1735-1737, 1830-1832

src/utils/vector_search.py (1)

253-253: LGTM!

Also applies to: 461-481, 653-657

🛑 Comments failed to post (3)
docs/byok_guide.md (1)

223-231: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Update these sections to the new nested RAG schema.

These changed sections still document deprecated keys (byok_rag, rag_type). The new contract uses rag.byok.stores and backend, so current snippets can lead users to invalid configuration.

Suggested doc update
-**`byok_rag` field reference:**
+**`rag.byok.stores` field reference:**

-| `rag_type`            | No       | `inline::faiss`                           | Vector store provider type                                                                |
+| `backend`             | No       | `faiss`                                   | Vector store backend type                                                                 |

-A configuration combining a local FAISS store (via `byok_rag`) with a remote pgvector store ...
+A configuration combining a local FAISS store (via `rag.byok.stores`) with a remote pgvector store ...

Also applies to: 374-375

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/byok_guide.md` around lines 223 - 231, The table documents deprecated
flat RAG keys (rag_id, rag_type, embedding_model, embedding_dimension,
vector_db_id, db_path, score_multiplier); update these to the new nested schema
using rag.byok.stores[*] entries and the rag.backend field (replace rag_type →
rag.backend, move rag_id/vector_db_id/db_path/etc. under rag.byok.stores with
per-store keys, and adjust embedding_model/embedding_dimension/score_multiplier
to per-store fields). Ensure all examples and the related occurrences (also at
the later section mentioned) use rag.byok.stores and rag.backend rather than the
old top-level keys so the config matches the new contract.
docs/rag_guide.md (1)

42-42: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix heading level jump to satisfy markdown linting.

This heading jumps from H1 to H3 and triggers MD001. Use H2 here.

Suggested fix
-### Inline RAG chunk flow
+## Inline RAG chunk flow
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

## Inline RAG chunk flow
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 42-42: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/rag_guide.md` at line 42, Change the heading "Inline RAG chunk flow"
from an H3 to H2 to fix the markdown lint MD001; locate the heading line with
the text "Inline RAG chunk flow" in docs/rag_guide.md and update its markdown
header marker to the H2 style (i.e., use two hash symbols) so the document
heading levels are sequential.
src/models/config.py (1)

1996-2000: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject duplicate BYOK rag_id values.

retrieval.inline.sources / retrieval.tool.sources are now keyed by byok.stores[].rag_id, but stores still allows the same rag_id to appear multiple times. That makes one configured source name ambiguous at runtime while still passing validate_retrieval_sources, because the validator collapses IDs into a set. Add a post-validator that fails when rag_id is repeated.

Proposed fix
 class ByokConfiguration(ConfigurationBase):
     """BYOK (Bring Your Own Knowledge) configuration."""
@@
     stores: list[RagStore] = Field(
         default_factory=list,
         title="BYOK RAG stores",
         description="List of BYOK RAG store configurations.",
     )
+
+    `@model_validator`(mode="after")
+    def validate_unique_rag_ids(self) -> Self:
+        """Reject duplicate BYOK store IDs."""
+        seen: set[str] = set()
+        duplicates: set[str] = set()
+        for store in self.stores:
+            if store.rag_id in seen:
+                duplicates.add(store.rag_id)
+            seen.add(store.rag_id)
+        if duplicates:
+            raise ValueError(
+                f"byok.stores contains duplicate rag_id values: {sorted(duplicates)}"
+            )
+        return self

Based on learnings, src/utils/vector_search.py only receives user-facing rag_ids, so rag_id is the join key between configuration and runtime lookup.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/models/config.py` around lines 1996 - 2000, Add a post-validation check
on the model that declares the stores: list[RagStore] field to reject duplicate
RagStore.rag_id values; implement a Pydantic validator (e.g.,
`@validator`("stores") or a `@root_validator` on that model) that collects rag_id
from each RagStore, detects duplicates, and raises a ValueError listing the
duplicated rag_id(s) so validation fails if the same rag_id appears more than
once; reference the stores field and the RagStore.rag_id attribute when locating
where to add this check.

Copy link
Copy Markdown
Contributor

@syedriko syedriko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tisnik tisnik requested a review from radofuchs June 4, 2026 13:01
}
"""

@flaky
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the problem here is not with flakiness, adding this tag just hides the real issue .

this is in the llama-stack logs
INFO 2026-06-04 12:39:03,513 uvicorn.access:481 server: ::1:52598 - "GET /v1/health HTTP/1.1" 200
llama-stack | HTTP Error 429 thrown while requesting HEAD https://huggingface.co/sentence-transformers/all-mpnet-base-v2/resolve/main/./modules.json
llama-stack | WARNING 2026-06-04 12:39:11,340 huggingface_hub.utils._http:446 uncategorized: HTTP Error 429 thrown while requesting
llama-stack | HEAD https://huggingface.co/sentence-transformers/all-mpnet-base-v2/resolve/main/./modules.json
llama-stack | Retrying in 1s [Retry 1/5].
llama-stack | WARNING 2026-06-04 12:39:11,341 huggingface_hub.utils._http:486 uncategorized: Retrying in 1s [Retry 1/5].
llama-stack | HTTP Error 429 thrown while requesting HEAD https://huggingface.co/sentence-transformers/all-mpnet-base-v2/resolve/main/./modules.json
llama-stack | WARNING 2026-06-04 12:39:12,362 huggingface_hub.utils._http:446 uncategorized: HTTP Error 429 thrown while requesting
llama-stack | HEAD https://huggingface.co/sentence-transformers/all-mpnet-base-v2/resolve/main/./modules.json
llama-stack | Retrying in 2s [Retry 2/5].

@tisnik
Copy link
Copy Markdown
Contributor

tisnik commented Jun 4, 2026

/hold

@tisnik tisnik added the hold label Jun 4, 2026
@tisnik tisnik added this to the 0.7 milestone Jun 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants