Skip to content

LCORE-2437: Support pgvector in byok_rag enrichment script#1856

Open
are-ces wants to merge 1 commit into
lightspeed-core:mainfrom
are-ces:lcore-2437-byok-pgvector-support-v2
Open

LCORE-2437: Support pgvector in byok_rag enrichment script#1856
are-ces wants to merge 1 commit into
lightspeed-core:mainfrom
are-ces:lcore-2437-byok-pgvector-support-v2

Conversation

@are-ces
Copy link
Copy Markdown
Contributor

@are-ces are-ces commented Jun 5, 2026

Description

Add pgvector as a supported rag_type in the BYOK RAG enrichment pipeline. When rag_type: remote::pgvector is set in a byok_rag entry, the enrichment script now generates a valid pgvector provider config with PostgreSQL connection fields (host, port, db, user, password) instead of the FAISS-style kv_sqlite storage backend.

Key changes:

  • ByokRag model: Add optional pgvector connection fields with ${env.POSTGRES_*} defaults, make db_path optional (not needed for pgvector), add model validator for backend-specific field requirements.
  • Enrichment pipeline: Introduce a VECTOR_IO_TEMPLATES map so adding new backends requires only a new map entry. construct_storage_backends_section skips backends that don't need custom storage. _build_vector_io_config builds provider configs from templates.
  • Docs: Remove "not yet supported" notes for pgvector, add byok_rag config examples.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

  • Assisted-by: Claude Opus 4.6
  • Generated by: Claude Opus 4.6

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  1. Configure lightspeed-stack.yaml with a pgvector BYOK entry:
    byok_rag:
      - rag_id: pgvector-test
        rag_type: remote::pgvector
        embedding_model: sentence-transformers/all-mpnet-base-v2
        embedding_dimension: 768
        vector_db_id: e2e_pgvector_test
        host: localhost
        port: "5432"
        db: pgvector_example
        user: user
        password: ""
  2. Run enrichment: uv run python src/llama_stack_configuration.py -c lightspeed-stack.yaml -i run.yaml -o /tmp/enriched.yaml
  3. Verify enriched config contains remote::pgvector provider with persistence.backend: kv_default and connection fields.
  4. Start the stack in library mode and query — pgvector chunks are returned via inline RAG.
  5. Verify existing FAISS behavior is unchanged: uv run make test-unit (2528 passed).

Summary by CodeRabbit

  • New Features

    • Enabled pgvector as a configurable remote RAG backend with automatic environment variable defaults for PostgreSQL connection fields.
  • Documentation

    • Updated configuration guides with pgvector setup examples and clarified backend selection options.
  • Tests

    • Added comprehensive test coverage for pgvector and mixed FAISS/pgvector configurations.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 5, 2026

Review Change Stack

Warning

Review limit reached

@are-ces, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 12 minutes and 43 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 74b5ee48-a787-4d85-83bf-31d22594cad1

📥 Commits

Reviewing files that changed from the base of the PR and between 88f1994 and 94b5cf4.

📒 Files selected for processing (8)
  • docs/byok_guide.md
  • docs/openapi.json
  • docs/rag_guide.md
  • src/llama_stack_configuration.py
  • src/models/config.py
  • tests/unit/models/config/test_byok_rag.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/test_llama_stack_configuration.py

Walkthrough

This PR enables pgvector as a configurable remote RAG backend in BYOK (Bring Your Own Knowledge) configurations. The ByokRag model now supports both inline::faiss (local) and remote::pgvector (remote) backends with automatic PostgreSQL connection field defaults, configuration generation uses a template-driven approach to build backend-specific provider configs, and documentation guides users through setting up pgvector with environment variable references.

Changes

pgvector BYOK RAG integration

Layer / File(s) Summary
ByokRag model contract and validation
src/models/config.py, tests/unit/models/config/test_byok_rag.py, tests/unit/models/config/test_dump_configuration.py
ByokRag.db_path is now optional (required only for FAISS), five new PostgreSQL connection fields are added for pgvector, and a validate_rag_type_fields validator enforces FAISS requiring db_path and auto-populates missing pgvector fields with ${env.POSTGRES_*} defaults. Tests cover FAISS validation, pgvector defaults, explicit overrides, partial overrides, and absence of db_path requirement for pgvector.
Vector I/O template system and provider generation
src/llama_stack_configuration.py, tests/unit/test_llama_stack_configuration.py
New VECTOR_IO_TEMPLATES mapping rag_type values to persistence namespaces, backends, and extra fields drives conditional storage backend creation: pgvector skips backend creation, FAISS creates one. Helper _build_vector_io_config dynamically builds vector-IO provider config by merging template defaults with BYOK RAG values. Tests validate single and mixed FAISS+pgvector inputs, storage backend exclusion for pgvector, and end-to-end provider registration and configuration emission.
Documentation updates for pgvector configuration
docs/byok_guide.md, docs/rag_guide.md
Removed outdated "pgvector not supported via byok_rag" notes and replaced hardcoded Llama Stack provider examples with byok_rag-based configuration using rag_type: remote::pgvector. Added documentation of PostgreSQL field defaults to environment variables. Updated Example 2 in byok_guide to demonstrate combined local FAISS + remote pgvector setup.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • tisnik
  • syedriko
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and clearly describes the main change: adding pgvector support to the byok_rag enrichment script, which is the primary objective of this PR.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Warning

Review ran into problems

🔥 Problems

Stopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a @coderabbit review after the pipeline has finished.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/models/config.py (1)

1775-1870: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restrict rag_type to supported values to prevent late enrichment failures.

Line 1775 accepts any non-empty string, but unsupported values fail later in src/llama_stack_configuration.py during provider generation. Validate this at model parse time (or use Literal) so bad config is rejected early with a clear error.

Suggested fix
-from typing import Any, Literal, Optional, Self
+from typing import Any, Literal, Optional, Self
@@
-    rag_type: str = Field(
+    rag_type: Literal["inline::faiss", "remote::pgvector"] = Field(
         constants.DEFAULT_RAG_TYPE,
         min_length=1,
         title="RAG type",
         description="Type of RAG database (e.g. 'inline::faiss', 'remote::pgvector').",
     )
@@
-        if self.rag_type == "inline::faiss":
+        if self.rag_type == "inline::faiss":
             if not self.db_path:
                 raise ValueError("db_path is required when rag_type is 'inline::faiss'")
         elif self.rag_type == "remote::pgvector":
@@
                 if getattr(self, field_name) is None:
                     object.__setattr__(self, field_name, default_value)
+        else:
+            raise ValueError(
+                "Unsupported rag_type. Expected one of: 'inline::faiss', 'remote::pgvector'"
+            )
         return self
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/models/config.py` around lines 1775 - 1870, The rag_type Field currently
allows any non-empty string (rag_type) which defers errors until provider
generation; restrict it to supported values (e.g. use typing.Literal[...] or an
Enum) and validate early in the model: change the type of rag_type to a
Literal/Enum of the supported options (e.g. "inline::faiss", "remote::pgvector",
etc.) and/or add a check at the start of validate_rag_type_fields to raise
ValueError if self.rag_type not in the allowed set, so invalid configs are
rejected during model parsing; update the Field description accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/byok_guide.md`:
- Around line 303-306: Update the field-reference table so the `db_path` row no
longer lists it as always required; change its requirement to "required for
`inline::faiss` only" and ensure the pgvector section stays consistent with the
note that connection fields (`host`, `port`, `db`, `user`, `password`) default
to `${env.POSTGRES_*}` when omitted; update any caption or remark in the table
that currently implies `db_path` is mandatory to reference `inline::faiss`
specifically instead.

In `@src/llama_stack_configuration.py`:
- Around line 342-344: When applying extra_fields from template into config,
treat explicit None/null or empty-string values in brag as missing so defaults
are used; replace the current assignment config[field] = brag.get(field,
default) with logic that reads val = brag.get(field, None) and if val is None or
val == "" sets val = default before assigning to config[field], ensuring fields
like host: (YAML null/empty) get the template default rather than literal
null/empty in the provider config.

In `@src/models/config.py`:
- Around line 1845-1850: The password field in src/models/config.py is declared
as password: Optional[str] using Field which allows plain-text serialization;
change its type to Optional[SecretStr] (from pydantic) and update the Field
usage accordingly so the value is treated as a secret (e.g., use SecretStr for
Postgres password handling consistent with other DB secrets). Add/adjust the
import for SecretStr, update any type hints or callers that access password
(e.g., password.get_secret_value() where raw string is required), and ensure
default behavior and env-derived fallback remain intact.

In `@tests/unit/models/config/test_byok_rag.py`:
- Around line 230-245: The test function
test_byok_rag_pgvector_custom_connection_fields sets custom connection values
for user and password but only asserts host, port, and db; update this test to
also assert the overridden credentials by adding assertions for store.user ==
"admin" and store.password == "secret" within the
test_byok_rag_pgvector_custom_connection_fields function so regressions in
mapping user/password are caught.

---

Outside diff comments:
In `@src/models/config.py`:
- Around line 1775-1870: The rag_type Field currently allows any non-empty
string (rag_type) which defers errors until provider generation; restrict it to
supported values (e.g. use typing.Literal[...] or an Enum) and validate early in
the model: change the type of rag_type to a Literal/Enum of the supported
options (e.g. "inline::faiss", "remote::pgvector", etc.) and/or add a check at
the start of validate_rag_type_fields to raise ValueError if self.rag_type not
in the allowed set, so invalid configs are rejected during model parsing; update
the Field description accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4d96cc64-eda3-4d81-bf28-c9b67ecaee54

📥 Commits

Reviewing files that changed from the base of the PR and between 014a5ba and 88f1994.

📒 Files selected for processing (7)
  • docs/byok_guide.md
  • docs/rag_guide.md
  • src/llama_stack_configuration.py
  • src/models/config.py
  • tests/unit/models/config/test_byok_rag.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/test_llama_stack_configuration.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: build-pr
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (3)
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/models/config/test_byok_rag.py
  • tests/unit/test_llama_stack_configuration.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Llama Stack imports: Use from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
All functions must have complete type annotations for parameters and return types, use modern syntax (str | int), and include descriptive docstrings
Use snake_case with descriptive, action-oriented names for functions (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying function parameters
Use async def for I/O operations and external API calls
Use standard log levels with clear purposes: debug() for diagnostic info, info() for program execution, warning() for unexpected events, error() for serious problems
All classes must have descriptive docstrings explaining purpose and use PascalCase with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Abstract classes must use ABC with @abstractmethod decorators
Follow Google Python docstring conventions with required sections: Parameters, Returns, Raises, and Attributes for classes

Files:

  • src/models/config.py
  • src/llama_stack_configuration.py
src/models/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Pydantic models must use @model_validator and @field_validator for validation and complete type annotations for all attributes, avoiding Any type

Files:

  • src/models/config.py
🧠 Learnings (7)
📓 Common learnings
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1843
File: src/utils/vector_search.py:472-484
Timestamp: 2026-06-03T11:15:41.987Z
Learning: In `src/utils/vector_search.py`, the `_fetch_byok_rag` function's `vector_store_ids` parameter always contains user-facing `rag_id`s, never raw llama-stack `vector_db_id`s. The API contract enforces rag_ids only, so filtering `vector_store_ids` against `configuration.configuration.rag.retrieval.inline.sources` (also rag_ids) before calling `resolve_vector_store_ids` is intentional and correct.
📚 Learning: 2026-06-03T11:15:41.987Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1843
File: src/utils/vector_search.py:472-484
Timestamp: 2026-06-03T11:15:41.987Z
Learning: In `src/utils/vector_search.py`, the `_fetch_byok_rag` function's `vector_store_ids` parameter always contains user-facing `rag_id`s, never raw llama-stack `vector_db_id`s. The API contract enforces rag_ids only, so filtering `vector_store_ids` against `configuration.configuration.rag.retrieval.inline.sources` (also rag_ids) before calling `resolve_vector_store_ids` is intentional and correct.

Applied to files:

  • docs/rag_guide.md
  • src/models/config.py
  • tests/unit/models/config/test_byok_rag.py
  • tests/unit/test_llama_stack_configuration.py
  • docs/byok_guide.md
  • src/llama_stack_configuration.py
📚 Learning: 2025-08-18T10:58:14.951Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:47-47
Timestamp: 2025-08-18T10:58:14.951Z
Learning: psycopg2-binary is required by some llama-stack providers in the lightspeed-stack project, so it cannot be replaced with psycopg v3 or moved to optional dependencies without breaking llama-stack functionality.

Applied to files:

  • docs/rag_guide.md
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/config.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.

Applied to files:

  • src/models/config.py
📚 Learning: 2025-12-18T10:21:09.038Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:09.038Z
Learning: In Llama Stack version 0.3.x, telemetry provider configuration is not supported under the `providers` section in run.yaml configuration files. Telemetry can be enabled with just `telemetry.enabled: true` without requiring an explicit provider block.

Applied to files:

  • docs/byok_guide.md
📚 Learning: 2026-05-20T08:09:43.391Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: src/llama_stack_configuration.py:651-683
Timestamp: 2026-05-20T08:09:43.391Z
Learning: In `src/llama_stack_configuration.py`, the `apply_high_level_inference` function currently emits `provider_id: p_type` (underscore form, e.g. `sentence_transformers`) directly from the high-level type key, which collides with Llama Stack's hyphenated provider IDs (e.g. `sentence-transformers`). This is a known PoC divergence documented in the spike doc ("Findings discovered during PoC") and tracked in the implementation JIRA "Unified llama_stack.config schema + synthesizer". Decision S5 mandates that each backend-specific synthesizer translates LCORE's canonical type Literal vocabulary to the target backend's expected shape (hyphenated provider_id for Llama Stack; model-string prefixes for Pydantic AI). The PoC code will be removed before merge; the fix belongs in the implementation ticket.

Applied to files:

  • src/llama_stack_configuration.py

Comment thread docs/byok_guide.md
Comment thread src/llama_stack_configuration.py
Comment thread src/models/config.py Outdated
Comment thread tests/unit/models/config/test_byok_rag.py
@are-ces are-ces force-pushed the lcore-2437-byok-pgvector-support-v2 branch 3 times, most recently from d0aced5 to 14bfe24 Compare June 5, 2026 13:08
@are-ces are-ces requested review from syedriko and tisnik June 5, 2026 13:15
Add pgvector as a supported rag_type in the BYOK RAG enrichment
pipeline. When rag_type is set to remote::pgvector, the enrichment
script generates a pgvector provider config with PostgreSQL connection
fields instead of the FAISS-style kv_sqlite storage backend.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@are-ces are-ces force-pushed the lcore-2437-byok-pgvector-support-v2 branch from 14bfe24 to 94b5cf4 Compare June 5, 2026 13:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant