feat: persist schema generator examples #362

Open
nw9663644-eng wants to merge 3 commits into apache:main from nw9663644-eng:feat-persist-schema-generator-examples

@nw9663644-eng

Contributor

Purpose

Closes #346.

This PR persists edited schema-generator query examples and few-shot schema examples in the demo.

Changes

  • Added prompt config fields for schema-generator query examples and few-shot schema examples.
  • Load persisted schema-generator examples into the demo UI before falling back to bundled resource examples.
  • Persist edited examples through the existing prompt config save path.
  • Validate edited examples as JSON before saving.
  • Reject invalid JSON with a clear UI error and avoid persisting invalid content.
  • Keep bundled examples under resources/prompt_examples read-only as defaults.
  • Added tests for save/load behavior, invalid JSON handling, bundled fallback behavior, invalid persisted fallback behavior, UI persistence wiring, and old prompt config compatibility.
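The load and validation behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `validate_examples_json` and `resolve_examples` are hypothetical names, and the bundled default is passed in as a string rather than read from resources/prompt_examples.

```python
import json
from typing import Optional


def validate_examples_json(text: str, label: str) -> str:
    """Return text unchanged if it parses as JSON, else raise ValueError naming the field."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"{label} must be valid JSON: {exc.msg}") from exc
    return text


def resolve_examples(persisted: Optional[str], bundled_default: str) -> str:
    """Prefer a persisted override; fall back to the bundled default when it is absent or invalid."""
    if persisted is None or not persisted.strip():
        return bundled_default
    try:
        return validate_examples_json(persisted, "Persisted examples")
    except ValueError:
        # Assumption: an invalid persisted value should not block the UI from loading.
        return bundled_default
```

In this sketch the bundled default is never written to, which matches the intent of keeping the bundled examples read-only.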

Tests

  • uv run ruff check hugegraph-llm/src/hugegraph_llm/config/models/base_prompt_config.py hugegraph-llm/src/hugegraph_llm/demo/rag_demo/vector_graph_block.py hugegraph-llm/src/tests/document/test_schema_generator_examples_persistence.py
  • uv run ruff format --check .
  • uv run pytest hugegraph-llm/src/tests/document/test_schema_generator_examples_persistence.py
  • uv run pytest hugegraph-llm/src/tests/document

@dosubot (bot) added the size:L (This PR changes 100-499 lines, ignoring generated files) and enhancement (New feature or request) labels on Jun 10, 2026
@github-actions (bot) added the llm label on Jun 10, 2026

@imbajin imbajin left a comment

Member

Blocking: no. Summary: Schema-example persistence currently validates after unrelated UI actions, which can make successful actions look failed. Evidence: static diff in vector_graph_block.py, targeted pytest passed after uv sync, and latest-head CI has no visible failures.

    query_examples=None,
    few_shot_examples=None,
):
    validated_query_examples = _validate_schema_generator_examples(query_examples, "Query examples")

Member

⚠️ Validate schema examples before generic actions

Evidence: these save chains now run the primary action first and only then call store_prompt(..., query_example, few_shot). Since store_prompt() validates those JSON blobs immediately, an invalid or blank schema-example editor can make actions like clearing indexes report an error after the action already ran, and blank values cannot clear a persisted override back to bundled defaults.

Please keep schema-generator validation/persistence on an explicit schema-generator save/generate path, or validate before the primary action and treat blank/whitespace examples as clearing the persisted override. Add a regression test for clearing both fields back to the bundled fallback.
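One way to read the second option (validate before the primary action, and treat blank values as clearing the override) is sketched below. All names here are hypothetical, and a plain dict stands in for the persisted prompt config; the real save path goes through store_prompt().

```python
import json
from typing import Callable, Optional


def normalize_examples(text: Optional[str], label: str) -> Optional[str]:
    """Return None for blank input (meaning: clear the override), else require valid JSON."""
    if text is None or not text.strip():
        return None
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"{label} must be valid JSON") from exc
    return text


def run_then_persist(
    action: Callable[[], str],
    store: dict,
    query_examples: Optional[str],
    few_shot_examples: Optional[str],
) -> str:
    # Validate both fields first so a bad editor value cannot fail after the action has run.
    query = normalize_examples(query_examples, "Query examples")
    few_shot = normalize_examples(few_shot_examples, "Few-shot schema examples")
    result = action()
    for key, value in (("query_examples", query), ("few_shot_examples", few_shot)):
        if value is None:
            store.pop(key, None)  # blank editor: drop the override, bundled default applies
        else:
            store[key] = value
    return result
```

With this ordering, an invalid editor value raises before the primary action runs, so an action such as clearing indexes is never reported as failed after it has already succeeded.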

@imbajin imbajin left a comment

Member

Blocking: yes. Summary: Found one schema-example contract regression in the current head. Evidence: targeted pytest passed after uv sync; static check shows persisted bundled query examples are filtered out before schema generation.

    if not examples:
        return ""
    try:
        json.loads(examples)

Member

‼️ Validate the schema-example shape, not only JSON syntax

Evidence: the persisted/bundled query_examples.json is a list[str], but SchemaBuildNode only keeps parsed items that are dicts containing description and gremlin; the current head therefore reloads and persists examples that are later collapsed to an empty list before schema generation. The new tests cover YAML persistence and JSON syntax, but not that persisted query examples actually reach the schema prompt.

Impact: users can save examples successfully and still generate schemas without any query examples, so this feature appears to work while silently dropping the main payload. Please align this validator with the downstream contract, or relax the downstream parser to accept the shipped string-list format, and add a regression test that persisted/bundled query examples produce a non-empty query_examples payload for schema generation.
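A sketch of the first option, aligning the validator with the downstream contract described above (a list of objects carrying description and gremlin). The function name and error messages are hypothetical. Note that under this rule the shipped list[str] format would be rejected, so the bundled file would also need migrating; relaxing the downstream parser is the alternative.

```python
import json

REQUIRED_KEYS = ("description", "gremlin")


def validate_query_examples(text: str) -> list:
    """Parse query examples and require a list of dicts with the keys the schema builder keeps."""
    parsed = json.loads(text)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(parsed, list):
        raise ValueError("Query examples must be a JSON array")
    for index, item in enumerate(parsed):
        if not isinstance(item, dict) or any(key not in item for key in REQUIRED_KEYS):
            raise ValueError(
                f"Query example {index} must be an object with 'description' and 'gremlin'"
            )
    return parsed
```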

@VGalaxies VGalaxies left a comment

Contributor

Review summary

  • Blocking: yes
  • Summary: The PR can persist few-shot schema examples that pass JSON syntax validation but are rejected by the schema-generation flow.
  • Evidence:
    • git diff --check origin/main...HEAD
    • static inspection of the changed validator/persistence path and downstream schema builder contract


def _persist_schema_generator_examples(query_examples, few_shot_examples):
    validated_query_examples = _validate_schema_generator_examples(query_examples, "Query examples")
    validated_few_shot_examples = _validate_schema_generator_examples(few_shot_examples, "Few-shot schema examples")

Contributor

Medium: Validate few-shot schema shape before persisting

hugegraph-llm/src/hugegraph_llm/demo/rag_demo/vector_graph_block.py:72

Evidence

  • _validate_schema_generator_examples() only checks json.loads(), so _persist_schema_generator_examples() accepts and saves values like [{"schema": {"vertices": []}}]; the new tests use that array shape. Downstream, SchemaBuildNode passes the parsed value through as few_shot_schema, and SchemaBuilder.run() rejects anything that is not a dict with "'few_shot_schema' must be a dict".

Impact

  • A user can save syntactically valid few-shot examples that make schema generation fail, then the invalid persisted override keeps reloading until manually cleared.

Requested fix

  • Validate few_shot_examples as the object/dict shape required by schema generation before writing it to YAML, or update the downstream flow to intentionally accept the persisted shape and cover that path with a regression test.
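The first requested fix could look roughly like this. `validate_few_shot_schema` is a hypothetical helper, and the only rule it enforces is the dict requirement quoted from SchemaBuilder.run(); any further shape checks are left out.

```python
import json


def validate_few_shot_schema(text: str) -> dict:
    """Parse few-shot examples and reject any top-level value that is not a JSON object."""
    parsed = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on bad syntax
    if not isinstance(parsed, dict):
        raise ValueError("'few_shot_schema' must be a dict")
    return parsed
```

Under this check, the array shape used in the new tests, `[{"schema": {"vertices": []}}]`, is rejected at save time instead of failing later during schema generation.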

Development

Successfully merging this pull request may close these issues.

[Feature] Persist edited schema generator query and few-shot examples in the demo
