feat: persist schema generator examples #362
imbajin
left a comment
Blocking: no. Summary: Schema-example persistence currently validates after unrelated UI actions, which can make successful actions look failed. Evidence: static diff in vector_graph_block.py, targeted pytest passed after uv sync, and latest-head CI has no visible failures.
    query_examples=None,
    few_shot_examples=None,
):
    validated_query_examples = _validate_schema_generator_examples(query_examples, "Query examples")
Evidence: these save chains now run the primary action first and only then call store_prompt(..., query_example, few_shot). Since store_prompt() validates those JSON blobs immediately, an invalid or blank schema-example editor can make actions like clearing indexes report an error after the action already ran, and blank values cannot clear a persisted override back to bundled defaults.
Please keep schema-generator validation/persistence on an explicit schema-generator save/generate path, or validate before the primary action and treat blank/whitespace examples as clearing the persisted override. Add a regression test for clearing both fields back to the bundled fallback.
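A minimal sketch of the second option, for illustration only. The helper names `normalize_examples` and `run_with_examples`, and the use of `None` as the "clear the override" sentinel, are assumptions and not code from this PR; `primary_action` and `persist` stand in for the real UI action and `store_prompt()`.

```python
import json
from typing import Callable, Optional


def normalize_examples(raw: Optional[str], label: str) -> Optional[str]:
    """Return validated JSON text, or None when the editor is blank.

    None is a hypothetical sentinel meaning "drop the persisted override
    and fall back to the bundled defaults".
    """
    if raw is None or not raw.strip():
        return None
    try:
        json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"{label} must be valid JSON: {exc.msg}") from exc
    return raw.strip()


def run_with_examples(
    primary_action: Callable[[], object],
    persist: Callable[[Optional[str], Optional[str]], None],
    query_examples: Optional[str],
    few_shot_examples: Optional[str],
):
    # Validate both editors first, so a bad value fails before any side effect.
    query = normalize_examples(query_examples, "Query examples")
    few_shot = normalize_examples(few_shot_examples, "Few-shot schema examples")
    # Only now run the primary action (for example, clearing indexes).
    result = primary_action()
    # Persist last; None asks the store to remove the override.
    persist(query, few_shot)
    return result
```

With this ordering, an invalid editor value raises before the primary action runs, so the action can no longer succeed and then report an error.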
imbajin
left a comment
Blocking: yes. Summary: Found one schema-example contract regression in the current head. Evidence: target pytest passed after uv sync; static check shows persisted bundled query examples are filtered out before schema generation.
    if not examples:
        return ""
    try:
        json.loads(examples)
Evidence: the persisted/bundled query_examples.json is a list[str], but SchemaBuildNode only keeps parsed items that are dicts containing description and gremlin; the current head therefore reloads and persists examples that are later collapsed to an empty list before schema generation. The new tests cover YAML persistence and JSON syntax, but not that persisted query examples actually reach the schema prompt.
Impact: users can save examples successfully and still generate schemas without any query examples, so this feature appears to work while silently dropping the main payload. Please align this validator with the downstream contract, or relax the downstream parser to accept the shipped string-list format, and add a regression test that persisted/bundled query examples produce a non-empty query_examples payload for schema generation.
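To make the mismatch concrete, here is a small sketch under stated assumptions. `strict_filter` only approximates the downstream behaviour described above (keep dicts that contain `description` and `gremlin`); it is not the actual `SchemaBuildNode` code. `relaxed_filter` is one hypothetical way to accept the string-list format, and the two sample queries are made up rather than taken from the bundled `query_examples.json`.

```python
import json


def strict_filter(parsed):
    # Approximation of the described downstream rule: dicts with both keys only.
    return [
        item
        for item in parsed
        if isinstance(item, dict) and "description" in item and "gremlin" in item
    ]


def relaxed_filter(parsed):
    # Hypothetical relaxation: also keep non-empty plain strings.
    kept = []
    for item in parsed:
        if isinstance(item, str) and item.strip():
            kept.append(item)
        elif isinstance(item, dict) and "description" in item and "gremlin" in item:
            kept.append(item)
    return kept


# Illustrative list[str] payload, the same shape as the shipped examples.
persisted = json.loads('["g.V().limit(10)", "g.E().limit(10)"]')

# The strict rule drops every string, so the prompt gets no query examples.
assert strict_filter(persisted) == []
# The relaxed rule keeps the payload, which is what the regression test should check.
assert relaxed_filter(persisted) == persisted
```

A regression test along these lines would fail on the current head and pass once either the validator or the downstream parser is aligned.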
VGalaxies
left a comment
Review summary
- Blocking: yes
- Summary: The PR can persist few-shot schema examples that pass JSON syntax validation but are rejected by the schema-generation flow.
- Evidence:
  - git diff --check origin/main...HEAD
  - static inspection of the changed validator/persistence path and downstream schema builder contract
def _persist_schema_generator_examples(query_examples, few_shot_examples):
    validated_query_examples = _validate_schema_generator_examples(query_examples, "Query examples")
    validated_few_shot_examples = _validate_schema_generator_examples(few_shot_examples, "Few-shot schema examples")
Medium: Validate few-shot schema shape before persisting
hugegraph-llm/src/hugegraph_llm/demo/rag_demo/vector_graph_block.py:72
Evidence
_validate_schema_generator_examples() only checks json.loads(), so _persist_schema_generator_examples() accepts and saves values like [{"schema": {"vertices": []}}]; the new tests use that array shape. Downstream, SchemaBuildNode passes the parsed value through as few_shot_schema, and SchemaBuilder.run() rejects anything that is not a dict with "'few_shot_schema' must be a dict".
Impact
- A user can save syntactically valid few-shot examples that make schema generation fail, then the invalid persisted override keeps reloading until manually cleared.
Requested fix
- Validate few_shot_examples as the object/dict shape required by schema generation before writing it to YAML, or update the downstream flow to intentionally accept the persisted shape and cover that path with a regression test.
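A minimal sketch of the first option, assuming the only hard requirement is the top-level dict that SchemaBuilder.run() checks. The function name `validate_few_shot_schema` is hypothetical, and any deeper shape checks are left out because the downstream contract beyond "must be a dict" is not shown in this thread.

```python
import json


def validate_few_shot_schema(raw: str) -> dict:
    """Parse few-shot schema text and require a top-level JSON object."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Few-shot schema examples must be valid JSON: {exc.msg}") from exc
    if not isinstance(parsed, dict):
        # Reject arrays, strings, numbers, and null before anything is written to YAML.
        raise ValueError(
            "Few-shot schema examples must be a JSON object, "
            f"got {type(parsed).__name__}"
        )
    return parsed
```

The array shape used by the new tests, [{"schema": {"vertices": []}}], would be rejected here at save time instead of failing later during schema generation.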
Purpose
Closes #346.
This PR persists edited schema-generator query examples and few-shot schema examples in the demo.
Changes
resources/prompt_examples read-only as defaults.
Tests
uv run ruff check hugegraph-llm/src/hugegraph_llm/config/models/base_prompt_config.py hugegraph-llm/src/hugegraph_llm/demo/rag_demo/vector_graph_block.py hugegraph-llm/src/tests/document/test_schema_generator_examples_persistence.py
uv run ruff format --check .
uv run pytest hugegraph-llm/src/tests/document/test_schema_generator_examples_persistence.py
uv run pytest hugegraph-llm/src/tests/document