refactor: fix code scan findings in core modules by imbajin · Pull Request #358 · apache/hugegraph-ai

imbajin · 2026-06-03T05:01:01Z

Follow #357

Short Summary

fix CS-001 admin log path traversal and insecure default token access
fix client error contracts for sensitive logging, malformed successful responses, and Gremlin exception propagation
fix graph import typed-value preservation and request-scoped config/provider validation
replace fake integration tests with production flow/operator smoke coverage

Full Summary by CR

Bug Fixes

Enhance logging and error handling: Prevent path traversal, improve exception observability, and introduce more precise error types.
Mask request/response logs to reduce the risk of sensitive information leakage.
Prevent global configurations from leaking or interfering with each other during concurrent requests.

New Features

Restrict and validate supported LLM and reranker provider types.
Strengthen input validation and controlled log access for management interfaces.

Documentation

Add a complete code scanning plan, checkpoints, and final report documentation.

Tests

Expand and adjust unit/integration tests to cover security, configuration isolation, and exception paths.

- rebuild goal-scan from the latest main branch - retain code-scan audit docs and resolved issue ledger - preserve response error contracts and redaction behavior - keep production smoke tests on namespaced fixtures

- make RAG request client_config scoped and restored after use - avoid eager request redaction when debug logging is disabled - preserve explicit response validation errors and simplify HTTPError flow - clean admin error control flow and cross-platform path assertion - expose empty Gremlin payloads instead of silently returning None

- reject malformed Gremlin response payloads explicitly - avoid leaking raw KeyError from ResponseData construction - keep None response handling as the existing no-result path - cover empty payload behavior with a focused regression test

- remove request-wide graph config lock from RAG API - roll back provider type settings when config apply fails - let graph import apply typed defaults for missing properties - clarify scan report wording for post-scan fixes

- add TODO for replacing temporary huge_settings mutation - document the request-scoped flow/operator context direction - keep behavior unchanged for the current PR

- stop tracking .workflow/code-scan execution records - ignore local code-scan scratch artifacts - keep durable code-scan docs under docs/

Copilot

Pull request overview

This PR strengthens core reliability/security contracts across hugegraph-python-client and hugegraph-llm (response parsing, error typing, sensitive-log redaction, admin log path hardening), while also restructuring/expanding tests and adding a restartable code-scan ledger/reporting set under .workflow/code-scan/.

Changes:

Hardened security boundaries (admin /logs path validation; redaction of password/token-like fields in client debug/error logs).
Tightened client error/response contracts (typed ServerError / ResponseParseError, stricter Gremlin response payload validation).
Replaced/augmented integration & contract tests to exercise production flows/operators and newly defined contracts; added code-scan plan/spec/report artifacts.

Reviewed changes

Copilot reviewed 42 out of 43 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
hugegraph-python-client/src/tests/api/test_schema.py	Adds targeted FIXME markers for missing schema-builder contract coverage.
hugegraph-python-client/src/tests/api/test_response_validation.py	Adds contract tests for parse errors, status typing, and log redaction behavior.
hugegraph-python-client/src/tests/api/test_gremlin.py	Updates expected Gremlin error typing and adds contract tests for exception preservation and payload validation.
hugegraph-python-client/src/tests/api/test_graph.py	Adds FIXME markers and highlights current integration test state isolation gaps.
hugegraph-python-client/src/tests/api/test_auth_routing.py	Adds contract tests for debug redaction in `HGraphSession.request()` (plus FIXME about real resolver coverage).
hugegraph-python-client/src/pyhugegraph/utils/util.py	Implements `redact_sensitive_data()`, improves response parsing/typing, and raises `ResponseParseError` for malformed successes.
hugegraph-python-client/src/pyhugegraph/utils/huge_requests.py	Redacts sensitive kwargs in debug logs and avoids eager f-string logging.
hugegraph-python-client/src/pyhugegraph/utils/exceptions.py	Adds `ResponseParseError` and `ServerError` exception types.
hugegraph-python-client/src/pyhugegraph/api/gremlin.py	Stops blanket re-wrapping; validates response payload shape and preserves upstream exception types.
hugegraph-llm/src/tests/operators/llm_op/test_property_graph_extract.py	Updates expectations to preserve typed values and filter unknown properties.
hugegraph-llm/src/tests/operators/index_op/test_vector_index_query.py	Adds FIXME for flow→operator parameter pass-through contract coverage.
hugegraph-llm/src/tests/operators/hugegraph_op/test_commit_to_hugegraph.py	Adds FIXME markers for missing real failure-branch coverage.
hugegraph-llm/src/tests/operators/hugegraph_op/test_commit_to_hugegraph_load_into_graph.py	Adds an extraction→commit round-trip typed-value preservation contract test and additional FIXME markers.
hugegraph-llm/src/tests/models/llms/test_openai_client.py	Adds FIXME markers for provider failure-propagation contract gaps.
hugegraph-llm/src/tests/integration/test_rag_pipeline.py	Replaces test-local stand-ins with production operators and deterministic fakes for smoke coverage.
hugegraph-llm/src/tests/integration/test_kg_construction.py	Replaces mock KG pipeline with production flow/operator smoke tests using deterministic fake LLM.
hugegraph-llm/src/tests/integration/test_graph_rag_pipeline.py	Replaces test-local pipeline scaffolding with production flow build smoke tests.
hugegraph-llm/src/tests/indices/test_faiss_vector_index.py	Adds FIXME markers for cross-backend threshold semantics/contract coverage.
hugegraph-llm/src/tests/config/test_prompt_config.py	Adds FIXME markers for runtime YAML prompt contract coverage.
hugegraph-llm/src/tests/config/test_config.py	Adds FIXME markers for `.env` sync typing and side-effect contract coverage.
hugegraph-llm/src/tests/api/test_rag_api.py	Adds contract tests for config rollback, supported-provider validation, and request-scoped graph overrides.
hugegraph-llm/src/tests/api/test_admin_api.py	Adds contract tests for `/logs` traversal rejection, insecure token rejection, and valid streaming.
hugegraph-llm/src/hugegraph_llm/operators/llm_op/property_graph_extract.py	Preserves typed property values and filters to schema-defined fields.
hugegraph-llm/src/hugegraph_llm/api/rag_api.py	Adds request-scoped config snapshot/restore, provider validation/rollback, and request handling refactors.
hugegraph-llm/src/hugegraph_llm/api/models/rag_requests.py	Constrains provider types via `Literal` to reject unsupported providers at validation time.
hugegraph-llm/src/hugegraph_llm/api/admin_api.py	Hardens `/logs` via secure token checks and strict path validation.
docs/specs/2026-05-31-hugegraph-ai-code-scan-design.md	Adds code-scan design spec and invariants/priority model.
docs/plans/2026-05-31-hugegraph-ai-code-scan.md	Adds code-scan execution plan and deliverables/ledger structure.
.workflow/code-scan/README.md	Introduces scan ledger rules and pointers to spec/plan.
.workflow/code-scan/code-scan-state.json	Captures scan progress/state, touched files, and reported commands/results.
.workflow/code-scan/checkpoints/00-setup.md	Records setup checkpoint for scan process.
.workflow/code-scan/checkpoints/01-client-transport.md	Records client transport lane checkpoint results.
.workflow/code-scan/checkpoints/02-client-api-structure.md	Records client API/structure lane checkpoint results.
.workflow/code-scan/checkpoints/03-llm-api-config.md	Records LLM API/config lane checkpoint results.
.workflow/code-scan/checkpoints/04-llm-flow-operator.md	Records LLM flow/operator lane checkpoint results.
.workflow/code-scan/checkpoints/05-llm-index-model.md	Records LLM index/model lane checkpoint results.
.workflow/code-scan/checkpoints/06-test-quality.md	Records test-quality lane checkpoint results.
.workflow/code-scan/checkpoints/07-synthesis.md	Records cross-module synthesis checkpoint results.
.workflow/code-scan/reports/issues.md	Adds prioritized issue ledger with evidence and statuses.
.workflow/code-scan/reports/module-map.md	Adds a module/layer scan coverage map.
.workflow/code-scan/reports/test-quality-ledger.md	Adds a test-quality ledger summarizing gaps, FIXMEs, and coverage notes.
.workflow/code-scan/reports/final-code-scan-report.md	Adds final scan report summary and follow-up fix progress section.
.gitignore	Ignores local PR review scratch directories under `.workflow/`.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- remove run-specific checklist state from scan plan - document scratch ledger as optional local state - avoid agent-specific wording in reusable docs

Copilot

Pull request overview

Copilot reviewed 28 out of 29 changed files in this pull request and generated 3 comments.

imbajin

I found a few remaining correctness/security gaps around failed config rollback, Admin UI log access, Gremlin contracts, and flow-test coverage. I also +1'd the existing Copilot comment for the request-scoped graph config race instead of duplicating it.

imbajin · 2026-06-03T07:41:20Z

Follow-up note on request_graph_config() concurrency:

The current snapshot/restore is only a transitional guard. It reduces permanent global config leakage, but it does not provide true concurrent request isolation because /rag, /rag/graph, and /text2gremlin still temporarily mutate process-global huge_settings.

I do not think we should fix this by adding a global lock in this PR. That critical section covers potentially long-running RAG/Text2Gremlin execution, including graph queries, vector retrieval, and LLM calls; locking it would effectively serialize concurrent requests.

Suggested scope for this PR:

keep the snapshot/restore improvement;
strengthen the comment to an explicit FIXME;
leave the real fix to a dedicated follow-up PR.

Expected follow-up direction: pass graph config through request-scoped flow/operator context and create per-request HugeGraph clients instead of reading/writing huge_settings globally.

- replace transitional TODO with explicit FIXME - document why global locking is not acceptable - point follow-up toward request-scoped graph config

- redact sensitive response bodies in client error logs - normalize Gremlin malformed success payload errors - guard Gradio admin log access with secure token checks - roll back concrete provider config fields on failures - cover edge property filtering and RAG flow smoke contracts

imbajin added 5 commits June 2, 2026 21:08

fix: replay code scan fixes onto main

82dd169

- rebuild goal-scan from the latest main branch - retain code-scan audit docs and resolved issue ledger - preserve response error contracts and redaction behavior - keep production smoke tests on namespaced fixtures

fix(client): validate gremlin response payload

ef5cd60

- reject malformed Gremlin response payloads explicitly - avoid leaking raw KeyError from ResponseData construction - keep None response handling as the existing no-result path - cover empty payload behavior with a focused regression test

fix(llm): address follow-up review comments

902fd6b

- remove request-wide graph config lock from RAG API - roll back provider type settings when config apply fails - let graph import apply typed defaults for missing properties - clarify scan report wording for post-scan fixes

chore(llm): note request-scoped graph config follow-up

fefe6b2

- add TODO for replacing temporary huge_settings mutation - document the request-scoped flow/operator context direction - keep behavior unchanged for the current PR

Copilot AI review requested due to automatic review settings June 3, 2026 05:01

dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Jun 3, 2026

Copilot started reviewing on behalf of imbajin June 3, 2026 05:01 View session

github-actions Bot added llm python-client labels Jun 3, 2026

dosubot Bot added bug Something isn't working documentation Improvements or additions to documentation enhancement New feature or request labels Jun 3, 2026

imbajin changed the title ~~refactor: enhance the tests frame & quality in core modules~~ refactor: fix code scan findings in core modules Jun 3, 2026

imbajin mentioned this pull request Jun 3, 2026

refactor(client): pin 1.7.0 shape with deterministic structural assertions #342

Merged

chore: remove code scan scratch artifacts

536d935

- stop tracking .workflow/code-scan execution records - ignore local code-scan scratch artifacts - keep durable code-scan docs under docs/

Copilot AI reviewed Jun 3, 2026

View reviewed changes

Comment thread hugegraph-llm/src/hugegraph_llm/api/rag_api.py

Comment thread .workflow/code-scan/reports/final-code-scan-report.md Outdated

docs(code-scan): make audit docs reusable

27b65df

- remove run-specific checklist state from scan plan - document scratch ledger as optional local state - avoid agent-specific wording in reusable docs

imbajin requested a review from Copilot June 3, 2026 06:47

Copilot started reviewing on behalf of imbajin June 3, 2026 06:47 View session

Copilot AI reviewed Jun 3, 2026

View reviewed changes

Comment thread hugegraph-llm/src/hugegraph_llm/api/rag_api.py

Comment thread hugegraph-python-client/src/pyhugegraph/utils/util.py Outdated

Comment thread hugegraph-llm/src/hugegraph_llm/operators/llm_op/property_graph_extract.py

imbajin commented Jun 3, 2026

View reviewed changes

imbajin added 2 commits June 3, 2026 15:43

docs(api): mark graph config isolation follow-up

58cd71a

- replace transitional TODO with explicit FIXME - document why global locking is not acceptable - point follow-up toward request-scoped graph config

imbajin requested review from VGalaxies and zyxxoo June 3, 2026 09:41

Pengzna approved these changes Jun 3, 2026

View reviewed changes

imbajin merged commit 4f708ea into main Jun 3, 2026
18 checks passed

imbajin deleted the goal-scan branch June 3, 2026 11:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: fix code scan findings in core modules#358

refactor: fix code scan findings in core modules#358
imbajin merged 9 commits into
mainfrom
goal-scan

imbajin commented Jun 3, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

imbajin left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

imbajin commented Jun 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

imbajin commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Short Summary

Full Summary by CR

Bug Fixes

New Features

Documentation

Tests

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

imbajin left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

imbajin commented Jun 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

imbajin commented Jun 3, 2026 •

edited

Loading