Skip to content

refactor: fix code scan findings in core modules#358

Merged
imbajin merged 9 commits into
mainfrom
goal-scan
Jun 3, 2026
Merged

refactor: fix code scan findings in core modules#358
imbajin merged 9 commits into
mainfrom
goal-scan

Conversation

@imbajin

@imbajin imbajin commented Jun 3, 2026

Copy link
Copy Markdown
Member

Follow #357

Short Summary

  • fix CS-001 admin log path traversal and insecure default token access
  • fix client error contracts for sensitive logging, malformed successful responses, and Gremlin exception propagation
  • fix graph import typed-value preservation and request-scoped config/provider validation
  • replace fake integration tests with production flow/operator smoke coverage

Full Summary by CR

Bug Fixes

  • Enhance logging and error handling: Prevent path traversal, improve exception observability, and introduce more precise error types.
  • Mask request/response logs to reduce the risk of sensitive information leakage.
  • Prevent global configurations from leaking or interfering with each other during concurrent requests.

New Features

  • Restrict and validate supported LLM and reranker provider types.
  • Strengthen input validation and controlled log access for management interfaces.

Documentation

  • Add a complete code scanning plan, checkpoints, and final report documentation.

Tests

  • Expand and adjust unit/integration tests to cover security, configuration isolation, and exception paths.

imbajin added 5 commits June 2, 2026 21:08
- rebuild goal-scan from the latest main branch

- retain code-scan audit docs and resolved issue ledger

- preserve response error contracts and redaction behavior

- keep production smoke tests on namespaced fixtures
- make RAG request client_config scoped and restored after use

- avoid eager request redaction when debug logging is disabled

- preserve explicit response validation errors and simplify HTTPError flow

- clean admin error control flow and cross-platform path assertion

- expose empty Gremlin payloads instead of silently returning None
- reject malformed Gremlin response payloads explicitly

- avoid leaking raw KeyError from ResponseData construction

- keep None response handling as the existing no-result path

- cover empty payload behavior with a focused regression test
- remove request-wide graph config lock from RAG API

- roll back provider type settings when config apply fails

- let graph import apply typed defaults for missing properties

- clarify scan report wording for post-scan fixes
- add TODO for replacing temporary huge_settings mutation

- document the request-scoped flow/operator context direction

- keep behavior unchanged for the current PR
Copilot AI review requested due to automatic review settings June 3, 2026 05:01
@dosubot dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Jun 3, 2026
@dosubot dosubot Bot added bug Something isn't working documentation Improvements or additions to documentation enhancement New feature or request labels Jun 3, 2026
@imbajin imbajin changed the title refactor: enhance the tests frame & quality in core modules refactor: fix code scan findings in core modules Jun 3, 2026
- stop tracking .workflow/code-scan execution records
- ignore local code-scan scratch artifacts
- keep durable code-scan docs under docs/

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR strengthens core reliability/security contracts across hugegraph-python-client and hugegraph-llm (response parsing, error typing, sensitive-log redaction, admin log path hardening), while also restructuring/expanding tests and adding a restartable code-scan ledger/reporting set under .workflow/code-scan/.

Changes:

  • Hardened security boundaries (admin /logs path validation; redaction of password/token-like fields in client debug/error logs).
  • Tightened client error/response contracts (typed ServerError / ResponseParseError, stricter Gremlin response payload validation).
  • Replaced/augmented integration & contract tests to exercise production flows/operators and newly defined contracts; added code-scan plan/spec/report artifacts.

Reviewed changes

Copilot reviewed 42 out of 43 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
hugegraph-python-client/src/tests/api/test_schema.py Adds targeted FIXME markers for missing schema-builder contract coverage.
hugegraph-python-client/src/tests/api/test_response_validation.py Adds contract tests for parse errors, status typing, and log redaction behavior.
hugegraph-python-client/src/tests/api/test_gremlin.py Updates expected Gremlin error typing and adds contract tests for exception preservation and payload validation.
hugegraph-python-client/src/tests/api/test_graph.py Adds FIXME markers and highlights current integration test state isolation gaps.
hugegraph-python-client/src/tests/api/test_auth_routing.py Adds contract tests for debug redaction in HGraphSession.request() (plus FIXME about real resolver coverage).
hugegraph-python-client/src/pyhugegraph/utils/util.py Implements redact_sensitive_data(), improves response parsing/typing, and raises ResponseParseError for malformed successes.
hugegraph-python-client/src/pyhugegraph/utils/huge_requests.py Redacts sensitive kwargs in debug logs and avoids eager f-string logging.
hugegraph-python-client/src/pyhugegraph/utils/exceptions.py Adds ResponseParseError and ServerError exception types.
hugegraph-python-client/src/pyhugegraph/api/gremlin.py Stops blanket re-wrapping; validates response payload shape and preserves upstream exception types.
hugegraph-llm/src/tests/operators/llm_op/test_property_graph_extract.py Updates expectations to preserve typed values and filter unknown properties.
hugegraph-llm/src/tests/operators/index_op/test_vector_index_query.py Adds FIXME for flow→operator parameter pass-through contract coverage.
hugegraph-llm/src/tests/operators/hugegraph_op/test_commit_to_hugegraph.py Adds FIXME markers for missing real failure-branch coverage.
hugegraph-llm/src/tests/operators/hugegraph_op/test_commit_to_hugegraph_load_into_graph.py Adds an extraction→commit round-trip typed-value preservation contract test and additional FIXME markers.
hugegraph-llm/src/tests/models/llms/test_openai_client.py Adds FIXME markers for provider failure-propagation contract gaps.
hugegraph-llm/src/tests/integration/test_rag_pipeline.py Replaces test-local stand-ins with production operators and deterministic fakes for smoke coverage.
hugegraph-llm/src/tests/integration/test_kg_construction.py Replaces mock KG pipeline with production flow/operator smoke tests using deterministic fake LLM.
hugegraph-llm/src/tests/integration/test_graph_rag_pipeline.py Replaces test-local pipeline scaffolding with production flow build smoke tests.
hugegraph-llm/src/tests/indices/test_faiss_vector_index.py Adds FIXME markers for cross-backend threshold semantics/contract coverage.
hugegraph-llm/src/tests/config/test_prompt_config.py Adds FIXME markers for runtime YAML prompt contract coverage.
hugegraph-llm/src/tests/config/test_config.py Adds FIXME markers for .env sync typing and side-effect contract coverage.
hugegraph-llm/src/tests/api/test_rag_api.py Adds contract tests for config rollback, supported-provider validation, and request-scoped graph overrides.
hugegraph-llm/src/tests/api/test_admin_api.py Adds contract tests for /logs traversal rejection, insecure token rejection, and valid streaming.
hugegraph-llm/src/hugegraph_llm/operators/llm_op/property_graph_extract.py Preserves typed property values and filters to schema-defined fields.
hugegraph-llm/src/hugegraph_llm/api/rag_api.py Adds request-scoped config snapshot/restore, provider validation/rollback, and request handling refactors.
hugegraph-llm/src/hugegraph_llm/api/models/rag_requests.py Constrains provider types via Literal to reject unsupported providers at validation time.
hugegraph-llm/src/hugegraph_llm/api/admin_api.py Hardens /logs via secure token checks and strict path validation.
docs/specs/2026-05-31-hugegraph-ai-code-scan-design.md Adds code-scan design spec and invariants/priority model.
docs/plans/2026-05-31-hugegraph-ai-code-scan.md Adds code-scan execution plan and deliverables/ledger structure.
.workflow/code-scan/README.md Introduces scan ledger rules and pointers to spec/plan.
.workflow/code-scan/code-scan-state.json Captures scan progress/state, touched files, and reported commands/results.
.workflow/code-scan/checkpoints/00-setup.md Records setup checkpoint for scan process.
.workflow/code-scan/checkpoints/01-client-transport.md Records client transport lane checkpoint results.
.workflow/code-scan/checkpoints/02-client-api-structure.md Records client API/structure lane checkpoint results.
.workflow/code-scan/checkpoints/03-llm-api-config.md Records LLM API/config lane checkpoint results.
.workflow/code-scan/checkpoints/04-llm-flow-operator.md Records LLM flow/operator lane checkpoint results.
.workflow/code-scan/checkpoints/05-llm-index-model.md Records LLM index/model lane checkpoint results.
.workflow/code-scan/checkpoints/06-test-quality.md Records test-quality lane checkpoint results.
.workflow/code-scan/checkpoints/07-synthesis.md Records cross-module synthesis checkpoint results.
.workflow/code-scan/reports/issues.md Adds prioritized issue ledger with evidence and statuses.
.workflow/code-scan/reports/module-map.md Adds a module/layer scan coverage map.
.workflow/code-scan/reports/test-quality-ledger.md Adds a test-quality ledger summarizing gaps, FIXMEs, and coverage notes.
.workflow/code-scan/reports/final-code-scan-report.md Adds final scan report summary and follow-up fix progress section.
.gitignore Ignores local PR review scratch directories under .workflow/.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread hugegraph-llm/src/hugegraph_llm/api/rag_api.py
Comment thread .workflow/code-scan/reports/final-code-scan-report.md Outdated
- remove run-specific checklist state from scan plan
- document scratch ledger as optional local state
- avoid agent-specific wording in reusable docs

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 28 out of 29 changed files in this pull request and generated 3 comments.

Comment thread hugegraph-llm/src/hugegraph_llm/api/rag_api.py
Comment thread hugegraph-python-client/src/pyhugegraph/utils/util.py Outdated

@imbajin imbajin left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a few remaining correctness/security gaps around failed config rollback, Admin UI log access, Gremlin contracts, and flow-test coverage. I also +1'd the existing Copilot comment for the request-scoped graph config race instead of duplicating it.

Comment thread hugegraph-llm/src/hugegraph_llm/api/rag_api.py Outdated
Comment thread hugegraph-llm/src/hugegraph_llm/api/admin_api.py
Comment thread hugegraph-python-client/src/tests/api/test_gremlin.py
Comment thread hugegraph-python-client/src/pyhugegraph/api/gremlin.py Outdated
Comment thread hugegraph-llm/src/tests/integration/test_graph_rag_pipeline.py
@imbajin

imbajin commented Jun 3, 2026

Copy link
Copy Markdown
Member Author

Follow-up note on request_graph_config() concurrency:

The current snapshot/restore is only a transitional guard. It reduces permanent global config leakage, but it does not provide true concurrent request isolation because /rag, /rag/graph, and /text2gremlin still temporarily mutate process-global huge_settings.

I do not think we should fix this by adding a global lock in this PR. That critical section covers potentially long-running RAG/Text2Gremlin execution, including graph queries, vector retrieval, and LLM calls; locking it would effectively serialize concurrent requests.

Suggested scope for this PR:

  • keep the snapshot/restore improvement;
  • strengthen the comment to an explicit FIXME;
  • leave the real fix to a dedicated follow-up PR.

Expected follow-up direction: pass graph config through request-scoped flow/operator context and create per-request HugeGraph clients instead of reading/writing huge_settings globally.

imbajin added 2 commits June 3, 2026 15:43
- replace transitional TODO with explicit FIXME
- document why global locking is not acceptable
- point follow-up toward request-scoped graph config
- redact sensitive response bodies in client error logs
- normalize Gremlin malformed success payload errors
- guard Gradio admin log access with secure token checks
- roll back concrete provider config fields on failures
- cover edge property filtering and RAG flow smoke contracts
@imbajin imbajin requested review from VGalaxies and zyxxoo June 3, 2026 09:41
@imbajin imbajin merged commit 4f708ea into main Jun 3, 2026
18 checks passed
@imbajin imbajin deleted the goal-scan branch June 3, 2026 11:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working documentation Improvements or additions to documentation enhancement New feature or request llm python-client size:XXL This PR changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants