Skip to content

feat: Add queries to pre-compute linkedContainedInPlace, linkedMemberOf, and linkedMember triples#2015

Draft
SandeepTuniki wants to merge 8 commits into
masterfrom
linked-containment-membership-aggregations
Draft

feat: Add queries to pre-compute linkedContainedInPlace, linkedMemberOf, and linkedMember triples#2015
SandeepTuniki wants to merge 8 commits into
masterfrom
linked-containment-membership-aggregations

Conversation

@SandeepTuniki
Copy link
Copy Markdown
Contributor

No description provided.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the aggregation logic by introducing the BigQueryExecutor and GraphAggregator classes to handle query execution and global graph aggregations, respectively. The AggregationUtils class was updated to orchestrate these new components. Feedback was provided to avoid hardcoding environment-specific values like the Spanner destination URI, recommending instead that the code should rely on environment variables and fail explicitly if they are missing to ensure configuration correctness.

Comment on lines +215 to +218
destination_uri = os.environ.get(
'SPANNER_DESTINATION_URI',
"https://spanner.googleapis.com/projects/datcom-store/instances/dc-kg-test/databases/dc_graph_2026_01_27"
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Avoid hardcoding environment-specific values like the Spanner database URI. This makes the code brittle and difficult to maintain as environments change. It is better to rely on environment variables. In line with repository practices, allow the code to fail with a KeyError if the configuration is missing to enforce correctness rather than using defensive checks.

        destination_uri = os.environ['SPANNER_DESTINATION_URI']
References
  1. Avoid hardcoding environment-specific values or paths to prevent brittle code; use more robust configuration methods.
  2. When accessing configuration, it is acceptable to allow the code to fail on a missing key (e.g., KeyError) to enforce configuration correctness rather than using defensive defaults.

…AL_QUERY.

- Updated aggregation logic to pull data from Spanner using Connection ID.
- Switched to per-query temporary tables to avoid shared state.
- Implemented idempotency using FilteredEdges CTE with WHERE NOT EXISTS subquery.
- Added environment variable support for Spanner instance/database configuration.
- Added get_spanner_destination_uri method to BigQueryExecutor to derive URI from metadata.
- Removed redundant destination_uri parameters from GraphAggregator and AggregationUtils.
- Updated all aggregation queries to retrieve the destination URI directly from the executor.
Explicitly scoping the BigQuery client to the provided project_id for better resource management.
- Added run_all() method to GraphAggregator to manage the sequence of global aggregations.
- Simplified AggregationUtils.run_aggregation() by delegating orchestration to GraphAggregator.
- Improved code structure and separation of concerns.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant