feat: Add queries to pre-compute linkedContainedInPlace, linkedMemberOf, and linkedMember triples#2015
feat: Add queries to pre-compute linkedContainedInPlace, linkedMemberOf, and linkedMember triples#2015SandeepTuniki wants to merge 8 commits into
linkedContainedInPlace, linkedMemberOf, and linkedMember triples#2015Conversation
…rOf, and linkedMember recursive queries
There was a problem hiding this comment.
Code Review
This pull request refactors the aggregation logic by introducing the BigQueryExecutor and GraphAggregator classes to handle query execution and global graph aggregations, respectively. The AggregationUtils class was updated to orchestrate these new components. Feedback was provided to avoid hardcoding environment-specific values like the Spanner destination URI, recommending instead that the code should rely on environment variables and fail explicitly if they are missing to ensure configuration correctness.
| destination_uri = os.environ.get( | ||
| 'SPANNER_DESTINATION_URI', | ||
| "https://spanner.googleapis.com/projects/datcom-store/instances/dc-kg-test/databases/dc_graph_2026_01_27" | ||
| ) |
There was a problem hiding this comment.
Avoid hardcoding environment-specific values like the Spanner database URI. This makes the code brittle and difficult to maintain as environments change. It is better to rely on environment variables. In line with repository practices, allow the code to fail with a KeyError if the configuration is missing to enforce correctness rather than using defensive checks.
destination_uri = os.environ['SPANNER_DESTINATION_URI']References
- Avoid hardcoding environment-specific values or paths to prevent brittle code; use more robust configuration methods.
- When accessing configuration, it is acceptable to allow the code to fail on a missing key (e.g., KeyError) to enforce configuration correctness rather than using defensive defaults.
…AL_QUERY. - Updated aggregation logic to pull data from Spanner using Connection ID. - Switched to per-query temporary tables to avoid shared state. - Implemented idempotency using FilteredEdges CTE with WHERE NOT EXISTS subquery. - Added environment variable support for Spanner instance/database configuration.
- Added get_spanner_destination_uri method to BigQueryExecutor to derive URI from metadata. - Removed redundant destination_uri parameters from GraphAggregator and AggregationUtils. - Updated all aggregation queries to retrieve the destination URI directly from the executor.
Explicitly scoping the BigQuery client to the provided project_id for better resource management.
- Added run_all() method to GraphAggregator to manage the sequence of global aggregations. - Simplified AggregationUtils.run_aggregation() by delegating orchestration to GraphAggregator. - Improved code structure and separation of concerns.
No description provided.