BE-597: Array-based type filtering in the SelectCompiler #8866
PR Summary (Cursor Bugbot): Medium Risk
Reviewed by Cursor Bugbot for commit d6638eb.
Pull request overview
This PR optimizes entity type filtering in the Postgres SelectCompiler by compiling type predicates into GIN-indexable array operations over entity_edition_cache (instead of generating one type-join chain per predicate), reducing join multiplication and planner misestimates.
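The following is a minimal sketch, not code from this PR, that models a text-array column as a set to show what the two operators test: `@>` requires every parameter value to be present, and `&&` requires at least one. It also checks that `$n = ANY(column)` and `column @> ARRAY[$n]` agree for a single value. All function names and the example URLs are hypothetical, and SQL NULL handling is ignored.

```rust
use std::collections::HashSet;

// Hypothetical helpers modelling Postgres text-array operators as set checks.
// NULL handling is deliberately ignored.

/// `lhs @> rhs`: every element of `rhs` also appears in `lhs`.
fn array_contains(lhs: &[&str], rhs: &[&str]) -> bool {
    let lhs: HashSet<&str> = lhs.iter().copied().collect();
    rhs.iter().all(|item| lhs.contains(item))
}

/// `lhs && rhs`: the two arrays share at least one element.
fn array_overlaps(lhs: &[&str], rhs: &[&str]) -> bool {
    let lhs: HashSet<&str> = lhs.iter().copied().collect();
    rhs.iter().any(|item| lhs.contains(item))
}

/// `$n = ANY(column)`: the scalar parameter equals some element of the array.
fn equals_any(param: &str, column: &[&str]) -> bool {
    column.iter().any(|element| *element == param)
}

fn main() {
    // Hypothetical cached base-URL array of a two-typed entity edition.
    let base_urls = [
        "https://example.com/types/person/",
        "https://example.com/types/agent/",
    ];
    let person = "https://example.com/types/person/";
    let organization = "https://example.com/types/organization/";

    // The scalar `= ANY` form and the one-element containment form agree.
    for param in [person, organization] {
        assert_eq!(equals_any(param, &base_urls), array_contains(&base_urls, &[param]));
    }

    println!("{}", array_contains(&base_urls, &[person])); // true
    println!("{}", array_overlaps(&base_urls, &[organization, person])); // true
    println!("{}", array_overlaps(&base_urls, &[organization])); // false
}
```

Only the containment form can be served by a GIN index on the array column, which is why the policy filters are rewritten even though their results do not change.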
Changes:
- Introduces an `ArrayContains` (`@>`) binary operator and compiles eligible type/base-URL filters into `@>`/`&&` predicates (with bundling inside `All`/`Any` groups).
- Updates query-path resolution so `EntityTypeEdge { IsOfType, BaseUrl|VersionedUrl, inheritance_depth: None }` maps directly to the `entity_edition_cache` type arrays; removes the redundant `TypeBaseUrls`/`TypeVersionedUrls` paths.
- Adds/updates snapshot and integration coverage to pin the new compilation output and alias-separation behavior.
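The `All`/`Any` bundling rules can be sketched as a lookup table plus an exhaustive equivalence check. This assumes hypothetical `Combinator` and `Polarity` enums and a `$1` parameter array rather than the crate's real filter and expression types; it is meant only to show that each bundled form has the same truth table as the per-value filters it replaces.

```rust
/// Hypothetical mirror of the surrounding `All`/`Any` group.
#[derive(Clone, Copy, Debug)]
enum Combinator { All, Any }

/// Hypothetical: whether every filter in the group is `equal` or `notEqual`.
#[derive(Clone, Copy, Debug)]
enum Polarity { Equal, NotEqual }

/// The bundling table: (negate?, array operator) for one predicate over `$1`.
fn bundle(combinator: Combinator, polarity: Polarity) -> (bool, &'static str) {
    match (combinator, polarity) {
        (Combinator::All, Polarity::Equal) => (false, "@>"),
        (Combinator::Any, Polarity::Equal) => (false, "&&"),
        (Combinator::All, Polarity::NotEqual) => (true, "&&"),
        (Combinator::Any, Polarity::NotEqual) => (true, "@>"),
    }
}

/// Renders the bundled predicate; `$1` stands for the parameter array.
fn render(column: &str, combinator: Combinator, polarity: Polarity) -> String {
    let (negated, op) = bundle(combinator, polarity);
    let predicate = format!("{column} {op} $1");
    if negated { format!("NOT ({predicate})") } else { predicate }
}

/// Membership of one value in the entity's cached type array.
fn has(types: &[&str], value: &str) -> bool {
    types.iter().any(|t| *t == value)
}

/// Unbundled meaning: one membership test per filter value, combined by the group.
fn unbundled(types: &[&str], values: &[&str], c: Combinator, p: Polarity) -> bool {
    let results: Vec<bool> = values
        .iter()
        .map(|value| match p {
            Polarity::Equal => has(types, value),
            Polarity::NotEqual => !has(types, value),
        })
        .collect();
    match c {
        Combinator::All => results.iter().all(|&r| r),
        Combinator::Any => results.iter().any(|&r| r),
    }
}

/// Bundled meaning: a single `@>` or `&&` over all values, optionally negated.
fn bundled(types: &[&str], values: &[&str], c: Combinator, p: Polarity) -> bool {
    let (negated, op) = bundle(c, p);
    let result = if op == "@>" {
        values.iter().all(|value| has(types, value))
    } else {
        values.iter().any(|value| has(types, value))
    };
    result != negated
}

fn main() {
    let universe = ["A", "B", "C"];
    // Every subset of `universe`, selected by the bits of `mask`.
    let subset = |mask: u8| {
        universe
            .iter()
            .enumerate()
            .filter(|&(i, _)| (mask & (1 << i)) != 0)
            .map(|(_, t)| *t)
            .collect::<Vec<_>>()
    };
    // Exhaustively compare both forms for every type set and every value set.
    for type_mask in 0u8..8 {
        let types = subset(type_mask);
        for value_mask in 0u8..8 {
            let values = subset(value_mask);
            for c in [Combinator::All, Combinator::Any] {
                for p in [Polarity::Equal, Polarity::NotEqual] {
                    assert_eq!(unbundled(&types, &values, c, p), bundled(&types, &values, c, p));
                }
            }
        }
    }
    println!("{}", render("base_urls", Combinator::All, Polarity::NotEqual)); // NOT (base_urls && $1)
}
```

The negated rows are De Morgan's laws: `All` over `notEqual(A)`, `notEqual(B)` means "neither A nor B", which is `NOT (column && ARRAY[A, B])`.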
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tests/graph/integration/postgres/email_filter_protection.rs | Updates protection-filter paths to the canonical EntityTypeEdge form (depth None) used by array-backed compilation. |
| libs/@local/graph/store/src/filter/protection.rs | Switches protection config/type-path usage to EntityTypeEdge and removes references to deleted Type*Urls paths. |
| libs/@local/graph/store/src/filter/mod.rs | Rewrites for_entity_by_type_id to a single versionedUrl equality filter suited to array-backed compilation. |
| libs/@local/graph/store/src/entity/query.rs | Removes TypeBaseUrls/TypeVersionedUrls query paths and updates docs for DirectTypeCount semantics with EntityTypeEdge. |
| libs/@local/graph/postgres-store/src/store/postgres/query/statement/select.rs | Adds snapshot tests covering single predicates, All/Any bundling, negated bundling, and alias-separation regression cases. |
| libs/@local/graph/postgres-store/src/store/postgres/query/expression/conditional.rs | Adds Expression::array_contains helper constructing BinaryOperator::ArrayContains. |
| libs/@local/graph/postgres-store/src/store/postgres/query/expression/binary.rs | Adds BinaryOperator::ArrayContains and transpilation support (@>). |
| libs/@local/graph/postgres-store/src/store/postgres/query/entity.rs | Maps EntityTypeEdge (IsOfType + BaseUrl/VersionedUrl + no depth) to entity_edition_cache relations/columns. |
| libs/@local/graph/postgres-store/src/store/postgres/query/compile.rs | Implements text-array predicate compilation and All/Any bundling; rejects string ops on array-backed paths at compile time. |
| libs/@local/graph/postgres-store/src/store/postgres/knowledge/entity/summary.rs | Updates summary selection to use EntityTypeEdge(...VersionedUrl, depth None) for type IDs/titles plumbing. |
| libs/@local/graph/postgres-store/src/store/postgres/knowledge/entity/query.rs | Updates entity record selection to use the new canonical type path (EntityTypeEdge to cached versioned_urls). |
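To illustrate the type-driven compilation summarized for `compile.rs` and `binary.rs`, here is a simplified sketch in which the column's declared type, not the path, selects the SQL shape, and string operators on array columns are rejected before any SQL is emitted. The names `ParameterType`, `BinaryOperator::ArrayContains`, and `UnsupportedTextArrayOperation` echo this PR, but every definition below is a hypothetical stand-in, not the crate's actual API; the scalar `StartsWith` branch assumes Postgres's built-in `starts_with` function.

```rust
/// Hypothetical stand-in for a column's declared parameter type.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ParameterType { Text, TextArray }

/// Hypothetical subset of binary operators; `ArrayContains` transpiles to `@>`.
#[derive(Clone, Copy, Debug)]
enum BinaryOperator { Equal, NotEqual, ArrayContains }

impl BinaryOperator {
    fn as_sql(self) -> &'static str {
        match self {
            Self::Equal => "=",
            Self::NotEqual => "!=",
            Self::ArrayContains => "@>",
        }
    }
}

/// Simplified filter operations against one path (hypothetical).
#[derive(Clone, Copy, Debug)]
enum FilterOp { Equal, NotEqual, StartsWith }

#[derive(Debug, PartialEq)]
enum CompileError {
    UnsupportedTextArrayOperation(&'static str),
}

/// Picks the SQL shape from the column's declared type, not from the path.
fn compile(column: &str, ty: ParameterType, op: FilterOp) -> Result<String, CompileError> {
    let contains = BinaryOperator::ArrayContains.as_sql();
    match (ty, op) {
        (ParameterType::Text, FilterOp::Equal) => {
            let eq = BinaryOperator::Equal.as_sql();
            Ok(format!("{column} {eq} $1"))
        }
        (ParameterType::Text, FilterOp::NotEqual) => {
            let ne = BinaryOperator::NotEqual.as_sql();
            Ok(format!("{column} {ne} $1"))
        }
        (ParameterType::Text, FilterOp::StartsWith) => Ok(format!("starts_with({column}, $1)")),
        // Equality on a text array becomes containment of a one-element array.
        (ParameterType::TextArray, FilterOp::Equal) => Ok(format!("{column} {contains} ARRAY[$1]")),
        // Set semantics: the array must not contain the value at all.
        (ParameterType::TextArray, FilterOp::NotEqual) => {
            Ok(format!("NOT ({column} {contains} ARRAY[$1])"))
        }
        // String operators have no array meaning: reject before emitting SQL.
        (ParameterType::TextArray, FilterOp::StartsWith) => {
            Err(CompileError::UnsupportedTextArrayOperation("StartsWith"))
        }
    }
}

fn main() {
    let sql = compile("versioned_urls", ParameterType::TextArray, FilterOp::Equal);
    println!("{sql:?}"); // Ok("versioned_urls @> ARRAY[$1]")
    let err = compile("versioned_urls", ParameterType::TextArray, FilterOp::StartsWith);
    println!("{err:?}"); // Err(UnsupportedTextArrayOperation("StartsWith"))
}
```

Dispatching on the declared type means any future text-array column gets the same containment compilation and the same compile-time rejection without a per-path special case.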
Benchmark results
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 2002 | | Flame Graph |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | | Flame Graph |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 1001 | | Flame Graph |
| resolve_policies_for_actor | user: seeded, selectivity: high, policies: 3314 | | Flame Graph |
| resolve_policies_for_actor | user: seeded, selectivity: low, policies: 1 | | Flame Graph |
| resolve_policies_for_actor | user: seeded, selectivity: medium, policies: 1526 | | Flame Graph |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 2078 | | Flame Graph |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | | Flame Graph |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 1033 | | Flame Graph |
policy_resolution_medium
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 102 | | Flame Graph |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | | Flame Graph |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 51 | | Flame Graph |
| resolve_policies_for_actor | user: seeded, selectivity: high, policies: 269 | | Flame Graph |
| resolve_policies_for_actor | user: seeded, selectivity: low, policies: 1 | | Flame Graph |
| resolve_policies_for_actor | user: seeded, selectivity: medium, policies: 107 | | Flame Graph |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 133 | | Flame Graph |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | | Flame Graph |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 63 | | Flame Graph |
policy_resolution_none
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 2 | | Flame Graph |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | | Flame Graph |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 1 | | Flame Graph |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 8 | | Flame Graph |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | | Flame Graph |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 3 | | Flame Graph |
policy_resolution_small
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 52 | | Flame Graph |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | | Flame Graph |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 25 | | Flame Graph |
| resolve_policies_for_actor | user: seeded, selectivity: high, policies: 94 | | Flame Graph |
| resolve_policies_for_actor | user: seeded, selectivity: low, policies: 1 | | Flame Graph |
| resolve_policies_for_actor | user: seeded, selectivity: medium, policies: 26 | | Flame Graph |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 66 | | Flame Graph |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | | Flame Graph |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 29 | | Flame Graph |
read_scaling_complete
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id;one_depth | 1 entities | | Flame Graph |
| entity_by_id;one_depth | 10 entities | | Flame Graph |
| entity_by_id;one_depth | 25 entities | | Flame Graph |
| entity_by_id;one_depth | 5 entities | | Flame Graph |
| entity_by_id;one_depth | 50 entities | | Flame Graph |
| entity_by_id;two_depth | 1 entities | | Flame Graph |
| entity_by_id;two_depth | 10 entities | | Flame Graph |
| entity_by_id;two_depth | 25 entities | | Flame Graph |
| entity_by_id;two_depth | 5 entities | | Flame Graph |
| entity_by_id;two_depth | 50 entities | | Flame Graph |
| entity_by_id;zero_depth | 1 entities | | Flame Graph |
| entity_by_id;zero_depth | 10 entities | | Flame Graph |
| entity_by_id;zero_depth | 25 entities | | Flame Graph |
| entity_by_id;zero_depth | 5 entities | | Flame Graph |
| entity_by_id;zero_depth | 50 entities | | Flame Graph |
read_scaling_linkless
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id | 1 entities | | Flame Graph |
| entity_by_id | 10 entities | | Flame Graph |
| entity_by_id | 100 entities | | Flame Graph |
| entity_by_id | 1000 entities | | Flame Graph |
| entity_by_id | 10000 entities | | Flame Graph |
representative_read_entity
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/block/v/1 | | Flame Graph |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/book/v/1 | | Flame Graph |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/building/v/1 | | Flame Graph |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/organization/v/1 | | Flame Graph |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/page/v/2 | | Flame Graph |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/person/v/1 | | Flame Graph |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/playlist/v/1 | | Flame Graph |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/song/v/1 | | Flame Graph |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/uk-address/v/1 | | Flame Graph |
representative_read_entity_type
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| get_entity_type_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba | | Flame Graph |
representative_read_multiple_entities
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_property | traversal_paths=0 | 0 | |
| entity_by_property | traversal_paths=255 | 1,resolve_depths=inherit:1;values:255;properties:255;links:127;link_dests:126;type:true | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:0;link_dests:0;type:false | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:1;link_dests:0;type:true | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:2;links:1;link_dests:0;type:true | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:2;properties:2;links:1;link_dests:0;type:true | |
| link_by_source_by_property | traversal_paths=0 | 0 | |
| link_by_source_by_property | traversal_paths=255 | 1,resolve_depths=inherit:1;values:255;properties:255;links:127;link_dests:126;type:true | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:0;link_dests:0;type:false | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:1;link_dests:0;type:true | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:2;links:1;link_dests:0;type:true | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:2;properties:2;links:1;link_dests:0;type:true |
scenarios
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| full_test | query-limited | | Flame Graph |
| full_test | query-unlimited | | Flame Graph |
| linked_queries | query-limited | | Flame Graph |
| linked_queries | query-unlimited | | Flame Graph |
🌟 What is the purpose of this PR?
Type filters (`type.versionedUrl`/`type.baseUrl` without an inheritance depth) compiled to one `entity_is_of_type → ontology_temporal_metadata → ontology_ids` join chain per predicate. On the trace that started this investigation (7 type exclusions), that meant 7 multiplying join groups, a planner misestimate, and a row explosion that the query had to `DISTINCT` away again.

This PR instead compiles those filters to array predicates (`@>`/`&&`) on the `entity_edition_cache` type arrays (from #8854). Equality filters in the same `All`/`Any` group bundle into a single predicate over one parameter array, which the GIN index on the cache columns can serve.

🔗 Related links
🚫 Blocked by
🔍 What does this change?
- Adds `BinaryOperator::ArrayContains` (`@>`) to the expression AST; `&&` (overlap) already existed.
- In `query/entity.rs`, `EntityTypeEdge { IsOfType, BaseUrl | VersionedUrl, inheritance_depth: None }` resolves directly to the `entity_edition_cache` array columns. Explicit inheritance depths keep the join chain, because the cache arrays cover all inheritance depths at once.
- The `EntityQueryPath::TypeBaseUrls`/`TypeVersionedUrls` variants are deleted; the edge form is the single canonical path. They were documented as not queryable from the API, so the public surface does not change.
- In `query/compile.rs`, compilation is type-driven via the column's declared `ParameterType` rather than hardcoded paths:
  - `Equal`/`NotEqual`/`In(parameter, path)` on text-array columns compile to containment predicates.
  - `All`/`Any` groups bundle same-column equality filters: `All` + equals → `@>`, `Any` + equals → `&&`, `All` + notEquals → `NOT(&&)`, `Any` + notEquals → `NOT(@>)`. Each is an exact logical equivalent of the unbundled form.
  - `StartsWith`/`EndsWith`/`ContainsSegment` on array-backed paths fail at compile time (`UnsupportedTextArrayOperation`) instead of producing a Postgres type error.
- `Filter::for_entity_by_type_id` builds a single `versionedUrl` equality instead of `All[baseUrl, version]`; the old shape was only correct through shared join aliases. All `IsOfType`/`IsOfBaseType` policy filters now go through the array path.
- Policy filters of the form `In(param, type base URLs)` compile to `base_urls @> ARRAY[$n]` instead of `$n = ANY(base_urls)`: semantically equivalent, but GIN-indexable.

Pre-Merge Checklist 🚀
🚢 Has this modified a publishable library?
This PR:
📜 Does this require a change to the docs?
The changes in this PR:
🕸️ Does this require a change to the Turbo Graph?
The changes in this PR:
- `notEqual` on type paths now uses set semantics ("the entity has no such type") instead of the old per-join-row semantics, under which a multi-typed entity matched `notEqual(A)` as long as it had any other type. The old behaviour made type exclusions effectively useless for multi-typed entities: an entity with types `{notification, X}` stayed visible despite a `notEqual(notification)` filter. The new behaviour is what the filters are meant to express. The same set semantics applies to separate `baseUrl`/`version` equality filters within one `all` block.
- Linked-entity type paths (`leftEntity.type.versionedUrl` etc.) intentionally keep the join chain for the link traversal; only the terminal type lookup uses the cache.

🐾 Next steps
- … `notEqual` exclusion semantics on a multi-typed entity.
- … (… `All`/`Any`, sibling dedup) to widen bundling across artificial nesting and improve planner estimates (separate ticket).

🛡 What tests cover this?
- Snapshot tests in `statement/select.rs`: single equality (constructor and raw filter forms), `Any`/`All` bundling, the bundled `NOT(&&)` exclusion, the base-URL policy form, and the own-type vs. linked-type alias-separation regression test.
- Updated snapshots for the policy-filter rewrite (`= ANY` → `@>`).
- The `email_filter_protection.rs` integration tests cover the rewritten protection-filter path in CI.

❓ How to test this?
- Send a `queryEntities` request with type filters (`equal`/`notEqual` on `["type", "versionedUrl"]`, with and without an `any`/`all` group).
- Check that the compiled query uses `entity_edition_cache` array predicates instead of `entity_is_of_type` joins, and that results match the base branch, except for multi-typed entities under `notEqual`, which are now correctly excluded.
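The `notEqual` behaviour change can be modelled in a few lines. This sketch assumes the old join chain produced one row per (entity, type) pair and kept an entity if any row passed, while the new predicate tests membership in the cached type array; the function names and the `X` type are hypothetical.

```rust
/// Old per-join-row semantics (modelled): the join yields one row per
/// (entity, type) pair, and the entity survives if any row passes `type != excluded`.
fn old_not_equal(types: &[&str], excluded: &str) -> bool {
    types.iter().any(|t| *t != excluded)
}

/// New set semantics (modelled): the entity survives only if `excluded` is absent
/// from its cached type array, i.e. `NOT (versioned_urls @> ARRAY[$1])`.
fn new_not_equal(types: &[&str], excluded: &str) -> bool {
    !types.iter().any(|t| *t == excluded)
}

fn main() {
    // Hypothetical multi-typed entity with types {notification, X}.
    let types = ["notification", "X"];
    println!("old: {}", old_not_equal(&types, "notification")); // old: true
    println!("new: {}", new_not_equal(&types, "notification")); // new: false

    // For single-typed entities the two semantics agree.
    for single in [["notification"], ["X"]] {
        assert_eq!(old_not_equal(&single, "notification"), new_not_equal(&single, "notification"));
    }
}
```

The model leaves out the `DISTINCT` step, which only collapses duplicate rows and cannot change whether an entity is returned.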