Customize Headers in network search remote requests#1066
Conversation
Co-authored-by: Ciaran Schutte <ciaranschutte@oicr.on.ca> Co-authored-by: Anders Richardsson <2107110+justincorrigible@users.noreply.github.com>
* Use SearchQueryResponse * createElasticClient * Types file updated * createOpenSearch updates * Use overture-stack/types import * SharedIndicesCreateParams type change * Remove yarn.lock * Update parameter types * Update Response Types * Minor edits * Search Params & Body updates, removed instance of Prettify, Update Import names * Revise Options & Config types * Minimal Search Options type * Consistent Naming
* Rename network config as nodes, remove config property to enable network search * search-server remove env flag for network search * Types clean up and improved nodes data reporting in network query - nodes hits now return if no aggregations are requested - nodes that failed to connect on startup are included in the nodes data query - overhauled the Result type to allow different data to be returned for each result case - created a seaprate config type for local vs remote nodes, not yet used * WIP: Enhanced types for graphql-router and network search on local node - Next Step is to define local node configs and build the local query from those - Converted all graphql-router/schema files to typescript - Added Context as a generic type for many of the Resolver types and related functions. This is used to pass information to the resolvers and custom resovler functions. At the moment this is only the server side filter function, but passing custom context to this is important. * Working network search within a catalog * Move shared types into dedicated files * Update config schema to match new config format * validateConfigs will list all missing properties from the config
| The `SearchClient` abstraction already exists as the right boundary — the migration should align the types and default configuration to OpenSearch while preserving compatibility for ES users. | ||
|
|
||
| **What's already done:** The integration test suite (`integration-tests/server`) already supports both engines via a `SEARCH_ENGINE` env var (defaults to `'elasticsearch'`). `buildSearchClient` accepts a `client` type parameter mapped to `SupportedSearchClients`. The architecture is ready; the missing pieces are the OpenSearch client dependency and a running OpenSearch instance in CI. | ||
| **What's already done:** The integration test suite (`integration-tests/server`) already supports both engines via a `SEARCH_ENGINE` env var (defaults to `'elasticsearch'`). `buildSearchClient` accepts a `client` type parameter mapped to `SupportedClientTypes`. The architecture is ready; the missing pieces are the OpenSearch client dependency and a running OpenSearch instance in CI. |
There was a problem hiding this comment.
There is no type named SupportedSearchClients, I believe this is leftover before a name change. I have replaced all instances if this with SupportedClientTypes and this SHOULD work as intended. If needed, suggest the desired name and we can make this consistent before merging (or resolve in a separate PR).
There was a problem hiding this comment.
I strongly prefer the more descriptive name I had applied (this needs to be read as a configuration option, not a code compliance thing), but we can redo that change once this is merged (along with a couple other name changes decided elsewhere), so your PR's scope doesn't grow. 👍
| * - Ensures every entry in `table.defaultSorting` has `desc` explicitly set, | ||
| * defaulting to `false` when the field is absent. | ||
| */ | ||
| const normalize = (fileDataJSON: any) => { |
There was a problem hiding this comment.
I don't think this is the best function name, but I'm trying to refactor only lightly as I add features.
| export type ExternalNetworkConfig = Partial<{ | ||
| [serverNetworkConfigExtendedProperties.PASSTHROUGH_HEADERS]: string[]; | ||
| }> & | ||
| NetworkConfig<ArrangerBaseContext>; |
There was a problem hiding this comment.
Config file types and graphql-router types are diverging, which is to be expected. It would be worthwhile doing a dedicated task on separating the types for these two contexts, and to provide validation schemas (zod) for the config files.
|
|
||
| // Network Search Nodes | ||
| "network": { | ||
| "passthroughHeaders": ["Authorization"], |
There was a problem hiding this comment.
I see what you're doing here, and could use some documentation (even if only as a comment in the line above)
There was a problem hiding this comment.
Yeup, I owe better (read: some) documentation for this feature
| @@ -1,3 +1,13 @@ | |||
| { | |||
| "network": [] | |||
| "network": { | |||
| /** | ||
| * Normalizes raw config file JSON for use by the search server. | ||
| * This will assume the network config provided has the expected format, it is not fully validated. | ||
| * TODO: Fully validate config file structure |
There was a problem hiding this comment.
nice! added an item to this effect in the roadmap, to do it both her and in the core module, with proper error feedback, and to add the corresponding tests
| "test": "NODE_OPTIONS=\"$NODE_OPTIONS --experimental-vm-modules\" jest --config=jest.config.ts" | ||
| }, | ||
| "type": "module" | ||
| "test": "jest --config=jest.config.ts" |
There was a problem hiding this comment.
did you test this? I think it breaks compatibility on some imports, hence why I added it. seems beyond the scope of this PR tbh
There was a problem hiding this comment.
Another merge conflict issue, will restore the script from main.
| [configOptionalProperties.GRAPHQL_MAX_ALIASES]: number; | ||
| [configOptionalProperties.GRAPHQL_MAX_DEPTH]: number; |
There was a problem hiding this comment.
please readd these. looks like merge conflict mistake
There was a problem hiding this comment.
may I ask to confirm the related logic wasn't also removed?
| const headers = new Headers(); | ||
| for (const [key, value] of Object.entries(req.headers)) { | ||
| if (value !== undefined) { | ||
| const valueAsString = Array.isArray(value) ? value.join(', ') : value; | ||
| headers.set(key, valueAsString); | ||
| } | ||
| } | ||
| const request: RequestContextProps = { headers }; | ||
|
|
||
| // Add to context based on external parameters |
There was a problem hiding this comment.
while I of course may be misreading the passthrough logic overall, it looks like this applies the same headers to every remote node in the network. i.e. there's no per-node control.
for instance If remote nodes have different trust levels, e.g. one is internal, another is a third-party node, forwarding Authorization to all of them may leak credentials.
... either the docs/config schema should explicitly warn that this affects all nodes equally, or better yet, this should be an opt-in kind of deal so we don't go against OWASP A01: Broken Access Control recommendations.
| headers: { | ||
| 'Content-Type': 'application/json', | ||
| 'X-REQUEST-TYPE': 'GraphQL', | ||
| ...(customRequestProps?.headers || {}), |
There was a problem hiding this comment.
hm yeah, here it is... the passthrough thing is non-discriminate and could leak credentials. thinking the configuration should state explicitly which nodes to send which headers to, somehow (granted, that sounds complex)
| const isFieldAggregationSupported = (fieldObject: AggregationField): fieldObject is SupportedAggregationField => { | ||
| export const isFieldAggregationSupported = ( | ||
| fieldObject: AggregationField, | ||
| ): fieldObject is SupportedAggregationField => { |
There was a problem hiding this comment.
i really need to wrap my head around this "is" thing. I forget it exists and can be so useful
There was a problem hiding this comment.
I actually strongly discourage using type predicates now. TS will infer the predicate logic if the function returns boolean and narrows the type of the input variable. If TS can't calculate that the type is actually narrowed correctly then manually declaring the type predicate is about as risky as using any type assertion.
| if (response.success) { | ||
| const documentName = config.documentType; | ||
| const responseData = response.data[documentName]; | ||
| // TODO: Response content is not validated, we expect the return structure based on the GraphQL query we requested |
There was a problem hiding this comment.
this sounds easier than it may actually be, from a model-agnostic perspective, considering an aggregator may not be fully aware of the nodes' model and harmonization (or at least some mapping) may be necessary
| [facetsProperties.AGGS]: [], | ||
| }, | ||
| [configRootProperties.MATCHBOX]: [], | ||
| [configRootProperties.NETWORK_AGGREGATION]: { [configArrangerNetworkProperties.REMOTE_NODES]: [] }, |
There was a problem hiding this comment.
surprised at first, but i guess it makes sense in the decoupling of "core" from "aggregator" types 👍
There was a problem hiding this comment.
This is because I updated the config type to make the remoteNodes property optional so there is no need to set an empty list by default.
justincorrigible
left a comment
There was a problem hiding this comment.
Great work overall (as usual)! 👏
two items of "concern" I'd like to surface here:
- the OWASP a01 thing I pointed out, passing headers to all nodes.
- the possible regression on GraphQL security functionality (thanks for putting up with the Git shenanigans)
suggestion from @justincorrigible Co-authored-by: Anders Richardsson <2107110+justincorrigible@users.noreply.github.com>
suggestion from @justincorrigible Co-authored-by: Anders Richardsson <2107110+justincorrigible@users.noreply.github.com>
Summary
Introduces a mechanism to customize the network requests made to remote nodes during network search. With this initial setup, we have added the ability to add custom headers to outgoing requests to each remote node sent during network aggregation searches.
When the arranger
graphql-routeris created, an optional function can be provided as part of the network config. This CustomizeRemoteRequestFn will receive the Arranger context (including incoming request headers), as well as the config for a remote node. This function can then determine whether to inject headers into the outgoing request.The
search-serverhas used this to enable a new network configuration:passthroughHeaders. If the network configuration file includes apassthroughHeadersproperty (string[]), then headers of the given name will be taken from the incoming search query and added to the outgoing request to every remote node.This functionality is requested to enable transporting Authorization credentials from the requesting user through to the other nodes in the network which have implemented a custom authorization scheme in front of the arranger servers.
Description of Changes
modules/graphql-router
customizeRemoteRequestoptional config property.apps/search-server
networkconfig now accepts apassthroughHeadersproperty with an array of string valuespassthroughHeadersvalue will be copied from the incoming request into each of the outgoing requests to a remote nodeReadiness Checklist
.env.schemafile and documented in the README