Skip to content

Customize Headers in network search remote requests#1066

Open
joneubank wants to merge 28 commits into
mainfrom
feat/custom-remote-node-requests
Open

Customize Headers in network search remote requests#1066
joneubank wants to merge 28 commits into
mainfrom
feat/custom-remote-node-requests

Conversation

@joneubank
Copy link
Copy Markdown
Contributor

Summary

Introduces a mechanism to customize the network requests made to remote nodes during network search. With this initial setup, we have added the ability to add custom headers to outgoing requests to each remote node sent during network aggregation searches.

When the arranger graphql-router is created, an optional function can be provided as part of the network config. This CustomizeRemoteRequestFn will receive the Arranger context (including incoming request headers), as well as the config for a remote node. This function can then determine whether to inject headers into the outgoing request.

The search-server has used this to enable a new network configuration: passthroughHeaders. If the network configuration file includes a passthroughHeaders property (string[]), then headers of the given name will be taken from the incoming search query and added to the outgoing request to every remote node.

This functionality is requested to enable transporting Authorization credentials from the requesting user through to the other nodes in the network which have implemented a custom authorization scheme in front of the arranger servers.

Description of Changes

modules/graphql-router

  • Adds to the ArrangerBaseContext information about the incoming request
    • headers are made available with this change. Future additions could expand this.
  • Added to graphqlRouter the customizeRemoteRequest optional config property.
    • This takes a function that can add properties to each outgoing request to a remote node in a network query.
    • The output of this function can supply headers to add to the outgoing request
    • This function is called once for each remote node, so that custom logic can be provided for modifying each node's request.

apps/search-server

  • network config now accepts a passthroughHeaders property with an array of string values
    • headers with a name matching a passthroughHeaders value will be copied from the incoming request into each of the outgoing requests to a remote node
    • Updated the example config files to demonstrate the usage of this new property

Readiness Checklist

  • Self Review
    • I have performed a self review of code
    • I have run the application locally and manually tested the feature
    • I have checked all updates to correct typos and misspellings
  • Formatting
    • Code follows the project style guide
    • Autmated code formatters (ie. Prettier) have been run
  • Local Testing
    • Successfully built all packages locally
    • Successfully ran all test suites, all unit and integration tests pass
  • Updated Tests
    • Unit and integration tests have been added that describe the bug that was fixed or the features that were added
  • Documentation
    • All new environment variables added to .env.schema file and documented in the README
    • All changes to server HTTP endpoints have open-api documentation
    • All new functions exported from their module have TSDoc comment documentation

justincorrigible and others added 21 commits March 5, 2026 15:04
Co-authored-by: Ciaran Schutte <ciaranschutte@oicr.on.ca>
Co-authored-by: Anders Richardsson <2107110+justincorrigible@users.noreply.github.com>
* Use SearchQueryResponse

* createElasticClient

* Types file updated

* createOpenSearch updates

* Use overture-stack/types import

* SharedIndicesCreateParams type change

* Remove yarn.lock

* Update parameter types

* Update Response Types

* Minor edits

* Search Params & Body updates, removed instance of Prettify, Update Import names

* Revise Options & Config types

* Minimal Search Options type

* Consistent Naming
* Rename network config as nodes, remove config property to enable network search

* search-server remove env flag for network search

* Types clean up and improved nodes data reporting in network query
- nodes hits now return if no aggregations are requested
- nodes that failed to connect on startup are included in the nodes data query
- overhauled the Result type to allow different data to be returned for each result case
- created a seaprate config type for local vs remote nodes, not yet used

* WIP: Enhanced types for graphql-router and network search on local node
- Next Step is to define local node configs and build the local query from those
- Converted all graphql-router/schema files to typescript
- Added Context as a generic type for many of the Resolver types and related functions. This is used to pass information to the resolvers and custom resovler functions. At the moment this is only the server side filter function, but passing custom context to this is important.

* Working network search within a catalog
* Move shared types into dedicated files
* Update config schema to match new config format
* validateConfigs will list all missing properties from the config
Comment thread .dev/roadmap.md
The `SearchClient` abstraction already exists as the right boundary — the migration should align the types and default configuration to OpenSearch while preserving compatibility for ES users.

**What's already done:** The integration test suite (`integration-tests/server`) already supports both engines via a `SEARCH_ENGINE` env var (defaults to `'elasticsearch'`). `buildSearchClient` accepts a `client` type parameter mapped to `SupportedSearchClients`. The architecture is ready; the missing pieces are the OpenSearch client dependency and a running OpenSearch instance in CI.
**What's already done:** The integration test suite (`integration-tests/server`) already supports both engines via a `SEARCH_ENGINE` env var (defaults to `'elasticsearch'`). `buildSearchClient` accepts a `client` type parameter mapped to `SupportedClientTypes`. The architecture is ready; the missing pieces are the OpenSearch client dependency and a running OpenSearch instance in CI.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no type named SupportedSearchClients, I believe this is leftover before a name change. I have replaced all instances if this with SupportedClientTypes and this SHOULD work as intended. If needed, suggest the desired name and we can make this consistent before merging (or resolve in a separate PR).

Copy link
Copy Markdown
Member

@justincorrigible justincorrigible May 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly prefer the more descriptive name I had applied (this needs to be read as a configuration option, not a code compliance thing), but we can redo that change once this is merged (along with a couple other name changes decided elsewhere), so your PR's scope doesn't grow. 👍

* - Ensures every entry in `table.defaultSorting` has `desc` explicitly set,
* defaulting to `false` when the field is absent.
*/
const normalize = (fileDataJSON: any) => {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the best function name, but I'm trying to refactor only lightly as I add features.

Comment on lines +26 to +29
export type ExternalNetworkConfig = Partial<{
[serverNetworkConfigExtendedProperties.PASSTHROUGH_HEADERS]: string[];
}> &
NetworkConfig<ArrangerBaseContext>;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Config file types and graphql-router types are diverging, which is to be expected. It would be worthwhile doing a dedicated task on separating the types for these two contexts, and to provide validation schemas (zod) for the config files.


// Network Search Nodes
"network": {
"passthroughHeaders": ["Authorization"],
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you're doing here, and could use some documentation (even if only as a comment in the line above)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeup, I owe better (read: some) documentation for this feature

@@ -1,3 +1,13 @@
{
"network": []
"network": {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

/**
* Normalizes raw config file JSON for use by the search server.
* This will assume the network config provided has the expected format, it is not fully validated.
* TODO: Fully validate config file structure
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! added an item to this effect in the roadmap, to do it both her and in the core module, with proper error feedback, and to add the corresponding tests

Comment thread apps/search-server/src/configs/fromFiles/normalize.ts Outdated
Comment thread apps/search-server/src/configs/fromFiles/normalize.ts
Comment thread apps/search-server/configTemplates/network.json Outdated
Comment thread integration-tests/import/package.json Outdated
"test": "NODE_OPTIONS=\"$NODE_OPTIONS --experimental-vm-modules\" jest --config=jest.config.ts"
},
"type": "module"
"test": "jest --config=jest.config.ts"
Copy link
Copy Markdown
Member

@justincorrigible justincorrigible May 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you test this? I think it breaks compatibility on some imports, hence why I added it. seems beyond the scope of this PR tbh

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another merge conflict issue, will restore the script from main.

Comment on lines -150 to -151
[configOptionalProperties.GRAPHQL_MAX_ALIASES]: number;
[configOptionalProperties.GRAPHQL_MAX_DEPTH]: number;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please readd these. looks like merge conflict mistake

Copy link
Copy Markdown
Member

@justincorrigible justincorrigible May 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may I ask to confirm the related logic wasn't also removed?

Comment on lines +251 to +260
const headers = new Headers();
for (const [key, value] of Object.entries(req.headers)) {
if (value !== undefined) {
const valueAsString = Array.isArray(value) ? value.join(', ') : value;
headers.set(key, valueAsString);
}
}
const request: RequestContextProps = { headers };

// Add to context based on external parameters
Copy link
Copy Markdown
Member

@justincorrigible justincorrigible May 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while I of course may be misreading the passthrough logic overall, it looks like this applies the same headers to every remote node in the network. i.e. there's no per-node control.

for instance If remote nodes have different trust levels, e.g. one is internal, another is a third-party node, forwarding Authorization to all of them may leak credentials.

... either the docs/config schema should explicitly warn that this affects all nodes equally, or better yet, this should be an opt-in kind of deal so we don't go against OWASP A01: Broken Access Control recommendations.

headers: {
'Content-Type': 'application/json',
'X-REQUEST-TYPE': 'GraphQL',
...(customRequestProps?.headers || {}),
Copy link
Copy Markdown
Member

@justincorrigible justincorrigible May 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm yeah, here it is... the passthrough thing is non-discriminate and could leak credentials. thinking the configuration should state explicitly which nodes to send which headers to, somehow (granted, that sounds complex)

const isFieldAggregationSupported = (fieldObject: AggregationField): fieldObject is SupportedAggregationField => {
export const isFieldAggregationSupported = (
fieldObject: AggregationField,
): fieldObject is SupportedAggregationField => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i really need to wrap my head around this "is" thing. I forget it exists and can be so useful

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually strongly discourage using type predicates now. TS will infer the predicate logic if the function returns boolean and narrows the type of the input variable. If TS can't calculate that the type is actually narrowed correctly then manually declaring the type predicate is about as risky as using any type assertion.

if (response.success) {
const documentName = config.documentType;
const responseData = response.data[documentName];
// TODO: Response content is not validated, we expect the return structure based on the GraphQL query we requested
Copy link
Copy Markdown
Member

@justincorrigible justincorrigible May 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this sounds easier than it may actually be, from a model-agnostic perspective, considering an aggregator may not be fully aware of the nodes' model and harmonization (or at least some mapping) may be necessary

[facetsProperties.AGGS]: [],
},
[configRootProperties.MATCHBOX]: [],
[configRootProperties.NETWORK_AGGREGATION]: { [configArrangerNetworkProperties.REMOTE_NODES]: [] },
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

surprised at first, but i guess it makes sense in the decoupling of "core" from "aggregator" types 👍

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because I updated the config type to make the remoteNodes property optional so there is no need to set an empty list by default.

Copy link
Copy Markdown
Member

@justincorrigible justincorrigible left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work overall (as usual)! 👏
two items of "concern" I'd like to surface here:

  • the OWASP a01 thing I pointed out, passing headers to all nodes.
  • the possible regression on GraphQL security functionality (thanks for putting up with the Git shenanigans)

joneubank and others added 3 commits May 21, 2026 14:46
suggestion from @justincorrigible

Co-authored-by: Anders Richardsson <2107110+justincorrigible@users.noreply.github.com>
suggestion from @justincorrigible

Co-authored-by: Anders Richardsson <2107110+justincorrigible@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants