Skip to content

Antalya 26.3: address OAuth security audit#1777

Open
zvonand wants to merge 57 commits into
antalya-26.3from
fix/antalya-26.3/oauth-address-audit
Open

Antalya 26.3: address OAuth security audit#1777
zvonand wants to merge 57 commits into
antalya-26.3from
fix/antalya-26.3/oauth-address-audit

Conversation

@zvonand
Copy link
Copy Markdown
Collaborator

@zvonand zvonand commented May 11, 2026

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Address OAuth security audit

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

zvonand and others added 18 commits May 11, 2026 14:41
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@zvonand zvonand added port-antalya PRs to be ported to all new Antalya releases antalya-26.3 labels May 11, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 11, 2026

Workflow [PR], commit [28d2595]

zvonand and others added 9 commits May 11, 2026 20:25
`test_arrowflight_interface` failed with `Token authentication is
not configured` after the M-19 split between Basic and Bearer
authentication paths.

pyarrow's `authenticate_basic_token` performs a `Handshake` with
`Basic <base64>`, captures the `authorization` response header, and
reuses it verbatim on every subsequent RPC. The middleware
unconditionally emitted `Bearer <token>` in `SendingHeaders`, so the
client's next call arrived as `Bearer <base64(user:password)>` and
was routed through the token-auth path -- which has no processors
configured in this test, hence the failure.

Echo back the same scheme the client sent: Basic stays on the Basic
path, Bearer keeps going through token validation.

CI report:
https://altinity-build-artifacts.s3.amazonaws.com/json.html?PR=1777&sha=f1a7b011b93262d89f3e02dbff3876c965259f38&name_0=PR&name_1=Integration%20tests%20%28amd_binary%2C%203%2F5%29
test_sql_create_jwt_user_with_processor_pin / _with_claims /
_no_pin_uses_auto_discovery asserted on `SHOW CREATE USER` output via the
default `TabSeparated` format, which escapes single quotes -- the test then
compared an unescaped SQL literal against an escaped one. Switch the three
`SHOW CREATE USER` calls to `FORMAT TSVRaw` so the SQL literal makes it
through verbatim.

test_keycloak_auth/* all failed with `Token authentication is disabled`
because the `openid` processor's `<configuration_endpoint>` is HTTP, and
H-26 now refuses non-HTTPS URLs returned in the OIDC discovery document
(defense in depth against poisoned discovery substituting cloud-metadata
or in-cluster targets). The test Keycloak speaks plain HTTP, so wire
`keycloak_discovery` in manual trust-chain mode -- exactly the escape
hatch the H-26 error message documents -- and update the
`test_openid_discovery` docstring accordingly.

CI:
- https://altinity-build-artifacts.s3.amazonaws.com/json.html?PR=1777&sha=bb1b56b0ad9ec85064ffd52990e8cf0fa403a858&name_0=PR&name_1=Integration%20tests%20%28amd_asan%2C%20db%20disk%2C%20old%20analyzer%2C%203%2F6%29
- https://altinity-build-artifacts.s3.amazonaws.com/json.html?PR=1777&sha=bb1b56b0ad9ec85064ffd52990e8cf0fa403a858&name_0=PR&name_1=Integration%20tests%20%28amd_asan%2C%20db%20disk%2C%20old%20analyzer%2C%202%2F6%29
H-26 unconditionally rejects non-HTTPS URLs returned by an OIDC discovery
document. The check is defense in depth (the primary defense is
<remote_url_allow_hosts>), but it has no escape hatch for operators who
knowingly run their IdP over plain HTTP -- they were forced to abandon
`configuration_endpoint` and hand-wire the trust chain.

Add a per-processor `<allow_http_discovery_urls>` flag, false by default.
When true, the HTTPS scheme check on userinfo_endpoint / introspection_endpoint
/ jwks_uri is suppressed (the <remote_url_allow_hosts> check still runs),
and a warning is logged at processor construction so the gap is visible
in operator logs.

Revert the keycloak integration test back to using `configuration_endpoint`
with the new opt-out, since the test Keycloak speaks HTTP.
L-02 added a cooldown branch in `JWKSClient::getJWKS`: when
`now - last_request_send < refresh_timeout` and no JWKS is cached, throw
"in cooldown after a recent failed fetch" so concurrent waiters do not
re-hammer a failing endpoint. The L-02 change updated `last_request_send`
unconditionally before issuing the HTTP call, so a thrown fetch still
arms the cooldown for subsequent callers.

But `last_request_send` is default-initialized to `time_point{}`, i.e.
the `steady_clock` epoch. On Linux the epoch is roughly system boot time
(and containerized hosts with isolated `CLOCK_MONOTONIC` can have a very
recent epoch), so `now - time_point{}` equals the steady-clock uptime.
When the host uptime is less than `jwks_cache_lifetime` (default 3600s)
the cooldown branch triggers on the very first call -- before any real
fetch has been attempted -- and serves "in cooldown after a recent
failed fetch" on freshly-started CI runners. That is what the
`test_keycloak_auth` flake on the targeted shard shows:

  https://altinity-build-artifacts.s3.amazonaws.com/json.html?PR=1777&sha=ec127073b3a4fdb7d1bf3f43978ecbce7173a549&name_0=PR&name_1=Integration%20tests%20%28amd_asan%2C%20targeted%29

Change `last_request_send` to `std::optional<time_point>`; treat
`nullopt` as "no attempt has been made yet" and skip the cooldown check
in that case. Also drop the leftover `high_resolution_clock` in the
write-locked branch (libc++ aliases it to `steady_clock` so it built,
but mixing clock types reads as accidental).
`test_sql_create_jwt_user_with_processor_pin` and
`test_sql_create_jwt_user_with_claims` asserted that the rejected
response did NOT contain the username. ClickHouse's auth-failure body
embeds the username (`Code: 516. DB::Exception: <user>: Authentication
failed: password is incorrect, or there is no user with such name.
(AUTHENTICATION_FAILED)`), so the assertion tripped on the exact case it
was meant to verify. Switch to a positive check on the
`AUTHENTICATION_FAILED` marker.

CI:
https://altinity-build-artifacts.s3.amazonaws.com/json.html?PR=1777&sha=495a8cf5f3052eb33a760169cb81bb2b5770d25c&name_0=PR&name_1=Integration%20tests%20%28amd_asan%2C%20targeted%29
pyarrow's flight client aborts (`Fatal Python error: Aborted`) on
`DoPut` / `GetFlightInfo` when the server response carries an
`Authorization` header. Only the Handshake response needs it (that's
where `authenticate_basic_token` reads the token from); echoing it on
every response was a leftover from the original code that pyarrow
tolerated only by accident.
M-19 split the ArrowFlight middleware's Basic and Bearer paths so real
JWTs would route to token validation. That broke pyarrow's flow because
the response-side middleware kept echoing `Bearer <base64-creds>` on
every response, and after M-19 that echo routed itself through the
token-validation path on the next call. 942572b patched the echo
prefix; ff7f55c scoped the echo to Handshake only. Net result: two
follow-up commits to chase one feature change that ArrowFlight does not
need in this branch (it has never supported real JWTs and no one is
asking for it).

Restore `ArrowFlightHandler.cpp` to `origin/antalya-26.3` and let
ArrowFlight keep its pre-existing 'Bearer is an alias for Basic'
behavior. JWT/Bearer support for ArrowFlight is out of scope for the
OAuth audit branch.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

antalya-26.3 port-antalya PRs to be ported to all new Antalya releases

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant