perf(waterdata): compact CQL2 JSON to halve POST chunk count #2

Closed
thodson-usgs wants to merge 1 commit into httpx-migration from waterdata-compact-cql2

@thodson-usgs
Owner

Stacked on DOI-USGS#285 (base: httpx-migration).

What

monitoring-locations is the one service that sends its multi-value filter as a CQL2 POST body (it doesn't accept comma-separated multi-value GET params). That body was pretty-printed with json.dumps(..., indent=4), at ~39 bytes/value. The body counts against both the server's ~8 KB request-size limit and the chunk planner's byte budget (chunking._request_bytes = URL + body), so the indentation roughly doubled the per-value cost and, with it, the chunk count for large id lists.

This switches _cql2_param to the tightest separators, json.dumps(..., separators=(",", ":")), which brings the cost down to ~17 bytes/value.
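A minimal sketch of the size difference, for illustration only. The filter shape built by the hypothetical `build_filter` and the id format are assumptions; the body that `_cql2_param` actually produces may differ, and so may the exact per-value byte counts.

```python
import json


def build_filter(ids):
    # Hypothetical CQL2 "in" filter; the real _cql2_param output may differ.
    return {"op": "and", "args": [{"op": "in", "args": [{"property": "id"}, ids]}]}


# Made-up ids, only used to measure serialized size.
ids = [f"USGS-{n:08d}" for n in range(500)]
cql = build_filter(ids)

pretty = json.dumps(cql, indent=4)                 # old: one indented line per value
compact = json.dumps(cql, separators=(",", ":"))   # new: no whitespace between tokens

pretty_bytes = len(pretty.encode("utf-8"))
compact_bytes = len(compact.encode("utf-8"))

print(f"indent=4: {pretty_bytes / len(ids):.1f} bytes/value")
print(f"compact:  {compact_bytes / len(ids):.1f} bytes/value")
```

Both strings decode to the same object, so the server sees an identical filter; only the whitespace is gone.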

Impact

Chunk count (and API requests) for a monitoring-locations id-list query, at the production 8000-byte limit, via the real ChunkPlan planner:

  n_ids   indent=4 (old)   compact (new)   reduction
    200          1               1          none
    500          4               2          2× fewer
   1000          8               4          2× fewer
   2000         16               8          2× fewer
   5000         32              16          2× fewer
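The trend in the table can be reproduced with a toy model. This is not the real ChunkPlan planner: it assumes that a request over the byte limit is split in half recursively, that each id costs a flat ~39 or ~17 bytes, and that URL and JSON wrapper overhead are negligible. `count_chunks` is a hypothetical name.

```python
def count_chunks(n_ids: int, bytes_per_id: int, limit: int = 8000) -> int:
    """Count sub-requests when an oversized id list is halved recursively.

    Toy model only: flat per-id cost, no URL or wrapper overhead.
    """
    if n_ids <= 1 or n_ids * bytes_per_id <= limit:
        return 1  # fits in one request (or cannot be split further)
    half = n_ids // 2
    return (count_chunks(n_ids - half, bytes_per_id, limit)
            + count_chunks(half, bytes_per_id, limit))


for n in (200, 500, 1000, 2000, 5000):
    # Columns: n_ids, chunks at ~39 B/value (indent=4), chunks at ~17 B/value (compact)
    print(n, count_chunks(n, 39), count_chunks(n, 17))
```

Under these assumptions the model gives the same counts as the table, which is consistent with the per-value byte cost being what drives the chunk count.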

Live end-to-end: a 500-id query returns all 500 rows in 2 sub-requests (was 4), no 403.

Safety

Empirically pinned the server's request-size ceilings: GET URLs return 414 above ~8.2–8.4 KB, and POST bodies return 403 above ~8.2–8.4 KB. The chunker's existing 8000-byte limit keeps compact bodies (≤ ~7.75 KB) safely under the 403 cutoff, so no limit change is needed. A compactness assertion on the monitoring-locations POST test locks the format in.
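A compactness check of this kind could look roughly like the sketch below. The helper name `assert_compact_json` is hypothetical and is not the assertion added in this PR; it also assumes an ASCII-only body whose values are strings, as with site ids.

```python
import json


def assert_compact_json(body: str) -> None:
    """Fail if `body` is not already in the tightest JSON form.

    Re-serializes the parsed body with compact separators and compares;
    any extra whitespace (indentation, ", " or ": ") makes the two differ.
    """
    parsed = json.loads(body)
    tightest = json.dumps(parsed, separators=(",", ":"))
    assert body == tightest, "request body is not compact JSON"


# Passes: no whitespace between tokens.
assert_compact_json('{"op":"in","args":[{"property":"id"},["USGS-01"]]}')
```

Comparing against a re-serialized copy, rather than searching for spaces, avoids false failures if a value ever legitimately contains whitespace.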

Why not route daily/time-series through POST too?

Investigated and rejected: the server enforces the same ~8 KB cap on total request size whether the bytes are in the URL (414) or the body (403). A compact POST body fits ~450 sites; an 8000-byte GET URL fits ~450 sites — same capacity. So POST can't reduce the chunk count for the GET-based time-series endpoints; this compact win applies only to the pre-existing POST path. The real levers for those (the ~8 KB WAF rule and the 10 K-row page cap) are server-side.

🤖 Generated with Claude Code

monitoring-locations is the one service that POSTs a CQL2 body (it doesn't
support comma-separated multi-value GET). The body was pretty-printed via
json.dumps(indent=4), ~39 B/value, so it counted ~2x against both the
server's ~8 KB request-size cap and the chunk planner's byte budget. The
tightest separators (~17 B/value) roughly double how many ids fit per
sub-request, halving the chunk count and API requests for large id lists:

  n_ids   indent=4   compact
    500       4          2
   1000       8          4
   5000      32         16

Live check: a 500-id query returns all 500 rows in 2 sub-requests (was 4).
The WAF body limit (403) is empirically ~8.2-8.4 KB, so 8000-byte compact
bodies stay safely under it. Locked in with a compactness assertion on the
monitoring-locations POST test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@thodson-usgs
Owner Author

Superseded: reopened against DOI-USGS:main as a standalone PR (the change doesn't depend on DOI-USGS#285).
