
perf(waterdata): compact CQL2 JSON to halve POST chunk count#292

Draft
thodson-usgs wants to merge 1 commit into DOI-USGS:main from thodson-usgs:compact-cql2-json

@thodson-usgs
Collaborator

What

monitoring-locations is the one service that sends its multi-value filter as a CQL2 POST body (it doesn't accept comma-separated multi-value GET params). That body was pretty-printed with json.dumps(..., indent=4) (~39 bytes/value). The body counts against both the server's ~8 KB request-size limit and the chunk planner's byte budget (chunking._request_bytes = URL + body), so the indentation more than doubled the per-value cost and therefore roughly doubled the chunk count for large id lists.

This switches _cql2_param to the tightest separators, json.dumps(..., separators=(",", ":")) (~17 bytes/value).
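A minimal sketch of the size difference, assuming a hypothetical `build_in_filter` helper that assembles the same `and`/`in` filter shown under Verification (the real `_cql2_param` may build it differently). Each extra id costs a separator plus the quoted id in compact form, but also a newline and a 20-space indentation run in the pretty-printed form, because the id list sits five levels deep.

```python
import json

def build_in_filter(ids, compact=True):
    """Hypothetical stand-in for _cql2_param: a CQL2 'in' filter wrapped in 'and'."""
    cql2 = {
        "op": "and",
        "args": [
            {
                "op": "in",
                "args": [{"property": "monitoring_location_id"}, list(ids)],
            }
        ],
    }
    if compact:
        # New behavior: no spaces after ',' or ':' and no newlines.
        return json.dumps(cql2, separators=(",", ":"))
    # Old behavior: newlines plus 4-space indentation per nesting level.
    return json.dumps(cql2, indent=4)

def bytes_per_extra_id(compact, site_id="USGS-05407000"):
    """Marginal body size of one more id, measured by growing a 2-id body to 3 ids."""
    two = build_in_filter([site_id] * 2, compact)
    three = build_in_filter([site_id] * 3, compact)
    return len(three.encode("utf-8")) - len(two.encode("utf-8"))

print(build_in_filter(["USGS-05407000", "USGS-05428500"]))
print(bytes_per_extra_id(compact=False), bytes_per_extra_id(compact=True))  # 37 16
```

With 13-character ids like `USGS-05407000` the marginal costs come out to 37 and 16 bytes, in the same range as the ~39 and ~17 bytes/value quoted above.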

Impact

Sub-requests for a monitoring-locations id-list query at the production 8000-byte limit, via the real ChunkPlan planner:

  n_ids   indent=4 (old)   compact (new)   reduction
    200          1                1
    500          4                2         2× fewer
   1000          8                4         2× fewer
   2000         16                8         2× fewer
   5000         32               16         2× fewer
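To see why the counts track the per-value cost, here is a toy planner. It is not `ChunkPlan`: it assumes the planner recursively halves the id list until each sub-request's URL + body fits under 8000 bytes (every count in the table is a power of two, which is consistent with halving but does not prove it). The endpoint URL is a placeholder, and the ids are synthetic 13-character strings.

```python
import json

LIMIT_BYTES = 8000  # production request-size budget quoted above
# Placeholder URL (assumption); only its length matters to the budget.
URL = "https://example.invalid/collections/monitoring-locations/items"

def cql2_body(ids, compact):
    cql2 = {"op": "and", "args": [{"op": "in", "args": [
        {"property": "monitoring_location_id"}, list(ids)]}]}
    if compact:
        return json.dumps(cql2, separators=(",", ":"))
    return json.dumps(cql2, indent=4)

def request_bytes(url, body):
    # Mirrors the PR's description of chunking._request_bytes: URL + body.
    return len(url.encode("utf-8")) + len(body.encode("utf-8"))

def plan(ids, compact, url=URL, limit=LIMIT_BYTES):
    """Toy bisecting planner: split in half until every chunk fits the budget."""
    if request_bytes(url, cql2_body(ids, compact)) <= limit:
        return [list(ids)]
    if len(ids) <= 1:
        raise ValueError("a single id exceeds the request-size limit")
    mid = len(ids) // 2
    return plan(ids[:mid], compact, url, limit) + plan(ids[mid:], compact, url, limit)

ids = [f"USGS-{i:08d}" for i in range(1000)]
print(len(plan(ids, compact=False)), len(plan(ids, compact=True)))  # 8 4
```

Under these assumptions the toy reproduces every row of the table; with a different URL length or id length the boundaries would shift, but the roughly 2× ratio comes from the per-value cost, not the planner details.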

Verification

  • Compact body (no whitespace): {"op":"and","args":[{"op":"in","args":[{"property":"monitoring_location_id"},["USGS-05407000","USGS-05428500"]]}]}
  • Live: a 2-id query returns the 2 correct sites; a 500-id query fans into 2 compact chunks and returns all 500 rows.
  • Offline suite green; added a compactness assertion to the monitoring-locations POST test.
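One way a compactness assertion could be written (a sketch; the PR's actual test may differ, and `captured_body` is a hypothetical stand-in for whatever the mocked POST received): re-serialize the parsed body with the tightest separators and require a byte-for-byte match.

```python
import json

def assert_compact_json(body: str) -> None:
    """Fail if body contains whitespace that compact serialization would drop."""
    recompact = json.dumps(json.loads(body), separators=(",", ":"))
    assert body == recompact, f"body is not compact: {body[:80]!r}"

# Hypothetical body captured from a mocked monitoring-locations POST.
captured_body = (
    '{"op":"and","args":[{"op":"in","args":'
    '[{"property":"monitoring_location_id"},["USGS-05407000","USGS-05428500"]]}]}'
)
assert_compact_json(captured_body)
```

Comparing against a re-serialization, rather than asserting that the body contains no spaces, avoids false failures when a string value legitimately contains a space.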

Safety

Empirically pinned the server's request-size ceilings: GET URLs return 414 above ~8.2–8.4 KB, and POST bodies return 403 above ~8.2–8.4 KB. The chunker's existing 8000-byte limit keeps compact bodies (≤ ~7.75 KB) safely under the 403 cutoff, so no limit change is needed.
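A ceiling like this can be pinned with a binary search over request size. The sketch below assumes a hypothetical `send(size)` callable that issues a request of the given total size and returns its HTTP status, and assumes rejection is monotonic (once a size is rejected, every larger size is too). The fake server and its 8300-byte cutoff are made up for illustration.

```python
from typing import Callable

def find_ceiling(send: Callable[[int], int], lo: int, hi: int) -> int:
    """Return the largest size in [lo, hi] for which send(size) == 200.

    Assumes send(lo) succeeds and that rejections (e.g. 403 or 414) are
    monotonic in size.
    """
    if send(lo) != 200:
        raise ValueError("lower bound is already rejected")
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if send(mid) == 200:
            lo = mid  # mid is accepted; the ceiling is at least mid
        else:
            hi = mid - 1  # mid is rejected; the ceiling is below mid
    return lo

# Made-up stand-in for the server: rejects POST bodies above 8300 bytes.
def fake_post(size: int) -> int:
    return 200 if size <= 8300 else 403

ceiling = find_ceiling(fake_post, lo=1000, hi=16000)
print(ceiling, ceiling - 8000)  # 8300 300
```

The search needs only about log2(hi - lo) probes, which matters when each probe hits a live service.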

Note: why not route GET endpoints (daily, …) through POST?

Investigated and rejected. The server enforces the same ~8 KB cap on total request size whether the bytes are in the URL (414) or the body (403). A compact POST body fits ~450 sites, and an 8000-byte GET URL also fits ~450 sites, so the capacity is the same. POST therefore can't reduce the chunk count for the GET-based time-series endpoints; this compact win applies only to the pre-existing POST path. The real levers there (the ~8 KB edge limit and the 10,000-row page cap) are server-side.
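The "same capacity" point comes down to per-id marginal cost. A sketch, assuming the GET client joins ids with commas in a single query parameter (the name `monitoring_location_id` is an assumption here) and percent-encodes them the way `urllib.parse.urlencode` does, turning each `,` into `%2C`: an extra id then costs 3 + 13 bytes in the URL and 1 + 15 bytes in a compact body.

```python
import json
from urllib.parse import urlencode

SITE = "USGS-05407000"  # 13-character example id from the PR

def get_query_bytes(n: int) -> int:
    # Hypothetical GET: comma-separated ids in one query parameter.
    # urlencode percent-encodes each ',' as '%2C'.
    query = urlencode({"monitoring_location_id": ",".join([SITE] * n)})
    return len(query)

def post_body_bytes(n: int) -> int:
    # Compact CQL2 body, same shape as the Verification example.
    cql2 = {"op": "and", "args": [{"op": "in", "args": [
        {"property": "monitoring_location_id"}, [SITE] * n]}]}
    return len(json.dumps(cql2, separators=(",", ":")))

print(get_query_bytes(11) - get_query_bytes(10))  # 16
print(post_body_bytes(11) - post_body_bytes(10))  # 16
```

With equal per-id cost, only the fixed parts (base URL, other query parameters, JSON scaffolding) differ, so moving GET endpoints to POST would not meaningfully raise the number of ids per request.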

🤖 Generated with Claude Code

monitoring-locations is the one service that POSTs a CQL2 body (it doesn't
support comma-separated multi-value GET). The body was pretty-printed via
json.dumps(indent=4), ~39 B/value, so it counted ~2x against both the
server's ~8 KB request-size cap and the chunk planner's byte budget. The
tightest separators (~17 B/value) roughly double how many ids fit per
sub-request, halving the chunk count and API requests for large id lists:

  n_ids   indent=4   compact
    500       4          2
   1000       8          4
   5000      32         16

Live check: a 500-id query returns all 500 rows in 2 sub-requests (was 4).
The WAF body limit (403) is empirically ~8.2-8.4 KB, so 8000-byte compact
bodies stay safely under it. Locked in with a compactness assertion on the
monitoring-locations POST test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>