perf(waterdata): compact CQL2 JSON to halve POST chunk count#292
Draft
thodson-usgs wants to merge 1 commit into
monitoring-locations is the one service that POSTs a CQL2 body (it doesn't
support comma-separated multi-value GET). The body was pretty-printed via
json.dumps(indent=4), ~39 B/value, so it counted ~2x against both the
server's ~8 KB request-size cap and the chunk planner's byte budget. The
tightest separators (~17 B/value) roughly double how many ids fit per
sub-request, halving the chunk count and API requests for large id lists:
n_ids   indent=4   compact
  500          4         2
 1000          8         4
 5000         32        16
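Counts of this shape are what you get from a planner that keeps halving the id list until each sub-request fits the byte budget. The sketch below is illustrative only and is not the library's planner: `build_body`, `plan_chunks`, `chunk_counts`, the synthetic 13-character ids, and the halving strategy itself are all assumptions, and it counts body bytes only (the real budget also includes the URL).

```python
import json

BYTE_LIMIT = 8000  # the chunker's byte budget quoted in this PR


def build_body(ids, compact):
    """Serialize a CQL2 'in' filter shaped like the monitoring-locations body."""
    cql = {
        "op": "and",
        "args": [
            {"op": "in", "args": [{"property": "monitoring_location_id"}, list(ids)]}
        ],
    }
    if compact:
        return json.dumps(cql, separators=(",", ":"))
    return json.dumps(cql, indent=4)


def plan_chunks(ids, compact, limit=BYTE_LIMIT):
    """Hypothetical planner: halve the id list until every body fits the limit."""
    body = build_body(ids, compact)
    if len(body.encode("utf-8")) <= limit or len(ids) == 1:
        return [ids]
    mid = len(ids) // 2
    return plan_chunks(ids[:mid], compact, limit) + plan_chunks(ids[mid:], compact, limit)


def chunk_counts(n_ids):
    """Return (indent=4 chunk count, compact chunk count) for n synthetic ids."""
    ids = [f"USGS-{i:08d}" for i in range(n_ids)]  # synthetic ids, not real sites
    return len(plan_chunks(ids, compact=False)), len(plan_chunks(ids, compact=True))


for n in (500, 1000, 5000):
    print(n, *chunk_counts(n))
```

Under these assumptions the loop prints the same three rows as the table, which suggests the 2x saving comes from the compact body needing one fewer halving step per id list.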
Live check: a 500-id query returns all 500 rows in 2 sub-requests (was 4).
The WAF body limit (403) is empirically ~8.2-8.4 KB, so 8000-byte compact
bodies stay safely under it. Locked in with a compactness assertion on the
monitoring-locations POST test.
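As a rough illustration of the per-value cost and of what a compactness assertion can look like, the sketch below serializes the same filter both ways. `cql2_body` and `marginal_bytes` are hypothetical stand-ins, not the library's serializer or its actual test, and the ids used for measuring are synthetic.

```python
import json


def cql2_body(ids, compact=True):
    """Serialize a CQL2 'in' filter, compact or pretty-printed with indent=4."""
    cql = {
        "op": "and",
        "args": [
            {"op": "in", "args": [{"property": "monitoring_location_id"}, list(ids)]}
        ],
    }
    if compact:
        return json.dumps(cql, separators=(",", ":"))
    return json.dumps(cql, indent=4)


def marginal_bytes(compact):
    """Bytes added to the body by one extra 13-character id."""
    ids = [f"USGS-{i:08d}" for i in range(11)]
    smaller = len(cql2_body(ids[:10], compact).encode("utf-8"))
    larger = len(cql2_body(ids, compact).encode("utf-8"))
    return larger - smaller


sample_ids = ["USGS-05407000", "USGS-05428500"]
body = cql2_body(sample_ids)

# Compactness check: no whitespace anywhere in the serialized body,
# and the compact form parses to the same structure as the indented one.
assert not any(ch.isspace() for ch in body)
assert json.loads(body) == json.loads(cql2_body(sample_ids, compact=False))

print(marginal_bytes(compact=False), marginal_bytes(compact=True))
```

For 13-character ids this prints 37 and 16, in line with the ~39 and ~17 B/value quoted above; real ids vary in length, so the exact figures shift slightly.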
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
What
`monitoring-locations` is the one service that sends its multi-value filter as a CQL2 POST body (it doesn't accept comma-separated multi-value GET params). That body was pretty-printed with `json.dumps(..., indent=4)` (~39 bytes/value). The body counts against both the server's ~8 KB request-size limit and the chunk planner's byte budget (`chunking._request_bytes` = URL + body), so the indentation was doubling the per-value cost and therefore doubling the chunk count for large id lists.

This switches `_cql2_param` to the tightest separators, `json.dumps(..., separators=(",", ":"))` (~17 bytes/value).

Impact
Sub-requests for a `monitoring-locations` id-list query at the production 8000-byte limit, via the real `ChunkPlan` planner:

n_ids   indent=4   compact
  500          4         2
 1000          8         4
 5000         32        16

Verification
The POST body is now `{"op":"and","args":[{"op":"in","args":[{"property":"monitoring_location_id"},["USGS-05407000","USGS-05428500"]]}]}` (no whitespace).
A compactness assertion covers this on the `monitoring-locations` POST test.

Safety
Empirically pinned the server's request-size ceilings: GET URLs return `414` above ~8.2–8.4 KB; POST bodies return `403` above ~8.2–8.4 KB. The chunker's existing 8000-byte limit keeps compact bodies (≤ ~7.75 KB) safely under the 403 cutoff, so no limit change is needed.

Note: why not route GET endpoints (`daily`, …) through POST?

Investigated and rejected. The server enforces the same ~8 KB cap on total request size whether the bytes are in the URL (`414`) or the body (`403`). A compact POST body fits ~450 sites; an 8000-byte GET URL fits ~450 sites: same capacity. So POST can't reduce the chunk count for the GET-based time-series endpoints; this compact win applies only to the pre-existing POST path. The real levers there (the ~8 KB edge limit and the 10,000-row page cap) are server-side.

🤖 Generated with Claude Code
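One plausible way to see why the GET and POST paths end up with the same capacity: if the client percent-encodes the commas in a multi-value query parameter (an assumption here; `urllib.parse.urlencode` does so by default), each extra id costs its length plus 3 bytes in the URL, which is also what it costs in a compact JSON array (two quotes and a comma). The helper names, the parameter name, and the synthetic ids below are hypothetical.

```python
import json
from urllib.parse import urlencode


def get_query_bytes(ids):
    """Length of a comma-separated multi-value query string (',' becomes '%2C')."""
    return len(urlencode({"monitoring_location_id": ",".join(ids)}))


def post_array_bytes(ids):
    """Length of a compact JSON array holding the same ids."""
    return len(json.dumps(list(ids), separators=(",", ":")).encode("utf-8"))


def marginal_cost(size_fn):
    """Bytes added by one extra synthetic 13-character id."""
    ids = [f"USGS-{i:08d}" for i in range(11)]
    return size_fn(ids) - size_fn(ids[:10])


get_cost = marginal_cost(get_query_bytes)
post_cost = marginal_cost(post_array_bytes)
print(get_cost, post_cost)
```

With equal per-id cost and the same ~8 KB cap on either side, moving ids from the URL into a compact body would not be expected to change how many fit per request.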