feat(ai-aliyun-content-moderation): role-aware request_check_mode and O(n) content chunking #13598
Open
nic-6443 wants to merge 5 commits into
…st/all) and O(n) content chunking
Pull request overview
This PR optimizes the ai-aliyun-content-moderation plugin’s hot path and adds role-aware request moderation scoping so multi-turn chat requests don’t re-moderate the full conversation on every call.
Changes:
- Improve request/response chunking performance by switching from repeated `utf8.sub` slicing to a byte cursor + `utf8.offset` approach (O(n²) → O(n)).
- Speed up RFC-3986-compatible signing by replacing multiple `string.gsub` passes with a single `ngx.re.gsub` pass.
- Add `request_check_mode` (`last`/`all`) to moderate only user-role messages, plus schema guards (`minimum = 1`) and corresponding tests/docs updates.
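The first two items above can be illustrated with a small sketch. This is not the plugin's Lua code: it is a Python illustration written under assumptions, and the names `chunk_utf8` and `url_encode_rfc3986` are hypothetical. The chunker walks a byte cursor forward once over valid UTF-8 input, and the encoder escapes the five sub-delimiters in one regex pass. `quote(..., safe="!'()*")` merely stands in for an encoder that leaves those characters untouched.

```python
import re
from urllib.parse import quote


def chunk_utf8(data: bytes, limit: int) -> list:
    """Split valid UTF-8 bytes into chunks of at most `limit` characters."""
    if limit < 1:
        # Same idea as the schema guard: with limit 0 the cursor never advances.
        raise ValueError("limit must be >= 1")
    chunks = []
    cur = 0  # byte cursor; it only moves forward, so the whole loop is O(n)
    while cur < len(data):
        end = cur
        for _ in range(limit):
            if end >= len(data):
                break
            end += 1  # step over the lead byte of one character
            # then over its continuation bytes (bit pattern 10xxxxxx)
            while end < len(data) and (data[end] & 0xC0) == 0x80:
                end += 1
        chunks.append(data[cur:end])
        cur = end
    return chunks


_SUB_DELIMS = re.compile(r"[!'()*]")


def url_encode_rfc3986(value: str) -> str:
    """Percent-encode `value`, escaping ! ' ( ) * in a single regex pass."""
    # Assumption: stand-in for an encoder that leaves the five characters as-is.
    escaped = quote(value, safe="!'()*")
    # One pass over the string instead of five separate replacements.
    return _SUB_DELIMS.sub(lambda m: "%{:02X}".format(ord(m.group())), escaped)
```

Each chunk boundary falls between characters, so every chunk decodes on its own and the chunks concatenate back to the original bytes.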
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `apisix/plugins/ai-aliyun-content-moderation.lua` | Adds `request_check_mode`, improves signing and chunking performance, and tightens schema constraints. |
| `apisix/plugins/ai-protocols/openai-chat.lua` | Refactors request text extraction and adds user-only extraction for request moderation. |
| `apisix/plugins/ai-protocols/anthropic-messages.lua` | Refactors request text extraction and adds user-only extraction for request moderation. |
| `t/plugin/ai-aliyun-content-moderation.t` | Adds coverage for `last`/`all` mode behavior, role-awareness, multi-chunk paths, and schema rejection. |
| `docs/en/latest/plugins/ai-aliyun-content-moderation.md` | Documents `request_check_mode` and updates length-limit constraints. |
| `docs/zh/latest/plugins/ai-aliyun-content-moderation.md` | Documents `request_check_mode` and updates length-limit constraints. |
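The user-only extraction named in the table can be sketched as follows. This is a Python illustration under assumptions, not the code in the protocol modules: `select_user_messages` is a hypothetical name, and each message is assumed to be a dict with a `role` and a plain-string `content`. In `last` mode it keeps only the trailing run of consecutive `user` messages; in `all` mode it keeps every `user` message.

```python
def select_user_messages(messages, mode="last"):
    """Return the user message contents to moderate, in original order."""
    if mode == "all":
        return [m["content"] for m in messages if m.get("role") == "user"]
    if mode != "last":
        raise ValueError("mode must be 'last' or 'all'")
    block = []
    # Walk backwards and stop at the first non-user message, so only the
    # newest user turn is collected.
    for m in reversed(messages):
        if m.get("role") != "user":
            break
        block.append(m["content"])
    block.reverse()
    return block
```

If the final message is not from the user, `last` mode returns an empty list, which corresponds to there being no user turn to check.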
… openai-chat extract_user_content
…h_limit and clean up url_encoding comment
…ents; drop all-content fallback (moderate user content only) and add extract_user_content to responses/bedrock/embeddings
…uest check_single_content created a new resty.http object and ran connect()/set_keepalive() on every moderation call. In realtime streaming mode a response fires one call per stream_check_cache_size characters (tens of calls), so the per-call connection churn dominated. Cache one httpc on ctx, reuse it across all moderation calls of the request, and return it to the keepalive pool once at request end; a failed connection is dropped and re-established on the next call. Single-worker saturated benchmark against a local moderation mock: realtime throughput +3.6% at cache_size=128 and +6.0% at cache_size=64; final_packet and non-streaming checks unchanged. No behaviour change.
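The reuse pattern this commit describes can be sketched roughly as below. It is a Python illustration under assumptions, not the plugin's Lua: `RequestContext`, `moderate`, `release`, the `connect` factory, and the list used as a pool are all hypothetical stand-ins for `ctx`, `resty.http`, and the keepalive pool.

```python
class RequestContext:
    """Hypothetical per-request context; stands in for the plugin's ctx."""

    def __init__(self):
        self.conn = None


def moderate(ctx, connect, payload):
    """Send one moderation call, reusing the connection cached on ctx."""
    if ctx.conn is None:
        ctx.conn = connect()  # connect lazily, at most once per healthy request
    try:
        return ctx.conn.send(payload)
    except OSError:
        # Drop the broken connection so the next call reconnects.
        ctx.conn = None
        raise


def release(ctx, pool):
    """Return the cached connection to the pool once, at request end."""
    if ctx.conn is not None:
        pool.append(ctx.conn)
        ctx.conn = None
```

The connection cost is paid once per request rather than once per moderation call, which is why the gain grows with the number of calls per response.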
shreemaan-abhishek
approved these changes
Jun 24, 2026
Description
The `ai-aliyun-content-moderation` plugin had hot-path inefficiencies in the moderation path, re-moderated the entire conversation on every request, and opened a new HTTP connection per moderation call. This PR fixes the performance issues and adds a moderation-scope option.

1. O(n²) → O(n) content chunking.
`content_moderation()` splits text into `*_check_length_limit`-sized chunks. It used `utf8.sub(content, index, ...)`, which locates the `index`-th character by scanning from the start of the string on every chunk, making the loop O(n²) in content length. This is replaced with a byte cursor plus `utf8.offset(content, length_limit + 1, cur)` (which scans only the current chunk window) and a byte-based `string.sub`, giving O(n). Output is byte-identical to the previous chunking (verified for ASCII and multibyte text).

2. Faster RFC-3986 signing.
`url_encoding()` ran five separate Lua `string.gsub` passes over the percent-encoded chunk to escape the sub-delimiters `! ' ( ) *`. Profiling a ~2 KB chunk showed this single function at ~148 µs, which is 30–40× more than every other per-chunk operation (`json.encode`, `hmac_sha1`, and `encode_args` are all ~4–5 µs). It is replaced with one JIT-compiled `ngx.re.gsub(escaped, "[!'()*]", repl, "jo")` pass (~7 µs, same output).

3. Schema guard.
Added `minimum = 1` to `request_check_length_limit` and `response_check_length_limit`. A non-positive value made the chunking loop never advance (`utf8.offset(content, length_limit + 1, cur)` returns `cur`), causing an infinite loop. Such values are now rejected at config time, consistent with `stream_check_cache_size`.

4. `request_check_mode` (`last` | `all`).
Request moderation now targets user input only. By default (`last`) it moderates just the latest consecutive block of `user` messages (the newest user turn) instead of the whole conversation, so multi-turn requests no longer re-send the entire history to the moderation service every turn. `all` moderates every `user` message. Both modes ignore `system`/`assistant`/`tool` messages (the query moderation service is meant for user input). Note that this changes the previous behavior, which moderated all messages regardless of role.

5. Reuse one HTTP connection per request.
`check_single_content()` opened a new `resty.http` object and ran `connect()`/`set_keepalive()` on every moderation call. In realtime response mode a single response fires one call per `stream_check_cache_size` characters (tens of calls), so per-call connection churn dominated. Now one `httpc` is cached on `ctx`, reused across all of a request's moderation calls, and returned to the keepalive pool once at request end; a failed connection is dropped and re-established on the next call. This is perf-only with no behavior change. (Minor: on an aborted or errored streaming response, the held connection is closed at request teardown instead of being pooled. Nothing leaks; cosocket operations aren't allowed in the `log` phase, so there is no clean release hook for those paths.)

Performance: request moderation
Single `user` message, `ai-proxy` + `ai-aliyun-content-moderation`, local mock moderation endpoint that always passes, `wrk -t1 -c1`. Baseline = `ai-proxy` only:

The before→after gain grows with size because the O(n²) term is removed; the single-pass encoding adds a roughly uniform ~2× on top.

Performance: streaming response moderation (connection reuse)
Single worker pinned and saturated, streaming response checked in realtime mode (one moderation call per `stream_check_cache_size` chars), local mock moderation endpoint:

The gain scales with the number of moderation calls per response (smaller `cache_size` / longer responses). `final_packet` and non-streaming checks are unchanged.

Tests
Added cases to `t/plugin/ai-aliyun-content-moderation.t`: `last`/`all` mode behavior, role-awareness (non-user messages are skipped; a non-user last message means no user turn to check), multi-chunk detection across a multibyte boundary, and schema rejection of `request_check_length_limit: 0`. Change 5 is perf-only with no behavior change; the existing realtime cases (run with `keepalive=true`) already exercise cross-call connection reuse.

Checklist