feat(ai-aliyun-content-moderation): role-aware request_check_mode and O(n) content chunking by nic-6443 · Pull Request #13598 · apache/apisix

nic-6443 · 2026-06-23T04:31:09Z

Description

The ai-aliyun-content-moderation plugin had three problems: its moderation hot path was inefficient, it re-moderated the entire conversation on every request, and it opened a new HTTP connection per moderation call. This PR fixes the performance issues and adds a moderation-scope option.

1. O(n²) → O(n) content chunking. content_moderation() splits text into *_check_length_limit-sized chunks. It used utf8.sub(content, index, ...), which locates the index-th character by scanning from the start of the string on every chunk, making the loop O(n²) in content length. Replaced with a byte cursor + utf8.offset(content, length_limit + 1, cur) (scans only the current chunk window) + byte-based string.sub, giving O(n). Output is byte-identical to the previous chunking (verified for ASCII and multibyte text).

2. Faster RFC-3986 signing. url_encoding() ran five separate Lua string.gsub passes over the percent-encoded chunk to escape the sub-delimiters ! ' ( ) *. Profiling a ~2 KB chunk showed this single function at ~148 µs — 30–40× more than every other per-chunk operation (json.encode/hmac_sha1/encode_args are all ~4–5 µs). Replaced with one JIT-compiled ngx.re.gsub(escaped, "[!'()*]", repl, "jo") pass (~7 µs, same output).

3. Schema guard. Added minimum = 1 to request_check_length_limit and response_check_length_limit. With a non-positive value the chunking loop never advanced (utf8.offset(content, length_limit + 1, cur) returns cur), which produced an infinite loop. Such values are now rejected at config time, consistent with stream_check_cache_size.

4. request_check_mode (last | all). Request moderation now targets user input only. By default (last) it moderates just the latest consecutive block of user messages (the newest user turn) instead of the whole conversation, so multi-turn requests no longer re-send the entire history to the moderation service every turn. all moderates every user message. Both modes ignore system/assistant/tool messages (the query moderation service is meant for user input). Note this changes the previous behavior, which moderated all messages regardless of role.

5. Reuse one HTTP connection per request. check_single_content() opened a new resty.http object and ran connect()/set_keepalive() on every moderation call. In realtime response mode a single response fires one call per stream_check_cache_size characters (tens of calls), so per-call connection churn dominated. Now one httpc is cached on ctx, reused across all of a request's moderation calls, and returned to the keepalive pool once at request end; a failed connection is dropped and re-established on the next call. This is perf-only with no behavior change. (Minor: on an aborted/errored streaming response the held connection is closed at request teardown instead of pooled — no leak; cosocket ops aren't allowed in the log phase so there is no clean release hook for those paths.)
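The byte-cursor chunking from change 1 can be illustrated with a minimal sketch. It is written in Python rather than the plugin's Lua, and the names `chunk_utf8` and `chunk_naive` are hypothetical. The inner scan plays the role the description assigns to utf8.offset: it walks only the current chunk window, so the total work is linear in the content length. The guard at the top mirrors the schema constraint from change 3.

```python
def chunk_utf8(content, length_limit):
    """Split valid UTF-8 bytes into chunks of at most `length_limit` characters."""
    if length_limit < 1:
        # A non-positive limit would never advance the cursor (see change 3).
        raise ValueError("length_limit must be >= 1")
    chunks = []
    cur = 0  # byte offset of the next unread character
    n = len(content)
    while cur < n:
        end = cur
        chars = 0
        # Advance over at most `length_limit` characters, touching only
        # the bytes that belong to the current chunk window.
        while end < n and chars < length_limit:
            end += 1
            # Skip continuation bytes (0b10xxxxxx) of the same character.
            while end < n and (content[end] & 0xC0) == 0x80:
                end += 1
            chars += 1
        chunks.append(content[cur:end])  # byte-based slice
        cur = end
    return chunks


def chunk_naive(text, length_limit):
    """Reference chunking by character index, used only to compare output."""
    return [text[i:i + length_limit].encode("utf-8")
            for i in range(0, len(text), length_limit)]
```

Comparing `chunk_utf8(text.encode("utf-8"), k)` with `chunk_naive(text, k)` on mixed ASCII and multibyte text is one way to check the "byte-identical output" property the description claims for the real implementation.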

Performance — request moderation

Single user message, ai-proxy + ai-aliyun-content-moderation, local mock moderation endpoint that always passes, wrk -t1 -c1. Baseline = ai-proxy only:

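| message size | baseline QPS | before QPS | after QPS | speedup (before→after) |
| --- | --- | --- | --- | --- |
| 64k | 1928 | 71.8 | 162.4 | 2.3x |
| 128k | 1377 | 30.3 | 84.6 | 2.8x |
| 256k | 899 | 11.3 | 43.1 | 3.8x |
| 512k | 508 | 3.75 | 21.9 | 5.9x |
| 1M | 266 | 1.10 | 11.2 | 10.1x |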

The before→after gain grows with size because the O(n²) term is removed; the single-pass encoding adds a roughly uniform ~2× on top.
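The single-pass encoding idea from change 2 can be sketched as follows. This is a Python illustration, not the plugin's Lua: `escape_multi_pass`, `escape_single_pass`, and `url_encoding` are hypothetical names, and `urllib.parse.quote` with `safe="!'()*"` only stands in for an escaper that leaves those five characters untouched.

```python
import re
from urllib.parse import quote

# The five characters that still need escaping after the first pass.
_SUB_DELIMS = re.compile(r"[!'()*]")


def escape_multi_pass(escaped):
    """Five separate passes, one per character (the shape of the old code)."""
    for ch in "!'()*":
        escaped = escaped.replace(ch, "%%%02X" % ord(ch))
    return escaped


def escape_single_pass(escaped):
    """One regex pass with a replacement callback (the shape of the new code)."""
    return _SUB_DELIMS.sub(lambda m: "%%%02X" % ord(m.group(0)), escaped)


def url_encoding(raw):
    # Assumption: the first-stage escaper leaves ! ' ( ) * unescaped.
    escaped = quote(raw, safe="!'()*")
    return escape_single_pass(escaped)
```

Both variants produce the same string. The difference is that the single-pass version scans the input once instead of five times, which is where the description attributes the per-chunk saving.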

Performance — streaming response moderation (connection reuse)

Single worker pinned and saturated, streaming response checked in realtime mode (one moderation call per stream_check_cache_size chars), local mock moderation endpoint:

| stream_check_cache_size | calls/response | before tokens/s | after tokens/s | gain |
| --- | --- | --- | --- | --- |
| 128 | ~48 | 59,938 | 62,073 | +3.6% |
| 64 | ~96 | 45,074 | 47,790 | +6.0% |
The gain scales with the number of moderation calls per response (smaller cache_size / longer responses). final_packet and non-streaming checks are unchanged.
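The per-request connection reuse from change 5 follows a simple caching pattern, sketched here in Python under stated assumptions. `FakeConn` is a hypothetical stand-in for an HTTP client that only counts how many connections were opened, `ctx` is a plain dict standing in for the request context, and `release` stands in for returning the connection to a keepalive pool at request end.

```python
class FakeConn:
    """Hypothetical HTTP client stand-in; counts connections opened."""
    opened = 0

    def __init__(self):
        FakeConn.opened += 1
        self.broken = False

    def post(self, body):
        if self.broken:
            raise ConnectionError("connection lost")
        return "pass"


def check_single_content(ctx, body):
    """Run one moderation call, reusing the connection cached on ctx."""
    conn = ctx.get("moderation_conn")
    if conn is None:
        conn = FakeConn()  # connect once per request, not once per call
        ctx["moderation_conn"] = conn
    try:
        return conn.post(body)
    except ConnectionError:
        # Drop the failed connection so the next call re-establishes it.
        ctx.pop("moderation_conn", None)
        raise


def release(ctx):
    """Called once at request end; stands in for returning to the pool."""
    return ctx.pop("moderation_conn", None)
```

With this shape, a response that triggers about 48 moderation calls opens one connection instead of 48, which matches the reasoning that the gain grows with the number of calls per response.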

Tests

Added cases to t/plugin/ai-aliyun-content-moderation.t: last/all mode behavior, role-awareness (non-user messages are skipped; a non-user last message means no user turn to check), multi-chunk detection across a multibyte boundary, and schema rejection of request_check_length_limit: 0. Change 5 is perf-only with no behavior change; the existing realtime cases (run with keepalive=true) already exercise cross-call connection reuse.
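The last/all selection that these cases exercise could look roughly like the sketch below. It is a Python illustration, not the plugin's Lua. `select_user_texts` is a hypothetical name, and the sketch assumes OpenAI-style messages whose `content` is a plain string; other content shapes are simply skipped here.

```python
def select_user_texts(messages, mode="last"):
    """Return the user texts to moderate for the given request_check_mode."""
    def is_user(msg):
        # Non-dict entries are ignored instead of raising.
        return isinstance(msg, dict) and msg.get("role") == "user"

    if mode == "all":
        selected = [m for m in messages if is_user(m)]
    elif mode == "last":
        # Walk backwards and collect the trailing run of user messages.
        selected = []
        for msg in reversed(messages):
            if not is_user(msg):
                break
            selected.append(msg)
        selected.reverse()
    else:
        raise ValueError("mode must be 'last' or 'all'")

    return [m["content"] for m in selected if isinstance(m.get("content"), str)]
```

In this sketch a conversation that ends with an assistant or tool message yields an empty list in last mode, which corresponds to the "no user turn to check" case listed above.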

Checklist

I have explained the need for this PR and the problem it solves
I have added the relevant tests
I have updated the documentation (en + zh)


Copilot

Pull request overview

This PR optimizes the ai-aliyun-content-moderation plugin’s hot path and adds role-aware request moderation scoping so multi-turn chat requests don’t re-moderate the full conversation on every call.

Changes:

Improve request/response chunking performance by switching from repeated utf8.sub slicing to a byte cursor + utf8.offset approach (O(n²) → O(n)).
Speed up RFC-3986-compatible signing by replacing multiple string.gsub passes with a single ngx.re.gsub pass.
Add request_check_mode (last/all) to moderate only user-role messages, plus schema guards (minimum = 1) and corresponding tests/docs updates.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.


| File | Description |
| --- | --- |
| `apisix/plugins/ai-aliyun-content-moderation.lua` | Adds `request_check_mode`, improves signing and chunking performance, and tightens schema constraints. |
| `apisix/plugins/ai-protocols/openai-chat.lua` | Refactors request text extraction and adds user-only extraction for request moderation. |
| `apisix/plugins/ai-protocols/anthropic-messages.lua` | Refactors request text extraction and adds user-only extraction for request moderation. |
| `t/plugin/ai-aliyun-content-moderation.t` | Adds coverage for `last`/`all` mode behavior, role-awareness, multi-chunk paths, and schema rejection. |
| `docs/en/latest/plugins/ai-aliyun-content-moderation.md` | Documents `request_check_mode` and updates length-limit constraints. |
| `docs/zh/latest/plugins/ai-aliyun-content-moderation.md` | Documents `request_check_mode` and updates length-limit constraints. |



…uest check_single_content created a new resty.http object and ran connect()/set_keepalive() on every moderation call. In realtime streaming mode a response fires one call per stream_check_cache_size characters (tens of calls), so the per-call connection churn dominated. Cache one httpc on ctx, reuse it across all moderation calls of the request, and return it to the keepalive pool once at request end; a failed connection is dropped and re-established on the next call. Single-worker saturated benchmark against a local moderation mock: realtime throughput +3.6% at cache_size=128 and +6.0% at cache_size=64; final_packet and non-streaming checks unchanged. No behaviour change.

feat(ai-aliyun-content-moderation): role-aware request_check_mode (last/all) and O(n) content chunking

257712e

Copilot AI review requested due to automatic review settings June 23, 2026 04:31

dosubot Bot added the size:L (This PR changes 100-499 lines, ignoring generated files), enhancement (New feature or request), and performance (generate flamegraph for the current PR) labels Jun 23, 2026

Copilot started reviewing on behalf of nic-6443 June 23, 2026 04:31

Copilot AI reviewed Jun 23, 2026


Comment thread apisix/plugins/ai-aliyun-content-moderation.lua Outdated

Comment thread apisix/plugins/ai-aliyun-content-moderation.lua Outdated

nic-6443 added 3 commits June 23, 2026 12:39

fix(ai-aliyun-content-moderation): guard non-table message entries in openai-chat extract_user_content

e7ebc46

fix(ai-aliyun-content-moderation): use integer type for *_check_length_limit and clean up url_encoding comment

1ea9ef1

fix(ai-aliyun-content-moderation): address review — relocate doc comments; drop all-content fallback (moderate user content only) and add extract_user_content to responses/bedrock/embeddings

ed4b265

dosubot Bot added the size:XL label (This PR changes 500-999 lines, ignoring generated files) and removed the size:L label Jun 23, 2026

shreemaan-abhishek approved these changes Jun 24, 2026

nic-6443 wants to merge 5 commits into apache:master from nic-6443:fix-aliyun-cm-moderation-scope

3 participants