4 changes: 3 additions & 1 deletion docs/RUNBOOK.md
@@ -37,7 +37,9 @@ New migrations go in `platforms/cloudflare/migrations/` as `NNNN_description.sql

## Weekly item-retention prune

A second cron entry (`0 16 * * 0`, Sunday 23:00 ICT) fires `scheduled()` with `event.cron` set to that string, which routes to `D1Repository.pruneOldItems` instead of the hourly refresh fan-out. It deletes all but the `FEEDREADER_MAX_ITEMS_PER_SOURCE` (default 1000) most recent items per source, ordered the same way the feed itself sorts (`coalesce(published_at, first_seen_at) DESC, ...`).
`scheduled()`'s single hourly trigger does double duty: every firing, `isWeeklyPruneWindow` checks whether `event.scheduledTime` landed on Sunday 16:00 UTC (23:00 ICT) and, if so, runs `D1Repository.pruneOldItems` before the normal refresh fan-out. It deletes all but the `FEEDREADER_MAX_ITEMS_PER_SOURCE` (default 1000) most recent items per source, ordered the same way the feed itself sorts (`coalesce(published_at, first_seen_at) DESC, ...`).
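The window check described above can be sketched in isolation as follows. This is a minimal illustration, not the repository's test code: `countPruneFirings` and the `ICT_OFFSET_MS` constant are hypothetical names introduced here, and the sketch assumes the hourly trigger fires exactly once per hour at minute 0.

```typescript
// Same predicate as described above: true only during the Sunday 16:00 UTC hour.
function isWeeklyPruneWindow(scheduledTime: number): boolean {
  const date = new Date(scheduledTime);
  return date.getUTCDay() === 0 && date.getUTCHours() === 16;
}

const HOUR_MS = 60 * 60 * 1000;
// Hypothetical constant: Asia/Ho_Chi_Minh is a fixed UTC+7 offset with no DST.
const ICT_OFFSET_MS = 7 * HOUR_MS;

// Hypothetical helper: counts how many of the 168 hourly firings in one
// week, starting at weekStartMs, would trigger the prune.
function countPruneFirings(weekStartMs: number): number {
  let count = 0;
  for (let hour = 0; hour < 7 * 24; hour++) {
    if (isWeeklyPruneWindow(weekStartMs + hour * HOUR_MS)) count++;
  }
  return count;
}

// 2024-01-07 is a Sunday; 16:00 UTC on that day is 23:00 ICT.
const sundayPrune = Date.UTC(2024, 0, 7, 16, 0, 0);
// Shifting by the ICT offset and reading the UTC fields gives ICT wall-clock time.
const ictWallClock = new Date(sundayPrune + ICT_OFFSET_MS);

console.log(isWeeklyPruneWindow(sundayPrune)); // true
console.log(ictWallClock.getUTCDay(), ictWallClock.getUTCHours()); // 0 23
console.log(countPruneFirings(Date.UTC(2024, 0, 1))); // 1
```

Because the predicate matches on the hour rather than an exact instant, a firing that arrives a few seconds late still prunes, while a trigger schedule that fires more than once inside that hour would prune more than once.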

**Why not a second Cron Trigger:** that was the original design, but deploying a second `[triggers]` cron entry alongside the hourly one made `wrangler deploy` fail outright — `[ERROR] Some triggers failed to deploy for feedreader: - A request to the Cloudflare API (/accounts/.../workers/scripts/feedreader/schedules) failed.`, with no further detail (Wrangler doesn't currently surface the underlying API error for trigger-deploy failures — see [cloudflare/workers-sdk#14288](https://github.com/cloudflare/workers-sdk/issues/14288)). The script, bindings, and vars deployed fine each time; only the schedules update failed, leaving only the original single cron registered. Root cause against the live Cloudflare API was never confirmed (no token available outside CI to reproduce with verbose logging). If you want to retry the two-cron-trigger approach, capture `wrangler deploy --log-level debug` output to get the actual API error body before assuming it'll work differently this time.

1000/source was sized against rows-read cost, not disk: `listFeedItems` reads every row matching its WHERE clause with no SQL `LIMIT` (sorting/pagination happens in memory — see `core/sources/listInMemory.ts`), so an unfiltered home-page hit reads `sources × cap` rows from D1 every time. At 4 sources × 1000 that's 4,000 rows/request — cheap in isolation, but worth keeping in mind against D1 Free's 5M-rows-read/day budget if traffic grows or source count grows well past 4. Raising the cap or adding more sources should come with either moving sort/pagination into SQL (the existing `idx_items_feed_order` index already matches the sort order but is unused by the current query shape) or checking D1 Free's rows-read budget isn't at risk.
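The rows-read arithmetic above can be worked through as a small sketch. The function names are hypothetical, the 5M/day figure is the one quoted above, and the result is a worst case: it assumes every source is at its cap, every request is an unfiltered home-page hit that reaches D1, and nothing else consumes the budget.

```typescript
// Daily rows-read budget as quoted in the paragraph above (D1 Free).
const D1_FREE_ROWS_READ_PER_DAY = 5_000_000;

// Rows read by one unfiltered listing when every source is at its cap.
function rowsReadPerRequest(sources: number, capPerSource: number): number {
  return sources * capPerSource;
}

// Worst-case number of unfiltered requests per day before the budget is spent.
function maxUnfilteredRequestsPerDay(
  sources: number,
  capPerSource: number,
  dailyBudget: number = D1_FREE_ROWS_READ_PER_DAY,
): number {
  return Math.floor(dailyBudget / rowsReadPerRequest(sources, capPerSource));
}

console.log(rowsReadPerRequest(4, 1000)); // 4000
console.log(maxUnfilteredRequestsPerDay(4, 1000)); // 1250
console.log(maxUnfilteredRequestsPerDay(10, 1000)); // 500
```

Under these assumptions the headroom shrinks linearly with both source count and cap, which is why raising either is paired above with moving sort and pagination into SQL.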

20 changes: 14 additions & 6 deletions platforms/cloudflare/src/index.ts
@@ -26,10 +26,19 @@ const KNOWN_SOURCES = new Set([
"alphaxiv",
]);

// Second [triggers] cron in wrangler.toml — Sunday 23:00 ICT (Asia/Ho_Chi_Minh,
// UTC+7, no DST) = Sunday 16:00 UTC. Fires alongside (not instead of) the
// hourly refresh cron; event.cron tells scheduled() which one triggered.
const WEEKLY_PRUNE_CRON = "0 16 * * 0";
// A second [triggers] cron entry for the weekly prune was tried and reverted:
// Cloudflare's schedules API rejected the multi-cron deploy ("Some triggers
// failed to deploy ... a request to the Cloudflare API
// (/accounts/.../workers/scripts/feedreader/schedules) failed", no further
// detail surfaced — Wrangler doesn't expose the underlying error, see
// cloudflare/workers-sdk#14288). Instead, the existing hourly trigger does
// double duty: every invocation checks whether it landed in the weekly prune
// window before doing its normal refresh. Sunday 23:00 ICT (Asia/Ho_Chi_Minh,
// UTC+7, no DST) = Sunday 16:00 UTC.
function isWeeklyPruneWindow(scheduledTime: number): boolean {
const date = new Date(scheduledTime);
return date.getUTCDay() === 0 && date.getUTCHours() === 16;
}

// Backstop only — the cache key already changes whenever the underlying
// source data refreshes (see latestSuccessAt), so this just bounds
@@ -62,15 +71,14 @@ export default {
},

async scheduled(event: ScheduledController, env: Env): Promise<void> {
if (event.cron === WEEKLY_PRUNE_CRON) {
if (isWeeklyPruneWindow(event.scheduledTime)) {
const { maxItemsPerSource } = loadConfig(env);
const deleted = await new D1Repository(env.DB).pruneOldItems(
maxItemsPerSource,
);
console.log(
`pruneOldItems: deleted ${deleted} item(s) beyond ${maxItemsPerSource} per source`,
);
return;
}
await fanOutRefresh(env, build());
},
22 changes: 12 additions & 10 deletions platforms/cloudflare/wrangler.toml
@@ -17,15 +17,16 @@ enabled = true
ip = "127.0.0.1"
port = 8788

# Hourly refresh, UTC. Asia/Ho_Chi_Minh (UTC+7, no DST) hourly wall-clock
# boundaries are the same instants as UTC hourly wall-clock boundaries, so no
# offset math is needed for the hourly cron (see docs/RUNBOOK.md).
# Hourly, UTC. Asia/Ho_Chi_Minh (UTC+7, no DST) hourly wall-clock boundaries
# are the same instants as UTC hourly wall-clock boundaries, so no offset
# math is needed (see docs/RUNBOOK.md).
#
# Second entry: weekly item-retention prune, Sunday 23:00 ICT = Sunday 16:00
# UTC (fixed +7h offset, no DST). Routed in src/index.ts's scheduled() by
# matching event.cron against this exact string.
# A second cron entry for the weekly prune was tried and reverted — Cloudflare's
# schedules API rejected the multi-cron deploy (see src/index.ts's
# isWeeklyPruneWindow and docs/RUNBOOK.md). The hourly trigger does double
# duty instead: scheduled() checks the wall-clock on every firing.
[triggers]
crons = ["0 * * * *", "0 16 * * 0"]
crons = ["0 * * * *"]

[assets]
directory = "../../web-static"
@@ -48,9 +49,10 @@ service = "feedreader"
APP_VERSION = "dev"
FEEDREADER_ITEMS_PER_SOURCE = "20"
FEEDREADER_USER_AGENT = "feedreader/0.1"
# Per-source row cap enforced by the weekly prune cron (see [triggers] above
# and docs/RUNBOOK.md) — keeps the items table, and every full-table-scan
# query against it, bounded as more sources are added.
# Per-source row cap enforced by the weekly prune window inside the hourly
# cron handler (see src/index.ts's isWeeklyPruneWindow and docs/RUNBOOK.md)
# — keeps the items table, and every full-table-scan query against it,
# bounded as more sources are added.
FEEDREADER_MAX_ITEMS_PER_SOURCE = "1000"

# REFRESH_SECRET is a secret, not a var — set it with: