Skip to content

Pentest fixes and hardening#321

Open
level09 wants to merge 73 commits into
auto-update-simplifiedfrom
pentest-fixes
Open

Pentest fixes and hardening#321
level09 wants to merge 73 commits into
auto-update-simplifiedfrom
pentest-fixes

Conversation

@level09

@level09 level09 commented Apr 21, 2026

Copy link
Copy Markdown
Collaborator

Fixes for the 7ASec BAY-01 engagement, on top of baseline 8615ff11b. One commit per finding, tagged fix(BAY-01-NNN): so the retest maps fixes by git log --grep BAY-01 pentest-fixes.

42 of 43 fixed. Open: BAY-01-017 (release signing), a product decision, not a code gap.

Finding to commit

001 80f56b9bc   015 4f8b40e68   028 0cddfca22   039 aa7ecb076
002 d0f4581da   016 18059510d   029 bc467b9df   040 9bfca8af5
003 3ce769fe8   018 ce2afebcb   030 fa3069dcd   041 726d34582
004 4b66981d5   019 ee18f11ee   031 dc8e62fec   042 b186e1b96
005 597e7e6c1   020 f55f7a01c   032 adf941e8d   043 c0adefa7c
006 507bab008   021 f541199f7   033 5bbacd183
007 ecc4aa76f   022 85e7f540a   034 279b3c8f8
008 9800fbed6   023 0cba2461f   035 f1990806b
009 0614f8aad   024 09f3f60b5   036 5abe376ea
010 f34e6790c   025 4708fe422   037 9ee85af9f
011 dec0516ea   026 76f431fff   038 6f562d461
012 8b0c2d5c3   027 a5280d017
013 e4f26caac   014 ddabe8eaa

017 is unaddressed pending the signing decision. minisign keypair exists, public key pinned; manual sudo bayanat update stays as the fallback.

Partials worth flagging

  • 020: filename enumeration closed (opaque names). Per-item auth on /api/serve/inline/<filename> is a separate parent-binding change, deferred.
  • 028 issue 6: passwords are out of argv (in-container config file + REDISCLI_AUTH). Delivery stays via env vars because Compose can't mount env-sourced secrets into read_only containers.
  • 031: app DB role stripped of superuser/createdb/createrole/replication and CONNECT revoked from PUBLIC, but keeps table ownership because dynamic-fields runs ALTER TABLE at runtime.

Retest

Backend regression tests: uv run pytest tests/test_pentest_fixes.py. No frontend harness in this repo; 034 verified by review. Each commit message has root cause and where to verify. Fresh install recommended (005 changed the install flow: the installer now generates and prints the admin password once).

Satisfies SLSA v1.2 Source track requirement for a documented
Safe Expunging Process. Covers scope, permitted reasons, two-maintainer
approval, procedure, and consumer notification.
@coderabbitai

coderabbitai Bot commented Apr 21, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 213e2dfc-2a85-4d20-b445-fef8a61d66cf

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch pentest-fixes

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

level09 added 5 commits April 21, 2026 15:31
Bitnami removed bitnami/nginx:1.24 from Docker Hub in August 2025,
breaking compose builds. Point to the bitnamilegacy mirror that
Bitnami publishes for deprecated tags.
The celery-ocr service defined in docker-compose.yml builds with
ROLE=celery-ocr, but flask/Dockerfile only handled 'flask' and
'celery' branches, so the image had no Python runtime and the
container exited 127 with 'celery: not found'. Share the celery
install path with celery-ocr, which needs the same deps.
The compose file defaulted ENV_FILE to .env, which is the local
development file where POSTGRES_HOST=localhost. That mount made
the bayanat container try to reach postgres via a local socket
instead of the compose service. .env.docker already ships with
the correct service hostnames (postgres, redis) and is the
intended default for this compose file.
flask create-db builds the full schema from models, so running
db upgrade after it conflicts on indexes the latest migrations
try to create. Detect fresh vs existing DB via flask db current.
@level09 level09 self-assigned this Apr 21, 2026
level09 added 20 commits April 22, 2026 00:48
Swap ubuntu:24.04 for python:3.12-slim-bookworm in the runtime stage
and use the official ghcr.io/astral-sh/uv image as the builder. Keeps
the ROLE-based conditional install for celery-only deps (exiftool,
ffmpeg) and drops incidental ubuntu bulk that is not actually used by
the app.

- builder stage installs only build headers; runtime stage installs
  only shared libraries (libpq5, pango, cairo, harfbuzz, libxml2,
  libxslt, libjpeg62-turbo, libopenjp2, libffi8, dejavu fonts)
- libimage-exiftool-perl is kept in the builder because pyexifinfo's
  setup.py probes for the exiftool binary during wheel install
- XDG_CACHE_HOME=/tmp/.cache silences fontconfig cache warnings during
  weasyprint PDF generation

Verified end-to-end against a fresh compose stack: weasyprint PDF gen,
psycopg2, pillow, lxml, exiftool, ffmpeg, nginx->uwsgi->Flask login
all work. Image sizes drop ~160MB (bayanat) and ~210MB (celery).
The previous `service nginx status` check failed on the bitnami nginx
image, which has no sysvinit, so the container always reported
unhealthy even when nginx was serving correctly. Switch to a bash TCP
probe against port 80 — no external tools required (curl and wget are
not present in the image).
## Summary

- Installer's systemd unit for `bayanat-celery` only subscribed to the
default `celery` queue, but `ocr_single` tasks route to a dedicated
`ocr` queue (`enferno/tasks/__init__.py`).
- Any bulk OCR dispatched via the admin UI (`POST /admin/api/ocr/bulk`)
or the `flask ocr process` CLI piled up in Redis with no consumer.
- Single-media OCR (per-item UI button / `flask ocr extract`) was
unaffected — it takes a sync path through
`process_media_extraction_task()`.

## Fix

Add `-Q celery,ocr` to the `ExecStart` line so the worker consumes both
queues.

## Affected versions

- v4.0.0 (GA install command ships the broken unit).

## Upgrade path

### Fresh install

Nothing special — new installs get the fixed unit.

### Existing v4.0.0 installs (in-place patch, no reinstall needed)

```
sudo sed -i.bak 's|worker --autoscale 2,5 -B$|& -Q celery,ocr|' /etc/systemd/system/bayanat-celery.service
sudo systemctl daemon-reload
sudo systemctl restart bayanat-celery
```

Verify with `sudo journalctl -u bayanat-celery --since "30 seconds ago"
| grep queues` — should list both `celery` and `ocr`.

## Test plan

- [x] Reproduced on auto-update-simplified deployment: 6 tasks stuck in
`ocr` queue with 0 processed, worker idle.
- [x] Applied the same fix to prod2 manually: queue drains, `ocr_single`
tasks processed end-to-end via Google Vision API.
- [x] All 4 test media processed successfully (confidence 24-98%, zero
failures).
- [ ] Tag `v4.0.1` after merge and update docs installer URL to pin
v4.0.1.
Bulletin/actor/incident history routes only checked the global
view_*_history permission, never the per-item access of the parent
record. A user with history-view permission but no group access to a
restricted item could read its full revision payload via the history
API.

Resolve the parent entity and gate on current_user.can_access(parent)
before the history query. Log denied attempts to Activity. Location
history is unaffected (Location has no role-based access).
The PUT /api/extraction/<id> endpoint resolved the Extraction row
directly by ID without re-checking the parent Media's group
membership, mirroring the GET sibling. Any DA/Admin could overwrite
text/status on extractions in groups they don't belong to, and the
success response leaked the full text+history payload back.

Resolve the parent Media, gate on current_user.can_access(media),
and trim the success response to to_compact_dict so even authorised
calls don't echo the full text/history block.
api_csv_analyze, api_xls_sheet and api_xls_analyze concatenated the
caller-supplied filename onto IMPORT_DIR with no containment check, so
'../../../../app/.env' resolved outside the import directory and the
resulting file content was returned as parsed CSV. With Admin
credentials this leaked SECRET_KEY, SECURITY_TOTP_SECRETS and
SECURITY_PASSWORD_SALT.

Add _resolve_import_path() that joins via werkzeug.safe_join, resolves
symlinks, and asserts the candidate is inside IMPORT_DIR. Reject with
400 on traversal or missing filename, with a warning log.
The /login endpoint had no server-side throttle: Flask-Limiter was
only attached to /csrf, the session-scoped failure counter is reset
by starting a new session, and reCAPTCHA is off by default. An
attacker could issue an unbounded number of POSTs against /login.

Add a Redis-backed throttle keyed independently on (username) and
(ip) with a 15-minute sliding window: 10 failures per username,
30 per IP. Throttle check runs in before_app_request for POST /login
and returns 429 once either ceiling is hit. Counters increment in
after_app_request on failed login and clear on success. Failed
attempts are logged via the regular logger; reCAPTCHA stays as an
optional secondary friction layer.
Interactive CRUD runs Bulletin.description / ActorProfile.description
through SanitizedField, but the import helpers wrote raw, untrusted
strings straight to those columns. Both fields are rendered with
v-html in BulletinCard and ActorProfiles, so an attacker-controlled
CSV cell or external metadata payload could carry stored XSS that
fires when an analyst opens the record.

Wrap the import-side writes in sanitize_string() so the same bleach
allowlist used by SanitizedField applies before persistence.
Covers sheet_import.set_description (actor profile), and the three
bulletin.description sinks in media_import (update_description,
video info description, text_content).
One test file covering 001 / 002 / 004 / 007 / 008. Each test mirrors
the auditor's PoC payload so a future regression flips the test. Mix
of e2e (history endpoint, extraction PUT, traversal endpoints) and
unit (login throttle helpers, sanitize_string) depending on what was
practical to wire through fixtures.

7ASec retest map: run 'uv run pytest tests/test_pentest_fixes.py -v'.
The previous fix reinvented sliding-window rate limiting on top of raw
Redis incr/expire. Replace it with two stacked Flask-Limiter
decorators applied to security.login view post-registration: one keyed
by username, one by IP. Storage, TTL, sliding window, 429 response and
X-RateLimit-* headers all come from the existing limiter setup.

Limits live in settings.py (LOGIN_RATE_LIMIT_PER_USERNAME and
LOGIN_RATE_LIMIT_PER_IP), env-overridable, no UI exposure. TestConfig
uses tighter values so the e2e test trips quickly.

Drops the custom helpers in rate_limit_utils.py, the throttle code in
user/views.py, the _FakeRedis test class, three throttle helper unit
tests, and the weak resolver unit test. Adds one e2e test that posts
bad credentials repeatedly until 429.

Net: -94 lines, leaning on a battle-tested library.
The installer created the bayanat role with createuser -s (full
superuser) and inserted 'local all bayanat trust' into pg_hba.conf.
Any local OS user could then open a socket connection and claim the
bayanat role with no password and full superuser privileges.

Drop -s on createuser so the role is a plain owner. Create pg_trgm
and postgis as the postgres superuser at install time so the app
role never needs CREATE EXTENSION privilege. Replace the 'trust'
pg_hba rule with 'peer' so only processes running as the bayanat OS
user can authenticate as the bayanat PG role.

Idempotent for upgrades: existing installs get ALTER USER ...
NOSUPERUSER and the trust rule is removed before the peer rule is
inserted.
Celery export tasks (PDF / JSON / CSV / media) iterated the
caller-supplied IDs and serialised every row, with no per-record
group check. A user with can_export but no group access could submit
restricted IDs in an export request; once an admin approved it (the
approval UI gives no signal that items cross access boundaries),
the requester received the full payload.

Add _accessible_items helper that yields only rows the export
requester is authorised to see (mirrors current_user.can_access on
the direct GET path) and route all three generator tasks plus the
media-attachment task through it. CSV path uses an inline check
because each iteration mutates a DataFrame. Restricted items get a
warning log; missing requester yields nothing (fail closed).

UI surface (admin approval warning when items are out-of-group)
deferred to a follow-up; this commit closes the data leak.
…r CLI

The /api/create-admin route was reachable over the network without
authentication during the install window. The supported quick-install
script started uWSGI + Caddy before the operator visited the wizard,
so the first network client could claim a fully privileged admin
account. The same hole reopened any time the admin role had zero
users.

Drop the route entirely (and its sibling /api/check-admin). The
installer now provisions the admin out-of-band: it generates a
high-entropy password and runs 'flask install --username admin
--password ...' before the service is exposed, then prints the
credentials once for the operator. The setup wizard now requires an
authenticated Admin session and starts from the language step.

flask install gains --username and --password options for
non-interactive use; if username is given without a password it
generates and prints one.

Drive-by: pre-commit ruff dropped 4 pre-existing unused imports
(Bulletin, DynamicField, DateHelper, celery_app) and one redundant
f-string in commands.py.
The credentials banner already prints username + password, but the
operator still has to construct the login URL themselves. Add the
fully-qualified /login URL to the banner and explain what the wizard
covers after sign-in, so the post-install handoff is one copy + one
click.
The native installer (bayanat script) provisions the initial admin via
flask install before the service is reachable. The Docker entrypoint
was not aligned: it ran create-db on a fresh volume but never created
an admin, leaving Pattern A's network surface (wizard requires auth,
no admin = no login) effectively unusable for compose users.

Add a flask install --username admin call after schema creation. With
no --password supplied the command generates a random one and prints
the credentials banner to stdout, which docker-compose logs flask
captures so the operator can retrieve it after first start.
Two issues from review:
- The entrypoint comment referenced 'docker-compose logs flask' but
  the service in docker-compose.yml is named bayanat.
- The deployment docs still told operators to run 'flask install'
  after startup, which now silently no-ops because the entrypoint
  bootstrapped the admin already, leaving them without credentials.

Fix the comment and rewrite the docs: the first-run password is in
'docker-compose logs bayanat | grep -A4 "Generated password"';
the CLI fallback only mints a fresh password if no admin exists.
Reviewer flagged that the native installer's _bootstrap_admin generated
the password in shell and passed it via 'flask install --password "$pw"',
exposing it through /proc/<pid>/cmdline and 'ps' for the lifetime of
the child.

Add a --password-stdin flag to the flask install CLI (mirroring the
docker login --password-stdin pattern). The installer now feeds the
password over a pipe:

    printf '%s\n' "$pw" | flask install --username admin --password-stdin

The password never appears in argv or environ. The Docker entrypoint
already used the safer shape (flask install --username admin generates
and prints) and is unchanged.
f-string substitution into postgresql:// and redis:// URLs broke
when passwords contained URL-special characters (/, @, #, +, =).
Surfaced during Docker bootstrap testing: a base64 password with
'/' caused 'Port could not be cast to integer' at app boot.

quote(safe='') around POSTGRES_USER/POSTGRES_PASSWORD/REDIS_PASSWORD
fixes Postgres URI plus all four Redis URLs (broker, result, session,
cache). Verified roundtrip via redis.from_url decode.
Wizard no longer creates the admin user; the installer (native and
Docker entrypoint) prints generated credentials and operators sign in
before the wizard runs. Update installation and docker guides to match,
and switch examples to docker compose v2 with --env-file .env.docker
so substitution of POSTGRES_USER/REDIS_PASSWORD does not silently fall
back to empty values.

@apodacaduron apodacaduron left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BAY-01-001:

  • If the user does not belong to the same access group as a restricted bulletin, the history endpoint correctly blocks the request: GET /admin/api/bulletinhistory/3 returns {"message": "Restricted Access"}
  • Direct access to the restricted bulletin also correctly blocks: GET /admin/api/bulletin/3 returns restricted access and triggers the admin notification ✅
  • Related follow-up: the relations endpoint still returns 200 for the same restricted parent bulletin. Example: GET /admin/api/bulletin/relations/3?class=actor returns masked entity stubs plus relationship metadata like related_as, probability, comment, and user_id. This leaks relationship existence/metadata for a restricted parent record and should probably return 403. ⚠️
  • Related follow-up: the graph endpoint also returns 200 for the same restricted bulletin. Both GET /admin/api/graph/json?id=3&type=bulletin&expanded=false and GET /admin/api/graph/json?id=3&type=bulletin&expanded=true return masked nodes, but still expose graph topology and metadata, including restricted entity IDs, relation existence, relation type (Perpetrator), connected accessible records, and connected locations (Mexico, USA). This should probably return 403 when the root entity is restricted from the current user. ⚠️
  • Related follow-up: the flowmap endpoint masks the restricted actor but still leaks event/location timeline data. Example: POST /admin/api/flowmap/actors-for-locations with {"location_ids":[1],"q":[{}]} returns actor 1 as restricted, but includes current_event and next_event details such as event type (Detained/Abducted, Death), locations (Mexico, USA), location IDs, and event dates. Restricted actors should probably be excluded from the flowmap result or returned without event details. ⚠️

@apodacaduron

Copy link
Copy Markdown
Contributor

BAY-01-036

  • Uploading a .xls file returns HTTP 500 Internal Server Error ❌
  • The fix pins engine="openpyxl" but openpyxl can't read .xls files, so the error still happens, just differently
  • .xls is still in the allowlist so users can still upload it

@apodacaduron

Copy link
Copy Markdown
Contributor

BAY-01-037

  • sheet=null and sheet=[0] on /import/api/xls/analyze correctly return 417 ✅
  • sheet=true on /import/api/xls/analyze returns HTTP 500 ❌
  • sheet=true on /import/api/process-sheet returns HTTP 500 ❌ — this endpoint has no type check

@apodacaduron

Copy link
Copy Markdown
Contributor

BAY-01-038:

  • XLSX import works correctly with text headers (a, b, c) mapped to actor name fields ✅
  • With numeric headers (1, 2, 3), the mapping screen accepts them, but the imported actor name fields are not populated and the actor is created as UNKNOWN UNKNOWN ⚠️

@apodacaduron

Copy link
Copy Markdown
Contributor

BAY-01-039:

  • Covered while testing BAY-01-008: the main sheet-import description and mismatch paths now sanitize the XSS payloads before they reach the actor description ✅
  • Related follow-up remains in the Missing Person import fields, which can still render raw imported HTML through MissingPersonField.js with CSP blocking execution ⚠️

@apodacaduron

Copy link
Copy Markdown
Contributor

BAY-01-040:

  • Tested malformed export request bodies like null, [], "hello", {}, {"config": null}, and {"config": "foo"}.
  • They no longer crash the server with HTTP 500; the API now returns a controlled {"message": "Error creating export request"} response ✅

@apodacaduron

Copy link
Copy Markdown
Contributor

BAY-01-042:

  • Tested the malformed import API bodies from the report. Those now return controlled responses instead of HTTP 500 ✅
  • Follow-up: /import/api/process-sheet still returns HTTP 500 for malformed files input, for example {}, {"files": null}, or {"files": "x"}. This looks like the same class of issue and should validate that files is a non-empty list before processing ⚠️

@apodacaduron

Copy link
Copy Markdown
Contributor

BAY-01-015 looks fixed.

I tested by changing the export ID in the URL/API to another user’s export. The server returned 403 Forbidden, so the export metadata is no longer exposed.

Minor note: 403 vs 404 can still hint that an export ID exists, but that is low impact and not the main issue.

Unrelated UX issues noticed while testing:

  • If an Admin changes their own role from Admin to Mod while on /admin/users/, they land on a raw Forbidden page with no way out. They have to manually change the URL to a valid path.
  • As a Mod, /admin/api/updates/available returns 403 and shows a “You do not have permission to view this resource” snackbar.

level09 added 2 commits June 19, 2026 13:01
Bulletin/Actor to_dict() embedded full media metadata (filename, etag,
title, OCR extraction text) in the medias array for any user who could
read the parent, even those blocked from the media endpoints by
_require_media_access. A non-Admin without can_access_media therefore
still received the data the direct endpoints deny them.

Gate the medias loop on a shared can_view_media() helper that mirrors
the _require_media_access policy (Admin or can_access_media) and trusts
CLI/Celery via has_request_context, matching check_history_access.
The single-OCR endpoint requires can_edit (assignment boundary) since
OCR writes an extraction, but /api/ocr/bulk scoped items only by
can_access (visibility), letting a non-Admin with can_access_media
OCR-mutate media they can read but are not assigned to edit.

Post-filter the resolved media ids by can_edit for non-Admins. The
query-level access filter can't express can_edit (needs the parent's
assignment + status), so it is done in Python; Admins skip the filter.
@apodacaduron

Copy link
Copy Markdown
Contributor

BAY-01-018 looks fixed for the main reported issue.

I tested a bulk Bulletin update as a Mod with:

  • Bulletin 12: accessible
  • Bulletin 11: restricted

Only Bulletin 12 was updated. Activity Monitor only showed Bulletin 12, and only Bulletin 12 got a new revision history entry.

Separate issue found while testing Incident bulk updates:
When using the assignRelated option, related Bulletins can get duplicate revision history entries.

Test:

  • Set chunk size to 1 for testing
  • Bulk updated 2 incidents with assignRelated=true
  • Incident 1 was related to Bulletin 12
  • Incident 2 was related to Bulletins 12, 11, and 10

Result:
Bulletin 12 received 3 new revision history entries from one bulk operation.

So the original restricted-ID logging issue seems fixed, but assignRelated can still create duplicate history entries for related Bulletins.

@apodacaduron

Copy link
Copy Markdown
Contributor

BAY-01-022 looks fixed for the normal Bulletin update API, but I found a remaining bulk-update edge case.

Test:

  • Bulletin 10 was assigned to a DA user.
  • Its status was set to Peer Review Assigned.
  • As the assigned DA user, a direct normal update returned 403: {"message": "Item is locked for peer review"}.
  • However, using a Mod account, the bulk update endpoint still allowed updating Bulletin 10 while it was in Peer Review Assigned.

So the normal edit path is protected, but bulk update can still mutate peer-review-locked Bulletins.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants