LCORE-2499: prevent HF from downloading #1854
Walkthrough
This PR adds HuggingFace model caching to CI and Docker services to mitigate rate-limiting. The workflow pre-downloads a model to
Changes
HuggingFace Model Caching
🎯 2 (Simple) | ⏱️ ~8 minutes
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/e2e_tests.yaml:
- Around line 117-121: Update the static cache key used with actions/cache@v4
(the key value "hf-sentence-transformers-all-mpnet-base-v2") to include a
versioning component so caches can be invalidated automatically; modify the
`key` to append a dynamic value such as `${{ github.run_number }}`, `${{
github.sha }}`, or a hash of lockfiles, and add a `restore-keys` fallback
pattern to speed up restores on partial hits (keep the `path` and action
unchanged, only adjust the `key` and add `restore-keys`).
- Line 118: Replace the floating action reference "uses: actions/cache@v4" with
a pinned commit SHA (e.g., "uses:
actions/cache@3624ceb22c1c5a301c8db4169662070a689d9ea8") to hard-pin the action
for supply-chain security and add a trailing comment noting the semantic version
(v4.1.1) you pinned from for future maintainability; update any tests or docs
that expect the unpinned form if necessary.
- Line 127: Replace the unpinned pip install command "pip install -q
sentence-transformers" with a pinned version to ensure CI reproducibility; pick
the exact version used by the project (from pyproject.toml or requirements.txt)
or a known-compatible release and update the workflow step to install that
specific version instead of the floating package.
In `@docker-compose.yaml`:
- Around line 61-62: Remove the hardcoded HF_HUB_OFFLINE=1 entry that forces
offline mode and instead make offline mode conditional: stop setting
HF_HUB_OFFLINE in the compose env block and document/expect the CI pipeline to
export HF_HUB_OFFLINE=1 (or set it in the CI job environment) when HF_CACHE_PATH
is pre-populated; alternatively add a local setup script to pre-download the
embedding into the mounted HF_CACHE_PATH (or ./tmp/.hf-cache) before running
docker compose up so the container can start without offline mode; apply the
same change to the docker-compose-library.yaml and ensure references to
HF_CACHE_PATH remain unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: d23081cc-fdbd-4c27-b2a0-8197a258dbe2
📒 Files selected for processing (3)
- .github/workflows/e2e_tests.yaml
- docker-compose-library.yaml
- docker-compose.yaml
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: check_dependencies
- GitHub Check: Pylinter
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 2
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:50.251Z
Learning: In the lightspeed-stack repository, when a user claims a fix is done but the code still shows the original issue, verify the current state of the code before accepting the fix.
📚 Learning: 2026-05-20T08:09:30.641Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:30.641Z
Learning: In Llama-stack config YAMLs, when defining a Llama Guard safety shield entry, set `provider_shield_id` to the *guard model identifier* (e.g., `meta-llama/Llama-Guard-3-8B`). Do not use a chat/generative model id (e.g., `openai/gpt-4o-mini`): a chat-model id (or `native_override`) indicates only an override landed and does **not** mean the safety shield is actually gating queries. Ensure any E2E coverage for the related implementation (JIRA/E2E tests) exercises a real Llama Guard model to verify that the shield is effective.
Applied to files:
- docker-compose.yaml
- docker-compose-library.yaml
📚 Learning: 2026-05-20T08:09:36.724Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: src/client.py:104-108
Timestamp: 2026-05-20T08:09:36.724Z
Learning: In the lightspeed-stack repo, the synthesized `run.yaml` file handling in `src/client.py` (`_synthesize_library_config`) uses a fixed `/tmp` path intentionally in the PoC (PR `#1580`). The durable production requirements are tracked in spec doc R10 (docs/design/llama-stack-config-merge/llama-stack-config-merge.md): persistent known path overwritten each boot, file mode 0600 set via explicit create flag (not umask), and a `--synthesized-config-output` CLI flag for debugging. The PoC code is scheduled for removal pre-merge; the implementation JIRA "Unified llama_stack.config schema + synthesizer" inherits R10's requirements.
Applied to files:
docker-compose-library.yaml
📚 Learning: 2026-05-06T08:35:54.687Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1690
File: .github/workflows/e2e_tests_providers.yaml:279-285
Timestamp: 2026-05-06T08:35:54.687Z
Learning: In .github/workflows/e2e_tests_providers.yaml and related e2e workflow files, the show_logs step should not use docker compose logs with --tail or --since (i.e., keep logs unbounded). The quick connectivity test runs once immediately after container startup, so the log output is small and a log tail limit is unnecessary. If you adjust this, add a rationale comment in the workflow explaining why unbounded logs are acceptable and ensure CI behavior remains deterministic.
Applied to files:
.github/workflows/e2e_tests.yaml
🪛 zizmor (1.25.2)
.github/workflows/e2e_tests.yaml
[error] 118-118: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)
(unpinned-uses)
🔇 Additional comments (3)
docker-compose.yaml (2)
82-116: LGTM!
The `lightspeed-stack` service in server mode does not include HuggingFace cache mounts or offline mode settings, which is architecturally correct: the `llama-stack` service handles embedding model loading, and `lightspeed-stack` orchestrates requests without a direct HuggingFace dependency.
24-24: HuggingFace cache mount target matches the llama-stack container default
- `docker-compose.yaml` mounts `${HF_CACHE_PATH:-./tmp/.hf-cache}` to `/opt/app-root/src/.cache/huggingface` for `llama-stack` (image built from `deploy/llama-stack/test.containerfile`).
- That container sets `ENV HOME="/opt/app-root/src"` and there are no `HF_HOME`/`HF_CACHE`/`TRANSFORMERS_CACHE` overrides in the llama-stack entrypoint/config, so HuggingFace’s default `$HOME/.cache/huggingface` matches the mount target.
docker-compose-library.yaml (1)
61-62: ⚡ Quick win
Make `HF_HUB_OFFLINE=1` conditional to avoid breaking local runs
`docker-compose-library.yaml` hardcodes `HF_HUB_OFFLINE=1`, which applies to all environments and forces HuggingFace into offline mode, so startup will require pre-populated HF cache content (e.g., via a CI pre-download step or manual cache population). Repo search finds no docs/scripts explaining local HF cache/offline setup, so local development can easily fail. Default offline to off locally (e.g., `HF_HUB_OFFLINE=${HF_HUB_OFFLINE:-0}`) and/or document the required local pre-download/cache-mount steps.
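To make the effect of the suggested `${HF_HUB_OFFLINE:-0}` default concrete, here is a minimal Python sketch of the two default forms that Compose interpolation shares with POSIX shells. It is an illustration only, not Compose's implementation; the function name `interpolate` is hypothetical, and the other interpolation forms (`${VAR:?err}`, `${VAR:+alt}`, nesting) are deliberately left out.

```python
import re

# Matches ${NAME:-default} and ${NAME-default} only (simplified on purpose).
_PATTERN = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?P<op>:?-)(?P<default>[^}]*)\}"
)


def interpolate(value: str, env: dict[str, str]) -> str:
    """Expand ${NAME:-default} / ${NAME-default} in `value` using `env`."""

    def _sub(match: re.Match) -> str:
        name, op, default = match["name"], match["op"], match["default"]
        current = env.get(name)
        if op == ":-":
            # ":-" falls back when the variable is unset OR empty.
            return current if current else default
        # "-" falls back only when the variable is unset.
        return default if current is None else current

    return _PATTERN.sub(_sub, value)


if __name__ == "__main__":
    entry = "HF_HUB_OFFLINE=${HF_HUB_OFFLINE:-0}"
    print(interpolate(entry, {}))                       # local run, nothing exported
    print(interpolate(entry, {"HF_HUB_OFFLINE": "1"}))  # CI exports the variable
```

With this default, a local `docker compose up` stays online unless the variable is exported, while a CI job that has already filled the cache can opt in to offline mode.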
      - name: Cache HuggingFace embedding model
        uses: actions/cache@v4
        with:
          path: /tmp/hf-cache
          key: hf-sentence-transformers-all-mpnet-base-v2
Add versioning to cache key to enable invalidation.
The cache key hf-sentence-transformers-all-mpnet-base-v2 is static and doesn't include version information. If the model is updated upstream or the sentence-transformers library changes model handling, stale cached artifacts will persist until manually invalidated.
📦 Suggested improvement
  - name: Cache HuggingFace embedding model
    uses: actions/cache@v4
    with:
      path: /tmp/hf-cache
-     key: hf-sentence-transformers-all-mpnet-base-v2
+     key: hf-sentence-transformers-all-mpnet-base-v2-${{ hashFiles('**/pyproject.toml') }}
+     restore-keys: |
+       hf-sentence-transformers-all-mpnet-base-v2-
Include a hash of dependency lockfiles or another changing component (e.g., ${{ github.run_number }} or ${{ github.sha }}) in the key. Use restore-keys as a fallback so that a key miss can still restore the most recent matching cache.
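The lookup order behind `key` and `restore-keys` can be sketched as follows. This is a simplified model of the documented actions/cache behavior (exact key first, then each restore key as a prefix, newest match wins); it ignores branch scoping and eviction, and the name `resolve_cache` and the sample keys are hypothetical.

```python
from typing import Optional


def resolve_cache(key: str, restore_keys: list[str], saved: list[str]) -> Optional[str]:
    """Return the cache key that would be restored, or None on a full miss.

    `saved` lists the existing cache keys ordered oldest to newest.
    """
    # 1. An exact match on the primary key always wins.
    if key in saved:
        return key
    # 2. Otherwise each restore key is tried in order as a prefix,
    #    and the most recently created matching cache is used.
    for prefix in restore_keys:
        matches = [k for k in saved if k.startswith(prefix)]
        if matches:
            return matches[-1]
    # 3. Nothing matched: the step starts with an empty cache path.
    return None


if __name__ == "__main__":
    prefix = "hf-sentence-transformers-all-mpnet-base-v2-"
    saved = [prefix + "aaa", prefix + "bbb"]
    print(resolve_cache(prefix + "bbb", [prefix], saved))  # exact hit
    print(resolve_cache(prefix + "ccc", [prefix], saved))  # prefix fallback
    print(resolve_cache(prefix + "ccc", [], saved))        # miss without restore-keys
```

One consequence worth weighing: a key component that changes on every run (such as the run number or commit SHA) never produces an exact hit, so every run restores through the prefix fallback and then saves a fresh cache entry.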
          grep -A 3 "llama_stack:" lightspeed-stack.yaml

      - name: Cache HuggingFace embedding model
        uses: actions/cache@v4
Pin action to commit SHA for supply-chain security.
The actions/cache@v4 reference is not pinned to a specific commit SHA. GitHub Actions security best practices require pinning to immutable commit hashes to prevent supply-chain attacks.
🔒 Recommended fix
- - name: Cache HuggingFace embedding model
-   uses: actions/cache@v4
+ - name: Cache HuggingFace embedding model
+   uses: actions/cache@3624ceb22c1c5a301c8db4169662070a689d9ea8 # v4.1.1
Use actions/cache@3624ceb22c1c5a301c8db4169662070a689d9ea8 (current v4.1.1) or the latest commit from the v4 branch. Add a comment with the version for maintainability.
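A rough Python sketch of what an "unpinned-uses" style check looks for: any `uses:` reference whose ref is not a full 40-character commit SHA. This is not how zizmor is implemented; the function name `unpinned_uses` is hypothetical, the scan is line-based rather than YAML-aware, and `docker://` references are not handled.

```python
import re

# "uses: owner/repo@ref", optionally preceded by a list dash.
_USES = re.compile(r"^\s*(?:-\s*)?uses:\s*(?P<action>[^@\s#]+)@(?P<ref>[^\s#]+)")
# A full-length commit SHA: exactly 40 lowercase hex characters.
_FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def unpinned_uses(workflow_text: str) -> list[tuple[int, str]]:
    """Return (line number, reference) for each `uses:` not pinned to a SHA."""
    findings = []
    for lineno, line in enumerate(workflow_text.splitlines(), start=1):
        match = _USES.match(line)
        if match and not _FULL_SHA.match(match["ref"]):
            findings.append((lineno, f'{match["action"]}@{match["ref"]}'))
    return findings


if __name__ == "__main__":
    sample = (
        "steps:\n"
        "  - name: Cache HuggingFace embedding model\n"
        "    uses: actions/cache@v4\n"
        "  - uses: actions/cache@3624ceb22c1c5a301c8db4169662070a689d9ea8 # v4.1.1\n"
    )
    for lineno, ref in unpinned_uses(sample):
        print(f"line {lineno}: {ref} is not pinned to a commit SHA")
```

Tags such as `v4` can be moved to a different commit after the fact, which is why only the full SHA form passes the check.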
        env:
          HF_HOME: /tmp/hf-cache
        run: |
          pip install -q sentence-transformers
Pin sentence-transformers version for reproducibility.
Installing sentence-transformers without a version constraint can lead to non-deterministic CI behavior if the library releases breaking changes or updates model-loading logic.
📌 Suggested fix
- pip install -q sentence-transformers
+ pip install -q sentence-transformers==3.3.1
Pin to the version currently in use (check pyproject.toml or requirements.txt if available). If the project doesn't use sentence-transformers directly, pin to a known-compatible version.
      # Prevent HuggingFace Hub update checks (HTTP 429 rate-limiting in CI from parallel jobs).
      - HF_HUB_OFFLINE=1
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check for documentation on local vs CI compose usage
rg -i -C3 'docker.compose.*local|local.*docker|development.*setup' --type=md README.md CONTRIBUTING.md docs/
# Check if there's a separate compose override file for local development
fd 'docker-compose.*override|docker-compose.*local' --type=f
Repository: lightspeed-core/lightspeed-stack
Length of output: 5038
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate docker-compose.yaml
echo "== docker-compose.yaml candidates =="
fd -i 'docker-compose\.ya?ml' . || true
# Show the referenced lines (if file at repo root; otherwise find the closest match and print the requested vicinity)
echo
echo "== docker-compose.yaml excerpt around lines 55-75 =="
for f in $(fd -i 'docker-compose\.ya?ml' . -x echo 2>/dev/null | tr '\n' ' '); do
echo "--- $f ---"
nl -ba "$f" | sed -n '55,80p' || true
done
echo
echo "== occurrences of HF_HUB_OFFLINE / HF_CACHE_PATH =="
rg -n --hidden --no-ignore -S 'HF_HUB_OFFLINE|HF_CACHE_PATH' . || true
echo
echo "== occurrences of huggingface offline/cache/pre-download =="
rg -n --hidden --no-ignore -S 'hugging.?face|hf-cache|hf cache|pre-?download' . || true
Repository: lightspeed-core/lightspeed-stack
Length of output: 11390
🏁 Script executed:
#!/bin/bash
set -euo pipefail
mapfile -t files < <(fd -i 'docker-compose\.ya?ml' .)
echo "== docker-compose.yaml candidates =="
printf '%s\n' "${files[@]:-<none>}"
echo
echo "== docker-compose.yaml excerpt around lines 55-80 (each match) =="
if [ "${#files[@]}" -eq 0 ]; then
exit 0
fi
for f in "${files[@]}"; do
echo "--- $f ---"
nl -ba "$f" | sed -n '55,80p'
echo
done
echo "== occurrences of HF_HUB_OFFLINE / HF_CACHE_PATH =="
rg -n --hidden --no-ignore -S 'HF_HUB_OFFLINE|HF_CACHE_PATH' . || true
echo
echo "== occurrences of huggingface offline/cache/pre-download =="
rg -n --hidden --no-ignore -S 'hugging.?face|hf-cache|HF_HOME|HF_DATASETS_CACHE|pre-?download' . || true
Repository: lightspeed-core/lightspeed-stack
Length of output: 133
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== docker-compose.yaml lines 50-80 =="
sed -n '50,80p' docker-compose.yaml
echo
echo "== docker-compose-library.yaml lines 50-80 =="
sed -n '50,80p' docker-compose-library.yaml
echo
echo "== Search: HF_HUB_OFFLINE / HF_CACHE_PATH guidance in docs README workflows =="
rg -n --hidden --no-ignore -S 'HF_HUB_OFFLINE|HF_CACHE_PATH|hf-cache|HF_HOME' README.md CONTRIBUTING.md docs .github/workflows || true
echo
echo "== Show e2e_tests workflow around HF cache pre-download lines =="
# Tight context around the lines indicated earlier
rg -n 'Cache HuggingFace embedding model|Pre-download HuggingFace embedding model|HF_CACHE_PATH=/tmp/hf-cache' .github/workflows/e2e_tests.yaml -n
awk 'NR>=90 && NR<=160 {print NR":"$0}' .github/workflows/e2e_tests.yaml
Repository: lightspeed-core/lightspeed-stack
Length of output: 6528
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== docker-compose override/local candidates =="
fd -i "docker-compose.*override.*\.ya?ml|docker-compose.*local.*\.ya?ml|docker-compose\.override\.ya?ml" .
echo
echo "== docker-compose_library.yaml override/local candidates =="
fd -i "docker-compose-library.*override.*\.ya?ml|docker-compose-library.*local.*\.ya?ml" .
Repository: lightspeed-core/lightspeed-stack
Length of output: 184
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Find Containerfile/Dockerfile references to .cache/huggingface =="
rg -n --hidden --no-ignore -S '\.cache/huggingface|HF_HOME|HF_HUB_OFFLINE|sentence-transformers' deploy .github/workflows || true
echo
echo "== Show relevant parts of deploy/lightspeed-stack/Containerfile =="
if [ -f deploy/lightspeed-stack/Containerfile ]; then
nl -ba deploy/lightspeed-stack/Containerfile | sed -n '90,170p'
fi
echo
echo "== Show relevant parts of deploy/llama-stack/test.containerfile (if present) =="
if [ -f deploy/llama-stack/test.containerfile ]; then
nl -ba deploy/llama-stack/test.containerfile | sed -n '1,120p'
fi
Repository: lightspeed-core/lightspeed-stack
Length of output: 894
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== deploy/lightspeed-stack/Containerfile (around .cache/huggingface) =="
sed -n '105,150p' deploy/lightspeed-stack/Containerfile
echo
echo "== deploy/llama-stack/test.containerfile (around line ~36) =="
sed -n '1,80p' deploy/llama-stack/test.containerfile
Repository: lightspeed-core/lightspeed-stack
Length of output: 3857
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n --hidden --no-ignore -S 'HF_HUB_OFFLINE' README.md CONTRIBUTING.md docs .github/workflows deploy || true
echo
rg -n --hidden --no-ignore -S 'hf-cache|HF_CACHE_PATH|pre-download HuggingFace|sentence-transformers/all-mpnet-base-v2' README.md docs .github/workflows || true
Repository: lightspeed-core/lightspeed-stack
Length of output: 2310
Hardcoded HF_HUB_OFFLINE=1 will block local embedding downloads unless the HF cache is pre-populated.
docker-compose.yaml (and docker-compose-library.yaml) unconditionally sets HF_HUB_OFFLINE=1 while mounting ${HF_CACHE_PATH:-./tmp/.hf-cache} into the container’s HuggingFace cache. CI avoids failures by pre-downloading the embedding model into /tmp/hf-cache and exporting HF_CACHE_PATH=/tmp/hf-cache before docker compose up.
Local docker compose up has no cache pre-populated (Containerfile only creates the cache directory), and the docs describe that the embedding model can be downloaded automatically on first start-up—offline mode contradicts this.
# Prevent HuggingFace Hub update checks (HTTP 429 rate-limiting in CI from parallel jobs).
- HF_HUB_OFFLINE=1
💡 Recommended fixes
Option 1: Conditional offline mode via environment variable (preferred)
# Prevent HuggingFace Hub update checks (HTTP 429 rate-limiting in CI from parallel jobs).
- - HF_HUB_OFFLINE=1
+ - HF_HUB_OFFLINE=${HF_HUB_OFFLINE:-0}
Set HF_HUB_OFFLINE=1 explicitly in the CI workflow environment (where the cache is already pre-filled). Apply the same change to docker-compose-library.yaml.
Option 2: Local pre-download step + docs
Add a local setup script to pre-download the embedding model into ./tmp/.hf-cache and document running it before docker compose up.
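Either option depends on knowing whether the mounted cache already holds the model. The sketch below shows one way a setup script or CI step could check that before enabling offline mode. It assumes the standard Hugging Face hub cache layout (`<HF_HOME>/hub/models--<org>--<name>/snapshots/<revision>/`) and that the sentence-transformers version in use stores models there; `hf_home` and `model_is_cached` are hypothetical helpers, and overrides such as `HF_HUB_CACHE` are ignored for brevity.

```python
import os
from pathlib import Path


def hf_home(env: dict[str, str]) -> Path:
    """Resolve the Hugging Face home directory (simplified).

    Order: HF_HOME, then $XDG_CACHE_HOME/huggingface, then $HOME/.cache/huggingface.
    """
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"])
    cache_root = env.get("XDG_CACHE_HOME") or os.path.join(env.get("HOME", "~"), ".cache")
    return Path(cache_root) / "huggingface"


def model_is_cached(home: Path, repo_id: str) -> bool:
    """Return True if at least one non-empty snapshot of `repo_id` exists."""
    repo_dir = "models--" + repo_id.replace("/", "--")
    snapshots = home / "hub" / repo_dir / "snapshots"
    if not snapshots.is_dir():
        return False
    return any(rev.is_dir() and any(rev.iterdir()) for rev in snapshots.iterdir())


if __name__ == "__main__":
    home = hf_home(dict(os.environ))
    cached = model_is_cached(home, "sentence-transformers/all-mpnet-base-v2")
    # Only suggest offline mode when the model is really present.
    print(f"HF_HUB_OFFLINE={'1' if cached else '0'}")
```

With the container's `HOME=/opt/app-root/src` and no `HF_HOME` set, `hf_home` resolves to `/opt/app-root/src/.cache/huggingface`, the mount target described earlier in this review.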