[e2e] Add nightly e2e test for submitting examples to flink standalone cluster#708
matrixsparse wants to merge 1 commit
Hi @wenjin272, this PR implements the CI pipeline for #642 as discussed. Could you PTAL when you have time?
8189bc8 to 704e45c
on:
  schedule:
    - cron: '0 0 * * *'
  workflow_dispatch:
Nightly + manual dispatch means a regression in examples/**, python/flink_agents/examples/**, or tools/install.sh can sit undetected for up to 24h. Would a path-filtered pull_request: trigger for those paths make sense here, with the cron staying as the safety net for transitive-dep changes? The Flink download + full build is non-trivial wall time per PR, so the nightly-only choice is defensible too — curious which trade-off you prefer.
Agreed. Added a path-filtered pull_request trigger for those paths. The cron stays as the safety net for transitive-dep changes. The path filter is narrow enough that most PRs won't trigger it, so the wall-time cost is acceptable.
        failed=$((failed + 1))
    fi
done
printf "\nTotal: %d Passed: %d Failed: %d\n" "$total" "$passed" "$failed"
If install_flink, build_project, stage_dist_jars, or start_cluster dies under set -e, no result is ever recorded, so print_summary walks an empty RESULT_NAMES and prints Total: 0 Passed: 0 Failed: 0 before cleanup propagates the original non-zero exit code. The CI job still fails on the exit code, but a person scanning the log sees a "zero failures" summary right before the red X, which is misleading when triaging a 45-minute nightly run.
One way it could read, if useful:
if (( total == 0 )); then
    log_error "Test setup failed before any example was submitted"
    return
fi
right above the existing if (( failed > 0 )) check.
Fixed exactly as you suggested.
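For readers following the thread, here is a minimal sketch of how the guarded summary might look in context. Only RESULT_NAMES, print_summary, log_error, and the printf line come from the discussion above; RESULT_STATUS, record_result, and the PASS/FAIL markers are hypothetical names introduced for illustration, and the real script may track results differently.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in; the real script defines its own logging helpers.
log_error() { printf 'ERROR: %s\n' "$*" >&2; }

RESULT_NAMES=()
RESULT_STATUS=()   # assumed parallel array holding PASS or FAIL per example

# Hypothetical helper that records one example's outcome.
record_result() { RESULT_NAMES+=("$1"); RESULT_STATUS+=("$2"); }

print_summary() {
    local total=${#RESULT_NAMES[@]} passed=0 failed=0 i
    # Guard: nothing was recorded, so setup died before any submission.
    if (( total == 0 )); then
        log_error "Test setup failed before any example was submitted"
        return
    fi
    for (( i = 0; i < total; i++ )); do
        if [[ ${RESULT_STATUS[$i]} == PASS ]]; then
            passed=$((passed + 1))
        else
            failed=$((failed + 1))
        fi
    done
    printf "\nTotal: %d Passed: %d Failed: %d\n" "$total" "$passed" "$failed"
}
```

With the guard in place, an early setup failure prints a single error line instead of a summary of zeros.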
xintongsong left a comment
Thanks for working on this, @matrixsparse. It's a good idea to test with the example jobs nightly.
I'm not sure about only validating that job submission succeeds. I think all example jobs can currently run with local LLMs in Ollama, so that shouldn't be an obstacle to verifying the full execution. Did I miss anything?
    log_ok "Staged: $(basename "$flink_jar")"
}

package_examples() {
I think build_project should have already built the examples. We should not need to re-build them.
Good catch. Removed the redundant package_examples().
You're right. I've updated the script to install Ollama, pull qwen3:8b, and wait for each job to reach FINISHED status instead of just verifying submission.
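A rough sketch of what waiting for FINISHED could look like, assuming the job ID is known and the cluster's REST endpoint is reachable. The names wait_for_job_finished, get_job_state, FLINK_REST_URL, and the default timeout and interval are all hypothetical and not taken from the PR; the state is read from Flink's REST endpoint GET /jobs/<jobid>, and the JSON parsing with grep/sed is a simplification that assumes a "state" field appears in the response.

```shell
#!/usr/bin/env bash
# Assumed default; adjust to wherever the standalone cluster's REST API listens.
FLINK_REST_URL="${FLINK_REST_URL:-http://localhost:8081}"

# Print the job's state (e.g. RUNNING, FINISHED) as reported by GET /jobs/<jobid>.
get_job_state() {
    curl -sf "$FLINK_REST_URL/jobs/$1" \
        | grep -o '"state"[[:space:]]*:[[:space:]]*"[A-Z_]*"' \
        | head -n 1 \
        | sed 's/.*"\([A-Z_]*\)"$/\1/'
}

# Usage: wait_for_job_finished <job_id> [timeout_seconds] [poll_interval_seconds]
wait_for_job_finished() {
    local job_id=$1 timeout=${2:-600} interval=${3:-5} elapsed=0 state
    while (( elapsed < timeout )); do
        state=$(get_job_state "$job_id") || state=""
        case "$state" in
            FINISHED)
                return 0 ;;
            FAILED|CANCELED)
                # Terminal failure states: stop polling right away.
                echo "Job $job_id ended in state $state" >&2
                return 1 ;;
        esac
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    echo "Timed out waiting for job $job_id to finish" >&2
    return 1
}
```

Failing fast on FAILED or CANCELED keeps a broken example from consuming the full timeout on every nightly run.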
In addition to verifying that the jobs reach the FINISHED status, I think we can also check the logs for errors to identify whether each job ran properly. Flink's e2e tests already do this, and we may copy or reuse that approach. See test-scripts/common.sh.
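As an illustration of the idea, here is a simplified log scan. This is not Flink's implementation from test-scripts/common.sh: scan_logs_for_errors is a hypothetical name, and the patterns and the optional allowlist argument are assumptions chosen to keep the sketch short.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not the function from Flink's test-scripts/common.sh.
# Usage: scan_logs_for_errors <log_dir> [allowlist_regex]
scan_logs_for_errors() {
    local log_dir=$1 allow=${2:-} hits
    # Collect every line mentioning an error or exception in any file under log_dir.
    hits=$(grep -rhE 'ERROR|Exception' "$log_dir" 2>/dev/null || true)
    # Drop lines matching the allowlist of known, harmless messages.
    if [[ -n $allow && -n $hits ]]; then
        hits=$(printf '%s\n' "$hits" | grep -vE "$allow" || true)
    fi
    if [[ -n $hits ]]; then
        printf 'Found suspicious log lines:\n%s\n' "$hits" >&2
        return 1
    fi
    return 0
}
```

It could be called once after all jobs finish, for example against the Flink installation's log directory, with the allowlist growing as benign messages are identified.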
log_section "Step 6: submit Java examples"
submit_java_example "org.apache.flink.agents.examples.ReActAgentExample"
submit_java_example "org.apache.flink.agents.examples.WorkflowSingleAgentExample"
submit_java_example "org.apache.flink.agents.examples.WorkflowMultipleAgentExample"

log_section "Step 7: submit Python examples"
submit_python_example "$ROOT_DIR/python/flink_agents/examples/quickstart/react_agent_example.py"
submit_python_example "$ROOT_DIR/python/flink_agents/examples/quickstart/workflow_single_agent_example.py"
submit_python_example "$ROOT_DIR/python/flink_agents/examples/quickstart/workflow_multiple_agent_example.py"
Is it intended to not cover all the example jobs?
Yes, intentional. All 6 quickstart examples (3 Java + 3 Python) are covered. The RAG examples (python/flink_agents/examples/rag/) are excluded because they require a vector store and an embedding model that aren't provisioned in this CI setup. Added a comment in the script explaining this. We can add RAG coverage in a follow-up once vector store infrastructure is available in CI.
I think there are 5 examples in Java and 6 examples in Python (5 in quickstart/ and 1 in rag/), and there could be more in the future. Is it possible to iterate over the example directory and submit everything it finds? (We might need to reorganize the example directory to follow a certain pattern.)
As for the rag example, it uses a local ollama embedding model and a local chroma vector store, so there should be no problem running it locally in CI.
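One possible shape for directory-driven discovery on the Python side, as a sketch only. It assumes every runnable example file ends in _example.py, which matches the quickstart files named in the diff but is otherwise an assumption; submit_all_python_examples is a hypothetical name, and submit_python_example is stubbed here in place of the script's real function.

```shell
#!/usr/bin/env bash
# Stub standing in for the script's real submit_python_example.
submit_python_example() { echo "submit: $1"; }

# Submit every file matching *_example.py under the given directory (assumed pattern).
submit_all_python_examples() {
    local examples_dir=$1 file count=0
    while IFS= read -r file; do
        submit_python_example "$file"
        count=$((count + 1))
    done < <(find "$examples_dir" -type f -name '*_example.py' | sort)
    # Treat an empty result as a failure so a renamed directory cannot pass silently.
    if (( count == 0 )); then
        echo "No examples found under $examples_dir" >&2
        return 1
    fi
}
```

Sorting the find output keeps the submission order stable between runs, which makes nightly logs easier to compare.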
704e45c to f3b3bfb
Purpose of change
Add automated e2e test for submitting Java/Python quickstart examples to a Flink standalone cluster, replacing the current manual verification process before each release.
Closes #642
Changes
e2e-test/test-scripts/test_submit_examples_to_flink.sh: Test script that installs Flink via install.sh, starts a standalone cluster, submits all 6 examples (3 Java + 3 Python), verifies submission success, and cleans up.
.github/workflows/nightly-e2e.yml: Nightly GitHub Actions workflow that runs the test daily at UTC 00:00, with manual trigger support.
Key design decisions
tools/install.sh --non-interactive (from [tools] Import Wizard for Installation Setup #599) for Flink installation