Logging per test case status#5847
Conversation
There was a problem hiding this comment.
🤖 Isaac Lab Review Bot
Thanks for adding per-test-case logging! This is a nice enhancement for test result visibility.
✅ Architecture & Design
- Appropriate location: Changes to
tools/conftest.pyare correct for test infrastructure modifications - Clean integration: The
caseslist is added to the existingtest_statusdictionary structure, maintaining backwards compatibility - Good import addition: Adding
FailureandSkippedfromjunitparseralongside existingErrorimport is correct
✅ Implementation Correctness
- Result status logic is sound: The priority order (FAILED > ERROR > skipped > passed) handles overlapping statuses correctly
- Empty cases list: All error paths (
STARTUP_HANG,TIMEOUT,CRASHED, report parse failure) now correctly include"cases": []- nice attention to detail - Safe defaults: Using
case.name or "",case.classname or "", andfloat(case.time or 0.0)preventsNonepropagation
🔍 Minor Observations
-
Line 518-520: The
list(case.result)conversion assumescase.resultis iterable. While this works with junitparser's API, a brief comment noting this behavior might help future maintainers. -
Output formatting: The
↳prefix with 4-space indent creates a nice visual hierarchy in the output table. The empty strings for Wall time and # Tests columns on case rows make sense since those are file-level metrics.
📋 Pre-commit Check
The pre-commit check is failing - please run ./isaaclab.sh --format to address any formatting issues.
✅ Test Coverage
This is test infrastructure code. The changes are self-testing in that any issues would surface immediately when running tests.
This is an automated review. Please address any concerns and reach out if you have questions.
Update (f6d72b4): Nice cleanup! The case status logic has been extracted into a _case_result_status() helper function. This improves code readability and maintainability. ✅
Update (5780763): This is a substantial and excellent refactor of run_individual_tests() 🎉
What changed:
- The original 250+ line function has been decomposed into ~15 focused helper functions
- Each helper has a clear single responsibility (e.g.,
_build_test_command,_run_test_with_retries,_handle_startup_hang,_parse_test_report) - The main
run_individual_tests()is now a clean 15-line loop
Quality highlights:
- ✅
_status_summary()builder ensures consistent status dict structure across all paths - ✅
_record_infrastructure_status()cleanly separates infrastructure failures from test failures - ✅
_run_test_with_retries()encapsulates the retry logic with proper state management - ✅
_print_retry_context()DRYs up the diagnostic output formatting - ✅ Per-case results (
caseslist) are now properly populated via_parse_test_report()
Minor note: The summary table output now includes per-case rows with the ↳ prefix - this is a nice visual improvement for identifying which specific test cases passed/failed.
This refactor significantly improves maintainability. Future changes (new failure modes, additional metrics, etc.) can be isolated to individual helpers. Well done! 👏
Update (bbd0270): The helper function refactoring from the previous commit has been reverted. The code is now back to a single run_individual_tests() function with inline logic (~270 lines).
What changed:
- Removed:
_build_test_command,_read_test_content,_print_retry_context,_run_test_with_retries,_status_summary,_write_synthetic_failure,_handle_startup_hang,_handle_timeout,_handle_missing_report,_parse_test_report,_record_infrastructure_status,_record_report_status,_test_timeout_config,_run_individual_test - The retry loops, error handling, and report parsing are now inline within the main function
Observation: While the inlined version is more self-contained, it reverts the maintainability gains from the previous refactor. If the team prefers the explicit inline approach over the decomposed helper style, that's a valid choice - but it may be worth discussing the tradeoffs with the maintainers.
Functionality preserved: ✅ All the per-case logging and status tracking features remain intact. The _case_result_status() helper is still present.
Update (d4d6010): The refactoring is back! 🎉 This commit re-introduces helper functions with some nice improvements:
New/restored helper functions:
_print_retry_diagnostics()- consolidated retry diagnostic output_run_test_with_retries()- encapsulates the retry loop logic_append_output_tail()- DRY helper for truncating output tails_write_error_result()- creates synthetic JUnit reports for process errors_process_startup_hang()/_process_timeout_without_report()/_process_missing_report()- handle specific failure modes_process_error_status()- builds consistent error status dicts_parse_test_report()- extracts report parsing with per-case collection_summary_from_report()- calculates final pass/fail from parsed report_test_file_has_cameras()- isolated cold-cache check_build_test_command()- constructs pytest command
Improvements over 5780763:
- Cleaner function names (e.g.,
_process_startup_hangvs_handle_startup_hang) _append_output_tail()eliminates duplicated output formatting- Explicit return values from
_run_test_with_retries()make data flow clearer _summary_from_report()cleanly separates report parsing from pass/fail determination
Result: The main loop in run_individual_tests() is now ~60 lines of clear orchestration logic. This is a solid balance between decomposition and readability. ✅
Update (6d1f645): Second revert - the helper function refactoring has been reverted again.
Current state:
- Back to the simpler structure: one
_case_result_status()helper + inline logic inrun_individual_tests() - All the per-case logging functionality is preserved
- The
caseslist correctly captures test case name, classname, time, and result - Error paths (
STARTUP_HANG,TIMEOUT,CRASHED, parse failure) all set"cases": []
Final assessment: The core feature (per-test-case status logging) is complete and working correctly. The team appears to be iterating on whether to extract helpers vs keep inline logic - both approaches are valid for this infrastructure code. The current implementation is straightforward and achieves the goal. 👍
Greptile SummaryThis PR enhances the test runner's summary table to show per-test-case status as indented sub-rows (prefixed with
Confidence Score: 4/5Safe to merge; the change is additive and only affects the summary table printed at the end of a test run. The case-collection logic is correct and all early-exit paths properly initialise No files require special attention beyond the two style nits in Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Parse JUnitXml report] --> B[Iterate suites]
B --> C[Iterate cases in suite]
C --> D{case.result empty?}
D -- Yes --> E[case_status = 'passed']
D -- No --> F{any Failure?}
F -- Yes --> G[case_status = 'FAILED']
F -- No --> H{any Error?}
H -- Yes --> I[case_status = 'ERROR']
H -- No --> J{any Skipped?}
J -- Yes --> K[case_status = 'skipped']
J -- No --> L[case_status = 'passed']
E & G & I & K & L --> M[Append to cases list]
M --> N[Store in test_status with cases key]
N --> O[Render sub-row in PrettyTable with ↳ prefix]
Reviews (1): Last reviewed commit: "Logging per test case status" | Re-trigger Greptile |
| if not case_results: | ||
| case_status = "passed" | ||
| elif any(isinstance(r, Failure) for r in case_results): | ||
| case_status = "FAILED" | ||
| elif any(isinstance(r, Error) for r in case_results): | ||
| case_status = "ERROR" | ||
| elif any(isinstance(r, Skipped) for r in case_results): | ||
| case_status = "skipped" | ||
| else: | ||
| case_status = "passed" |
There was a problem hiding this comment.
The result strings for case-level status mix lowercase and uppercase:
"passed" and "skipped" are lowercase while "FAILED" and "ERROR" are uppercase. This is inconsistent within the same data structure — if any downstream code ever does a case-sensitive comparison against these values (e.g. filtering, aggregating, or serialising), results with "passed" and "FAILED" would need to be treated differently. Using a uniform casing convention (all-uppercase or all-lowercase) avoids that ambiguity.
| if not case_results: | |
| case_status = "passed" | |
| elif any(isinstance(r, Failure) for r in case_results): | |
| case_status = "FAILED" | |
| elif any(isinstance(r, Error) for r in case_results): | |
| case_status = "ERROR" | |
| elif any(isinstance(r, Skipped) for r in case_results): | |
| case_status = "skipped" | |
| else: | |
| case_status = "passed" | |
| if not case_results: | |
| case_status = "passed" | |
| elif any(isinstance(r, Failure) for r in case_results): | |
| case_status = "failed" | |
| elif any(isinstance(r, Error) for r in case_results): | |
| case_status = "error" | |
| elif any(isinstance(r, Skipped) for r in case_results): | |
| case_status = "skipped" | |
| else: | |
| case_status = "passed" |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
There was a problem hiding this comment.
seems good consistent change
There was a problem hiding this comment.
Thanks for confirming! 👍
There was a problem hiding this comment.
I think Piotr wanted to make ERROR and FAILED stand out in uppercase, this was intentional
There was a problem hiding this comment.
That's a fair point — using uppercase for FAILED and ERROR as a visual signal for actionable/attention-requiring states, while keeping passed and skipped lowercase, is a reasonable intentional design choice. The visual distinction helps them stand out in the table output, which is exactly what you'd want when scanning test results.
I'll withdraw my concern — the mixed casing is purposeful here, not accidental.
Tip: You can customize Greptile's behavior for this repo with .greptile/rules.md and .greptile/config.json.
| if suite.name == "pytest": | ||
| suite.name = os.path.splitext(file_name)[0] | ||
| for case in suite: | ||
| case_results = list(case.result) if case.result else [] |
There was a problem hiding this comment.
In junitparser 2.x
case.result already returns a list; wrapping it in list(...) creates an unnecessary shallow copy. The idiom case.result or [] expresses the intent more directly — fall back to an empty list when case.result is falsy (either None or []).
| case_results = list(case.result) if case.result else [] | |
| case_results = case.result or [] |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
f6d72b4 to
23c079c
Compare
AntoineRichard
left a comment
There was a problem hiding this comment.
LGTM! Thanks, it will make figuring out errors faster.
Description
Logging status per case as follows:
Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context.
List any dependencies that are required for this change.
Fixes # (issue)
Type of change
Screenshots
Please attach before and after screenshots of the change if applicable.
Checklist
pre-commitchecks with./isaaclab.sh --formatconfig/extension.tomlfileCONTRIBUTORS.mdor my name already exists there