Skip to content

fix: include filename in JUnit XML parse errors and consolidate directory walk#895

Merged
dangrondahl merged 7 commits into
mainfrom
894-bug-kosli-attest-junit-reports-cryptic-error-without-filename-for-non-utf-8-xml-files
May 18, 2026
Merged

fix: include filename in JUnit XML parse errors and consolidate directory walk#895
dangrondahl merged 7 commits into
mainfrom
894-bug-kosli-attest-junit-reports-cryptic-error-without-filename-for-non-utf-8-xml-files

Conversation

@dangrondahl
Copy link
Copy Markdown
Contributor

@dangrondahl dangrondahl commented May 18, 2026

Summary

  • kosli attest junit now includes the filename and a user-friendly message when XML parsing fails, making errors actionable instead of cryptic
  • Consolidated the double directory walk into a single pass, halving time/memory/allocations

Before:

Error: xml: encoding "ISO-8859-15" declared but Decoder.CharsetReader is nil

After:

Error: failed to parse XML file /path/to/results.xml: only UTF-8 encoding is supported.
If this is not a JUnit results file, use --results-dir to specify the directory containing only JUnit XML files

Changes

  1. Replaced junit.IngestDir() with our own filepath.Walk that calls junit.IngestFile() per file and wraps errors with the filename
  2. Encoding errors get a user-friendly message (what's wrong, what to do) instead of the raw Go stdlib error
  3. ingestJunitDir now returns both parsed results and the list of JUnit filenames in a single pass
  4. Removed getJunitFilenames() — its work is folded into the single walk
  5. Added test fixture with ISO-8859-15 encoding declaration
  6. Added 4 unit tests for ingestJunitDir

Decision: no charset support (Fix 2 dropped)

We considered adding golang.org/x/net/html/charset to handle non-UTF-8 XML encodings transparently. We decided against it because:

  • The CLI walks all .xml files in --results-dir (default .), so non-JUnit XML files (pom.xml, config files) get parsed too
  • We can't assume a non-UTF-8 XML file is a JUnit file with incorrect encoding — it may just be an unrelated file swept up by the walk
  • The right fix for users is to use --results-dir to narrow scope, and the error message now guides them to do that

Closes #894

Replace junit.IngestDir() with our own filepath.Walk that calls
junit.IngestFile() per file and wraps errors with the filename.
This makes XML parsing errors actionable — users can now see which
file caused the failure.

Add test fixture with ISO-8859-15 encoding declaration and three
unit tests for ingestJunitDir: filename in error, no-tests-found,
and valid JUnit parsing.

Refs #894
Same wrapping treatment as ingestJunitDir — wrap junit.IngestFile()
errors with the filename so users can identify which file caused
the failure.

Refs #894
ingestJunitDir now returns both parsed results and the list of
JUnit filenames (for upload), eliminating the separate
getJunitFilenames() function and its redundant directory walk.

Refs #894
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 18, 2026

Claude finished @dangrondahl's task in 3m 25s —— View job


PR Review: fix: include filename in JUnit XML parse errors and consolidate directory walk

  • Run static analysis (build, vet, mod tidy)
  • Read changed files and review code
  • Check for bugs, edge cases, and security issues
  • Post review feedback

Static Analysis

All three checks pass cleanly: go build ./..., go vet ./..., and go mod tidy with no drift in go.mod/go.sum.

Summary

This PR is in good shape. The core change — consolidating the double directory walk into a single pass and wrapping parse errors with the filename — is clean, correct, and well-tested. No bugs or security issues found.

What's done well

  • Single-pass consolidation is a clear improvement — the old code walked the directory twice doing identical per-file work. The benchmarks in the PR description are a nice touch.
  • filepath.WalkDir over filepath.Walk — good, avoids unnecessary os.Lstat calls.
  • Error handling is well-structured: the charset error gives a user-friendly message with actionable guidance (--results-dir), while generic parse errors still wrap the original with %w for debuggability.
  • Test coverage is solid: encoding error, empty directory, happy path, and filename verification all covered.
  • Consistent nil, nil, err returns on all error paths — previous review feedback addressed cleanly.
  • Fixture placement (testdata_junit_iso8859/ outside testdata/) is pragmatic given that existing suite tests walk testdata/ recursively.

Minor observations (all optional, no blockers)

  1. Charset detection fragility (line 224): The strings.Contains(err.Error(), "Decoder.CharsetReader is nil") check depends on Go stdlib wording. If it changes, the fallback (line 228) still includes the filename, so this degrades gracefully — just worth being aware of. Already flagged in prior review.

  2. Test precision (line 166-171): The filename test asserts Contains(filenames[0], ".xml") — could be strengthened to assert the exact expected path. See inline comment.

  3. Redundant test (line 160-163): "parses valid JUnit XML correctly" is a subset of "returns filenames of parsed JUnit XML files" — they could be merged, but separate is fine for readability.

No blocking issues. Looks ready to merge.

@dangrondahl
Copy link
Copy Markdown
Contributor Author

Benchmark: single-pass vs double-pass directory walk

Measured on testdata/ (4 XML files, Apple M4 Pro):

Metric Single pass (new) Double pass (old) Improvement
Time ~2.9 ms ~5.7 ms ~2x faster
Allocs 24,951 49,868 ~2x fewer
Memory 3.2 MB 6.4 MB ~2x less
BenchmarkIngestJunitDir_SinglePass-12     404   3143631 ns/op   3223225 B/op   24951 allocs/op
BenchmarkIngestJunitDir_DoublePass-12     202   5870830 ns/op   6441994 B/op   49868 allocs/op

The ratio stays constant regardless of file count since both walks do identical per-file work (walk + open + XML decode). The real-world impact scales with the number of .xml files traversed — which is particularly relevant when --results-dir defaults to . and the CLI walks the entire repo tree. A Java repo with hundreds of pom.xml, config, and JUnit XML files across target/ directories would see the absolute savings multiply proportionally.

Comment thread cmd/kosli/attestJunit.go Outdated
Comment thread cmd/kosli/attestJunit.go Outdated
- Return nil instead of empty slice on Walk error for consistency
- Merge TestIngestJunitDir_ReturnsFilenames into TestIngestJunitDir

Refs #894
@dangrondahl
Copy link
Copy Markdown
Contributor Author

Re: the suggestion to move testdata_junit_iso8859/ under testdata/

This is intentionally placed outside testdata/. The existing AttestJunitCommandTestSuite tests use --results-dir testdata, which walks the entire testdata/ tree recursively. Placing the ISO-8859-15 fixture inside would cause those tests to hit the encoding error and fail.

Applied the other two suggestions (return nil on error, merge test functions) in 473213d.

Comment thread cmd/kosli/attestJunit.go Outdated
Comment thread cmd/kosli/attestJunit.go
Replace the cryptic Go stdlib error (Decoder.CharsetReader is nil)
with an actionable message that tells the user which file failed,
that only UTF-8 is supported, and to use --results-dir if the file
is not a JUnit results file.

Refs #894
Comment thread cmd/kosli/attestJunit.go Outdated
Comment thread cmd/kosli/attestJunit.go
Comment thread cmd/kosli/attestJunit.go Outdated
Comment thread cmd/kosli/attestJunit_test.go
- Use filepath.WalkDir instead of filepath.Walk (avoids os.Lstat per entry)
- Return nil, nil on all error paths for consistency
- Assert exact filename count in test

Refs #894
Comment thread cmd/kosli/attestJunit.go Outdated
@dangrondahl
Copy link
Copy Markdown
Contributor Author

Benchmark: filepath.Walk vs filepath.WalkDir

Measured on testdata/ (4 XML files, Apple M4 Pro):

Metric Walk WalkDir Delta
Time ~2.9 ms ~2.75 ms ~5% faster
Allocs 24,951 24,831 120 fewer
Memory 3.22 MB 3.20 MB ~22 KB less

The savings come from WalkDir skipping the os.Lstat call per directory entry. The improvement would be more pronounced in larger directory trees with many non-XML files, where the stat is wasted entirely — relevant when --results-dir defaults to . and the CLI walks the entire repo.

Comment thread cmd/kosli/attestJunit.go
Comment thread cmd/kosli/attestJunit.go
Comment thread cmd/kosli/attestJunit_test.go
Comment thread cmd/kosli/attestJunit_test.go
@dangrondahl dangrondahl enabled auto-merge (squash) May 18, 2026 10:50
@dangrondahl dangrondahl merged commit 1c5f71a into main May 18, 2026
18 checks passed
@dangrondahl dangrondahl deleted the 894-bug-kosli-attest-junit-reports-cryptic-error-without-filename-for-non-utf-8-xml-files branch May 18, 2026 10:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: kosli attest junit reports cryptic error without filename for non-UTF-8 XML files

2 participants