Skip to content

GH-3354: Override LocalInputFile.toString to return the path#3557

Open
1fanwang wants to merge 1 commit into
apache:masterfrom
1fanwang:fix/GH-3354-localinputfile-tostring
Open

GH-3354: Override LocalInputFile.toString to return the path#3557
1fanwang wants to merge 1 commit into
apache:masterfrom
1fanwang:fix/GH-3354-localinputfile-tostring

Conversation

@1fanwang
Copy link
Copy Markdown

Rationale for this change

When a non-Parquet file is opened through LocalInputFile, ParquetFileReader.readFooter builds the error message via file.toString(). LocalInputFile inherits the default Object.toString(), so the error reads:

org.apache.parquet.io.LocalInputFile@7852e922 is not a Parquet file (length is too low: 5)

HadoopInputFile already overrides toString() to return its path.toString(). This PR does the same for LocalInputFile. The reporter on #3354 saw this through Delta Kernel's InputFile impl, but parquet-java's own LocalInputFile has the identical shape and reproduces locally.

(The broader fix — adding default String getPath() to the InputFile interface so ParquetFileReader can call it with a sensible fallback for any third-party impl — is left for a follow-up if reviewers want it. This PR keeps scope to the built-in impl.)

What changes are included in this PR?

  • parquet-common/src/main/java/org/apache/parquet/io/LocalInputFile.java: override toString() to return path.toString().
  • parquet-common/src/test/java/org/apache/parquet/io/TestLocalInputOutput.java: new test inputFileToStringReturnsPath asserting LocalInputFile.toString() equals the underlying path string. Fails on master (expected:</var/.../...tmp> but was:<org.apache.parquet.io.LocalInputFile@61443d8f>), passes with the fix.

Are these changes tested?

Yes. New inputFileToStringReturnsPath test in TestLocalInputOutput. Failed before the fix, passes after; the 4 pre-existing tests in the file continue to pass.

Are there any user-facing changes?

The "is not a Parquet file" error now shows the path instead of the object hash. No API change.

Closes #3354

ParquetFileReader uses InputFile.toString() to build the "is not a Parquet
file" error message. HadoopInputFile overrides toString() to return its
Path, so the error message is actionable for Hadoop-backed reads. For
LocalInputFile and any third-party InputFile that does not override
toString(), the message degrades to "ClassName@hashcode is not a Parquet
file (length is too low: 10)", which gives no clue which file was bad.

Override toString() on LocalInputFile so the same code path produces a
path-bearing message for local reads, matching HadoopInputFile.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

The error message "is not a Parquet file" doesn't show the Parquet path anymore

1 participant