GH-3354: Override LocalInputFile.toString to return the path#3557
Open
1fanwang wants to merge 1 commit into
Open
GH-3354: Override LocalInputFile.toString to return the path#35571fanwang wants to merge 1 commit into
1fanwang wants to merge 1 commit into
Conversation
ParquetFileReader uses InputFile.toString() to build the "is not a Parquet file" error message. HadoopInputFile overrides toString() to return its Path, so the error message is actionable for Hadoop-backed reads. For LocalInputFile and any third-party InputFile that does not override toString(), the message degrades to "ClassName@hashcode is not a Parquet file (length is too low: 10)", which gives no clue which file was bad. Override toString() on LocalInputFile so the same code path produces a path-bearing message for local reads, matching HadoopInputFile.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Rationale for this change
When a non-Parquet file is opened through
LocalInputFile,ParquetFileReader.readFooterbuilds the error message viafile.toString().LocalInputFileinherits the defaultObject.toString(), so the error reads:HadoopInputFilealready overridestoString()to return itspath.toString(). This PR does the same forLocalInputFile. The reporter on #3354 saw this through Delta Kernel'sInputFileimpl, but parquet-java's ownLocalInputFilehas the identical shape and reproduces locally.(The broader fix — adding
default String getPath()to theInputFileinterface soParquetFileReadercan call it with a sensible fallback for any third-party impl — is left for a follow-up if reviewers want it. This PR keeps scope to the built-in impl.)What changes are included in this PR?
parquet-common/src/main/java/org/apache/parquet/io/LocalInputFile.java: overridetoString()to returnpath.toString().parquet-common/src/test/java/org/apache/parquet/io/TestLocalInputOutput.java: new testinputFileToStringReturnsPathassertingLocalInputFile.toString()equals the underlying path string. Fails on master (expected:</var/.../...tmp> but was:<org.apache.parquet.io.LocalInputFile@61443d8f>), passes with the fix.Are these changes tested?
Yes. New
inputFileToStringReturnsPathtest inTestLocalInputOutput. Failed before the fix, passes after; the 4 pre-existing tests in the file continue to pass.Are there any user-facing changes?
The "is not a Parquet file" error now shows the path instead of the object hash. No API change.
Closes #3354