{{ message }}
fix(fsspec): handle zero-byte files in __len__#3353
Merged
Conversation
rambleraptor
approved these changes
May 12, 2026
rambleraptor
left a comment
Collaborator
There was a problem hiding this comment.
This looks fine to me code-wise. Love the test.
Personal curiosity, but any reason why this would be useful? Zero-byte files are an interesting edge case
Contributor
Author
Fokko
approved these changes
May 15, 2026
Fokko
left a comment
Contributor
There was a problem hiding this comment.
Oof, great catch @fallintoplace Thanks for fixing this. @rambleraptor thanks for the review 👍
This was referenced May 24, 2026
Fokko
pushed a commit
that referenced
this pull request
May 25, 2026
…tion (#3412) Inspired by the walrus issue in #3353 ## Summary Several `if x := dict.get(k):` checks treated legitimate falsy values as missing: - `lower_bounds.get(field_id)` / `upper_bounds.get(field_id)` return `bytes`. `b""` is a valid serialization of an empty string and was being skipped, causing metrics-based row filtering to return `ROWS_MIGHT_MATCH` when it should have been `ROWS_CANNOT_MATCH` (and vice versa for strict eval). - `accessors[...].get(file.partition)` can return `0` or `""` for an `IdentityTransform` partition. The walrus dropped it, so projected partition columns were filled with `null` instead of the actual value. - `inspect.py` `lower_bound` / `upper_bound` rendering had the same `b""` issue, showing `None` instead of `""` in `readable_metrics`. All conditions are switched to explicit `is not None` checks. A small `_readable_bound` helper deduplicates the inspect rendering. ## Changes - `pyiceberg/expressions/visitors.py` — `_InclusiveMetricsEvaluator` and `_StrictMetricsEvaluator` bound lookups (21 sites). - `pyiceberg/io/pyarrow.py` — `_get_column_projection_values` and `ArrowProjectionVisitor` missing-field handling. - `pyiceberg/table/inspect.py` — extract `_readable_bound`, use it in both the `entries` and `_get_files_from_manifest` rendering paths. ## Tests - `tests/expressions/test_evaluator.py` — inclusive and strict evaluators with empty-string bounds, covering lower-only, upper-only, and both-bounds branches. - `tests/io/test_pyarrow.py` — parametrized identity-transform projection with falsy partition values (`0`, `""`, `None`). - `tests/table/test_inspect.py` — `_readable_bound` helper plus integration tests via `tbl.inspect.entries()` and `tbl.inspect.files()` for empty-string and null bounds.
Fokko
pushed a commit
that referenced
this pull request
May 25, 2026
Inspired by the walrus issue in #3353 ## Problem `PartitionField` and `SortField` accept the v3 `source-ids` list (added in [#1554](#1554)) and map it to the legacy singular `source-id`. Both validators try to reject an empty list: ```python if "source-id" not in data and (source_ids := data["source-ids"]): if isinstance(source_ids, list): if len(source_ids) == 0: raise ValueError("Empty source-ids is not allowed") ... data["source-id"] = source_ids[0] ``` The walrus uses truthiness, and `[]` is falsy — so the `len(source_ids) == 0` branch is unreachable. Passing `{"source-ids": []}` silently skips the mapping, and Pydantic then reports a generic "field required" error instead of the intended message. A missing `source-ids` key also raises `KeyError` instead of being handled cleanly. ## Fix Replace the walrus with an explicit key check in both validators: ```python if "source-id" not in data and "source-ids" in data: source_ids = data["source-ids"] ... ``` This makes the empty-list validation reachable and avoids the `KeyError`. ## Tests Added regression tests that deserialize `{"source-ids": []}` and assert `ValueError("Empty source-ids is not allowed")` is raised, for both `PartitionField` and `SortField`.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Summary
Fix
FsspecInputFile.__len__andFsspecOutputFile.__len__so zero-byte files return0instead of being treated as missing metadata.Both methods previously used truthiness checks on
object_info.get(...), which caused valid sizes like0to fall through to the runtime error path.Changes
Sizekey presence explicitlysizekey presence explicitlyVerification
python -m pytest tests/io/test_fsspec.py -k zero_length_of_file -q