{{ message }}
feat: support Time64Type[ns] via downcast to microseconds (#1169)#3578
Open
anxkhn wants to merge 1 commit into
Open
feat: support Time64Type[ns] via downcast to microseconds (#1169)#3578anxkhn wants to merge 1 commit into
anxkhn wants to merge 1 commit into
Conversation
PyArrow time64[ns] previously raised "Unsupported type: time64[ns]" because Iceberg's time type is microsecond precision by spec. Mirror the existing ns -> us timestamp handling for time: - _ConvertToIceberg.primitive: when a time64[ns] is encountered, downcast to TimeType() (with a warning) if downcast-ns-timestamp-to-us-on-write is set, otherwise raise a TypeError pointing at that config property. - ArrowProjectionVisitor._cast_if_needed: add a TimeType branch so a time64[ns] array is actually cast to time64[us] on write when the flag is set. Adds unit tests for the schema conversion (us kept, ns error, ns downcast) and the write/cast path (ns -> us with the flag, error without it).
Author
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Closes #1169
Rationale for this change
PyArrow
pa.time64("ns")was unsupported on write and raisedUnsupported type: time64[ns], even though Iceberg'stimetype is microsecondprecision by spec. This forced users to manually downcast a
time64[ns]columnbefore they could write it, while the analogous
timestamp[ns]case has beenhandled by an opt-in downcast since #848.
This change mirrors that existing
ns -> ustimestamp behavior fortime, gated onthe same
downcast-ns-timestamp-to-us-on-writeconfiguration property:pyiceberg/io/pyarrow.py,_ConvertToIceberg.primitive: atime64[ns]PyArrowtype now maps to Iceberg
TimeType()(with a warning) whendowncast-ns-timestamp-to-us-on-writeis set, and otherwise raises aTypeErrorpointing the user at that property.
time64[us]keeps working unchanged.pyiceberg/io/pyarrow.py,ArrowProjectionVisitor._cast_if_needed: a newTimeTypebranch casts atime64[ns]array totime64[us](safe=False) on writewhen the flag is set, so the data, not just the schema mapping, is actually
downcast. This is guarded by the existing
target_type != values.typecheck, so thesupported
us -> uspath is untouched.This matches the acceptance criteria left by @kevinjqliu on the earlier (stale-closed)
PR #1215, and the implementation pattern referenced there (the timestamp downcast from
#848).
One point for reviewers: this reuses the timestamp-named flag
downcast-ns-timestamp-to-us-on-writefortimeas well, which is what the issueasks for and is consistent with prior maintainer guidance. If you'd prefer a dedicated
flag for the
timetype, I'm happy to change it.Note on prior work / assignment: this issue has a long history of attempts
(#1188, #1206, #1215) that were all auto-closed by the stale bot for inactivity rather
than on merit, and there is no open PR for it today. It is still assigned to
@zaryab-ali from the original 2024 attempt, and @jaimeferj later offered to revive it.
I picked it up because it has been inactive for a long time and users are still asking
for it; happy to defer or coordinate if either of you is still working on it.
Are these changes tested?
Yes, with unit tests (no Docker/Spark required):
tests/io/test_pyarrow_visitor.pytest_pyarrow_time64_us_to_iceberg-usstill maps toTimeType()(unchanged).test_pyarrow_time64_ns_to_iceberg- updated: without the flag,nsnow raises aTypeErrorwith the newdowncast-ns-timestamp-to-us-on-writeguidance message.test_pyarrow_time64_ns_to_iceberg_downcast(new) - with the flag,nsmaps toTimeType()and round-trips back topa.time64("us").tests/io/test_pyarrow.pytest__to_requested_schema_time_ns_downcast(new) - atime64[ns]column is castto
time64[us]on write with the flag, values preserved.test__to_requested_schema_time_ns_without_downcast_raises_exception(new) -without the flag, the projection raises
Unsupported schema projection from time64[ns] to time64[us].All five pass;
make lint(ruff, ruff-format, mypy, license headers,uv-lock) isgreen and no dependencies /
uv.lockchange. I also verified end-to-end against alocal
SqlCatalog(sqlite metadata + local-FS warehouse): withPYICEBERG_DOWNCAST_NS_TIMESTAMP_TO_US_ON_WRITE=true, creating, appending, andscanning a table with a
time64[ns]column writes and reads it back astime64[us]with values intact; without the flag, the write fails with a clear error citing the
config property.
The Docker/Spark integration suite was not run in my environment; the behavior is
covered by the unit and
SqlCatalogtests above.Are there any user-facing changes?
Yes. Writing a PyArrow table with a
time64[ns]column no longer hard-fails:downcast-ns-timestamp-to-us-on-writeset, the column is downcast to Icebergtime(microseconds), consistent with howtimestamp[ns]is already handled.just reporting an unsupported type.