feat: support Time64Type[ns] via downcast to microseconds (#1169) by anxkhn · Pull Request #3578 · apache/iceberg-python · GitHub

feat: support Time64Type[ns] via downcast to microseconds (#1169) #3578

Open
anxkhn wants to merge 1 commit into apache:main from anxkhn:loop/iceberg-python__001
@anxkhn anxkhn commented Jun 28, 2026

Closes #1169

Rationale for this change

PyArrow pa.time64("ns") was unsupported on write and raised
Unsupported type: time64[ns], even though Iceberg's time type is microsecond
precision by spec. This forced users to manually downcast a time64[ns] column
before they could write it, while the analogous timestamp[ns] case has been
handled by an opt-in downcast since #848.

This change mirrors that existing ns -> us timestamp behavior for time, gated on
the same downcast-ns-timestamp-to-us-on-write configuration property:

  • pyiceberg/io/pyarrow.py, _ConvertToIceberg.primitive: a time64[ns] PyArrow
    type now maps to Iceberg TimeType() (with a warning) when
    downcast-ns-timestamp-to-us-on-write is set, and otherwise raises a TypeError
    pointing the user at that property. time64[us] keeps working unchanged.
  • pyiceberg/io/pyarrow.py, ArrowProjectionVisitor._cast_if_needed: a new
    TimeType branch casts a time64[ns] array to time64[us] (safe=False) on write
    when the flag is set, so the data, not just the schema mapping, is actually
    downcast. This is guarded by the existing target_type != values.type check, so the
    supported us -> us path is untouched.
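
The control flow of the schema-mapping branch can be modelled without PyArrow. The sketch below is an illustration only: `resolve_time64_unit` and its `downcast_ns` argument are hypothetical names standing in for the check in `_ConvertToIceberg.primitive`, and the message strings are placeholders rather than the ones in the patch.

```python
import warnings


def resolve_time64_unit(unit: str, downcast_ns: bool) -> str:
    """Return the unit an Iceberg time column would be written with.

    Hypothetical helper: it models the branch described above and is not
    the actual pyiceberg code.
    """
    if unit == "us":
        # Already microseconds: the existing path, left untouched.
        return "us"
    if unit == "ns":
        if downcast_ns:
            # Flag set: accept the column, but tell the user precision is lost.
            warnings.warn(
                "Downcasting time64[ns] to time64[us]; "
                "sub-microsecond precision is lost"
            )
            return "us"
        # Flag not set: fail and point at the config property.
        raise TypeError(
            "time64[ns] is not supported by Iceberg; set "
            "downcast-ns-timestamp-to-us-on-write to downcast on write"
        )
    raise TypeError(f"Unsupported time unit: {unit}")
```

In the real change the second bullet also matters: mapping the schema alone is not enough, so the array itself is cast to time64[us] with safe=False during projection.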

This matches the acceptance criteria left by @kevinjqliu on the earlier (stale-closed)
PR #1215, and the implementation pattern referenced there (the timestamp downcast from
#848).

One point for reviewers: this reuses the timestamp-named flag
downcast-ns-timestamp-to-us-on-write for time as well, which is what the issue
asks for and is consistent with prior maintainer guidance. If you'd prefer a dedicated
flag for the time type, I'm happy to change it.

Note on prior work / assignment: this issue has a long history of attempts
(#1188, #1206, #1215) that were all auto-closed by the stale bot for inactivity rather
than on merit, and there is no open PR for it today. It is still assigned to
@zaryab-ali from the original 2024 attempt, and @jaimeferj later offered to revive it.
I picked it up because it has been inactive for a long time and users are still asking
for it; happy to defer or coordinate if either of you is still working on it.

Are these changes tested?

Yes, with unit tests (no Docker/Spark required):

  • tests/io/test_pyarrow_visitor.py
    • test_pyarrow_time64_us_to_iceberg - us still maps to TimeType() (unchanged).
    • test_pyarrow_time64_ns_to_iceberg - updated: without the flag, ns now raises a
      TypeError with the new downcast-ns-timestamp-to-us-on-write guidance message.
    • test_pyarrow_time64_ns_to_iceberg_downcast (new) - with the flag, ns maps to
      TimeType() and round-trips back to pa.time64("us").
  • tests/io/test_pyarrow.py
    • test__to_requested_schema_time_ns_downcast (new) - a time64[ns] column is cast
      to time64[us] on write with the flag, values preserved.
    • test__to_requested_schema_time_ns_without_downcast_raises_exception (new) -
      without the flag, the projection raises
      Unsupported schema projection from time64[ns] to time64[us].

All five pass; make lint (ruff, ruff-format, mypy, license headers, uv-lock) is
green, and neither the dependencies nor uv.lock change. I also verified end-to-end against a
local SqlCatalog (sqlite metadata + local-FS warehouse): with
PYICEBERG_DOWNCAST_NS_TIMESTAMP_TO_US_ON_WRITE=true, creating, appending, and
scanning a table with a time64[ns] column writes and reads it back as time64[us]
with values intact; without the flag, the write fails with a clear error citing the
config property.
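
For anyone reproducing the end-to-end check, the environment variable above appears to be the property name upper-cased, with dashes turned into underscores and a `PYICEBERG_` prefix. A minimal sketch of that mapping follows; `env_var_for` is a hypothetical helper, and the naming rule is inferred from the two names quoted in this description rather than taken from the pyiceberg source.

```python
import os

# Assumed prefix, inferred from the variable name used in the test run above.
ENV_PREFIX = "PYICEBERG_"


def env_var_for(property_name: str) -> str:
    """Map a dashed config property name to its assumed environment variable."""
    return ENV_PREFIX + property_name.upper().replace("-", "_")


name = env_var_for("downcast-ns-timestamp-to-us-on-write")
# Set the flag for this process before the write is attempted.
os.environ[name] = "true"
print(name)  # PYICEBERG_DOWNCAST_NS_TIMESTAMP_TO_US_ON_WRITE
```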

The Docker/Spark integration suite was not run in my environment; the behavior is
covered by the unit and SqlCatalog tests above.

Are there any user-facing changes?

Yes. Writing a PyArrow table with a time64[ns] column no longer hard-fails:

  • With downcast-ns-timestamp-to-us-on-write set, the column is downcast to Iceberg
    time (microseconds), consistent with how timestamp[ns] is already handled.
  • Without it, the error message now explains how to enable the downcast instead of
    just reporting an unsupported type.
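
To make the precision loss concrete, here is a small worked example in plain Python. It assumes the unsafe cast (safe=False) drops the sub-microsecond remainder rather than rounding, and it treats a time value as an integer count since midnight; `ns_to_us` and `us_to_time` are hypothetical helpers for illustration, not PyArrow or pyiceberg functions.

```python
from datetime import time

NANOS_PER_MICRO = 1_000


def ns_to_us(nanos_since_midnight: int) -> int:
    """Drop the sub-microsecond remainder (truncation, assumed to match the unsafe cast)."""
    return nanos_since_midnight // NANOS_PER_MICRO


def us_to_time(micros_since_midnight: int) -> time:
    """Convert microseconds since midnight to a datetime.time for display."""
    seconds, micros = divmod(micros_since_midnight, 1_000_000)
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    return time(hour, minute, second, micros)


# 12:34:56.123456789 expressed as nanoseconds since midnight
nanos = ((12 * 60 + 34) * 60 + 56) * 1_000_000_000 + 123_456_789
micros = ns_to_us(nanos)
print(us_to_time(micros))  # 12:34:56.123456
```

The trailing 789 nanoseconds are gone after the downcast, which is why the behaviour is opt-in and accompanied by a warning.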

PyArrow time64[ns] previously raised "Unsupported type: time64[ns]" because
Iceberg's time type is microsecond precision by spec. Mirror the existing
ns -> us timestamp handling for time:

- _ConvertToIceberg.primitive: when a time64[ns] is encountered, downcast to
  TimeType() (with a warning) if downcast-ns-timestamp-to-us-on-write is set,
  otherwise raise a TypeError pointing at that config property.
- ArrowProjectionVisitor._cast_if_needed: add a TimeType branch so a time64[ns]
  array is actually cast to time64[us] on write when the flag is set.

Adds unit tests for the schema conversion (us kept, ns error, ns downcast) and
the write/cast path (ns -> us with the flag, error without it).
Development

Successfully merging this pull request may close these issues.

[feature request] Support Time64Type[ns]