Fix composite model export (Qwen3 and seq2seq models) by xieofxie · Pull Request #1037 · microsoft/winml-cli · GitHub
Skip to content

Fix composite model export (Qwen3 and seq2seq models)#1037

Open
xieofxie wants to merge 3 commits into
mainfrom
hualxie/fix_qwen3
Open

Fix composite model export (Qwen3 and seq2seq models)#1037
xieofxie wants to merge 3 commits into
mainfrom
hualxie/fix_qwen3

Conversation

@xieofxie

@xieofxie xieofxie commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

Summary

Refactors winml export to support composite (seq2seq) models such as Qwen3, T5, and BART. Previously, running winml export on a composite model would fail because the command treated every model as a single-component export. Now, composite models are automatically detected and each sub-component (e.g. encoder / decoder / decoder-with-past) is exported separately into its own ONNX file.

Changes

src/winml/modelkit/loader/resolution.py

  • Added resolve_composite_components() — a shared entry point that resolves a composite model's _SUB_MODEL_CONFIG (sub-name → task mapping). Works with both explicit --task and auto-detected task paths.

src/winml/modelkit/commands/export.py

  • Extracted per-component export logic into _run_component_export() inner function (I/O tensor resolution, config assembly, model load, and ONNX export).
  • Added composite detection: when a model is composite, the command fans out into one export per sub-component, writing each to a stem-suffixed flat path <output_stem>_<sub_name>.onnx (matching the sibling winml config fan-out and the runtime's *_model.onnx discovery).
  • Non-composite models continue through the single-export path unchanged.
  • Intentional loud guards during composite resolution (empty registry RuntimeError / model-task incompatibility ValueError) are re-raised (the latter as a UsageError, mirroring winml config) instead of being masked as "not composite".
  • Each composite sub-component is loaded and exported independently because a composite's sub-tasks map to distinct model classes/heads; the per-component reload is intentional and documented inline.

src/winml/modelkit/commands/config.py

  • Switched to the new shared resolve_composite_components() helper, removing duplicated resolution logic.

src/winml/modelkit/export/htp/exporter.py

  • Added clear_sidecar() method to HTPExporter for cleaning up external data sidecar files after export.

src/winml/modelkit/onnx/__init__.py

  • Exported the new clear_sidecar symbol.

Tests

  • Added unit tests for composite export fan-out (flat stem-suffixed layout), the --input-specs rejection, that resolution errors surface instead of being swallowed, and resolve_composite_components.

@xieofxie xieofxie requested a review from a team as a code owner July 3, 2026 06:57

@DingmaomaoBJTU DingmaomaoBJTU left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the composite-export fan-out and the external-data cleanup. The refactor is clean overall — extracting _run_component_export and the shared resolve_composite_components reads well, and _cleanup_stale_external_data correctly keys off the final on-disk references, so it's safe whether the consolidated re-save ends up inline or external.

A few things worth considering (see inline):

  1. For composites, --output is silently treated as a directory, which is inconsistent with the winml config fan-out convention and with this PR's own description.
  2. The broad except Exception around composite detection swallows the intentional loud RuntimeError from _composite_registry().
  3. The full model is reloaded once per sub-component.

Nits (no inline):

  • The PR description says a clear_sidecar() method and a clear_sidecar export were added, but the code actually adds HTPExporter._cleanup_stale_external_data() and exports get_external_data_files. Please sync the description.
  • The composite --input-specs UsageError is raised after the Starting HTP export... banner prints; hoisting the check above the banner would read cleaner.
  • The new composite tests mock load_hf_model / resolve_export_config / export_onnx, so they verify the fan-out plumbing (paths, call counts) but not that the sub-models actually differ — worth a follow-up integration check.

(Note: I couldn't run the suite locally — this machine is win_arm64 and torch ships no matching wheel, so this is a static review.)

Comment thread src/winml/modelkit/commands/export.py Outdated
Comment thread src/winml/modelkit/commands/export.py
Comment thread src/winml/modelkit/commands/export.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants