[Feature Request] Added the baidu/ERNIE-4.5-0.3B-PT model to serve natively in MAX by PMukund219 · Pull Request #6717 · modular/modular · GitHub

[Feature Request] Added the baidu/ERNIE-4.5-0.3B-PT model to serve natively in MAX#6717

Open
PMukund219 wants to merge 8 commits into modular:main from PMukund219:my-fix
@PMukund219

Linked issue

Fixes #6685

Type of change

  • New feature or public API

Motivation

ERNIE-4.5 (Ernie4_5ForCausalLM) is Baidu's family of dense decoder-only LLMs,
released under Apache 2.0.
The dense checkpoints follow a standard Llama-style architecture
with GQA and RoPE.

The model uses GPT-J style interleaved rotary pairing and stores rope_theta at the top level of config.json. The implementation also includes a forward-compatible fallback for checkpoints that use the newer nested rope_parameters dict style.
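The fallback described above can be sketched roughly as follows. This is a minimal illustration and not the code in model_config.py: the helper name `resolve_rope_theta` and the `DEFAULT_ROPE_THETA` constant are hypothetical, and the nested layout is assumed to be `{"rope_parameters": {"rope_theta": ...}}`.

```python
from typing import Any, Mapping

# Assumption: value used only when neither location carries rope_theta.
DEFAULT_ROPE_THETA = 10000.0


def resolve_rope_theta(hf_config: Mapping[str, Any]) -> float:
    """Return the RoPE base, preferring the top-level key over the nested dict."""
    # Top-level layout, as used by the current ERNIE-4.5 dense checkpoints.
    if hf_config.get("rope_theta") is not None:
        return float(hf_config["rope_theta"])

    # Assumed nested layout: {"rope_parameters": {"rope_theta": ...}}.
    nested = hf_config.get("rope_parameters") or {}
    if isinstance(nested, Mapping) and nested.get("rope_theta") is not None:
        return float(nested["rope_theta"])

    return DEFAULT_ROPE_THETA
```

Checking the top-level key first keeps today's checkpoints on the path they already use, while a config that only ships the nested dict still resolves to a usable value.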

MAX had no native support for the Ernie4_5ForCausalLM architecture class.
This PR adds an implementation so baidu/ERNIE-4.5-0.3B-PT (and sibling
checkpoints) can be served directly via max serve without any custom flags
after registration.


What changed

Added max/python/max/pipelines/architectures/ernie4_5/ — a new
architecture package for the Ernie4_5ForCausalLM model family.

New files:

| File | Purpose |
| --- | --- |
| `__init__.py` | Exports `ARCHITECTURES = [ernie45_arch]` for MAX loader discovery |
| `arch.py` | `SupportedArchitecture` registration; the name matches the `architectures` field in config.json |
| `model_config.py` | `ERNIE45Config` dataclass; parses the HuggingFace config, including nested `rope_parameters` |
| `ernie45.py` | `ERNIE45` / `ERNIE45TextModel` |
| `model.py` | `ERNIE45Model`: `PipelineModelWithKVCache` wrapper, input preparation, output unpacking |
| `weight_adapters.py` | Remaps `model.*` to `language_model.*`, handles the `lm_head.weight` tying edge case, casts dtype |
| `layers/attention.py` | GQA with GPT-J style interleaved RoPE via `rope_split_store_ragged` |
| `layers/transformer_block.py` | `ERNIE45TransformerBlock`: standard pre-norm decoder block |
| `layers/rope.py` | Builds `freqs_cis` as `[cos0,sin0,cos1,sin1,...]` interleaved pairs using MAX graph ops |
| `BUILD.bazel` | Defines the `ernie4_5` Python library and its dependencies for Bazel builds (modeled on `llama3_modulev3/BUILD.bazel` due to the `experimental/nn` dependency) |
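To make the `[cos0,sin0,cos1,sin1,...]` layout and the interleaved pairing concrete, here is a NumPy sketch. It is not the MAX graph code from layers/rope.py or the `rope_split_store_ragged` kernel; the function names are hypothetical and the input is assumed to be a single head of shape `[seq_len, head_dim]`.

```python
import numpy as np


def build_freqs_cis(seq_len: int, head_dim: int, theta: float = 500000.0) -> np.ndarray:
    """Return a [seq_len, head_dim] table laid out as cos0, sin0, cos1, sin1, ..."""
    inv_freq = 1.0 / theta ** (np.arange(0, head_dim, 2, dtype=np.float64) / head_dim)
    angles = np.outer(np.arange(seq_len, dtype=np.float64), inv_freq)  # [seq, head_dim/2]
    # Stack cos and sin on a new last axis, then flatten so they sit side by side.
    return np.stack([np.cos(angles), np.sin(angles)], axis=-1).reshape(seq_len, head_dim)


def apply_rope_interleaved(x: np.ndarray, freqs_cis: np.ndarray) -> np.ndarray:
    """GPT-J style: rotate adjacent pairs (x[2i], x[2i+1])."""
    cos, sin = freqs_cis[..., 0::2], freqs_cis[..., 1::2]
    even, odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = even * cos - odd * sin
    out[..., 1::2] = even * sin + odd * cos
    return out


def apply_rope_half(x: np.ndarray, freqs_cis: np.ndarray) -> np.ndarray:
    """Half-rotation style: rotate pairs (x[i], x[i + head_dim/2])."""
    cos, sin = freqs_cis[..., 0::2], freqs_cis[..., 1::2]
    half = x.shape[-1] // 2
    lo, hi = x[..., :half], x[..., half:]
    return np.concatenate([lo * cos - hi * sin, lo * sin + hi * cos], axis=-1)
```

The two variants apply the same rotations and differ only in which elements form a pair, so feeding a checkpoint trained with one pairing through the other silently permutes the rotated dimensions.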

Architecture highlights:

  • 0.36B dense backbone: 18 layers, hidden_size=1024, intermediate_size=3072,
    GQA (16 Q / 2 KV heads, group=8), head_dim=128, vocab_size=103424
  • GPT-J style interleaved RoPE (interleaved=True in rope_split_store_ragged),
    distinct from the standard half-rotation used by most Llama-family models
  • rope_theta (500000.0) read from the top-level config field; the implementation also handles the nested rope_parameters dict style used by some newer checkpoint variants
  • Standard causal attention (MHAMaskVariant.CAUSAL_MASK)
  • Tied word embeddings by default (tie_word_embeddings=true), with a weight
    adapter edge case to remap a standalone lm_head.weight if present
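As a sanity check on the 0.36B figure, the listed dimensions can be plugged into a quick parameter count. This sketch assumes details the list above does not state: bias-free projections, a three-matrix gated MLP (gate, up, down), and weight-only norms, as in a typical Llama-style block. The function name is hypothetical.

```python
def dense_param_count(
    vocab_size: int = 103424,
    hidden: int = 1024,
    intermediate: int = 3072,
    layers: int = 18,
    q_heads: int = 16,
    kv_heads: int = 2,
    head_dim: int = 128,
    tied_embeddings: bool = True,
) -> int:
    """Count weights for a Llama-style dense decoder (assumed: no biases)."""
    q_dim = q_heads * head_dim    # 2048
    kv_dim = kv_heads * head_dim  # 256
    # q_proj + k_proj + v_proj + o_proj
    attn = hidden * q_dim + 2 * hidden * kv_dim + q_dim * hidden
    # Assumed gated MLP: gate_proj + up_proj + down_proj
    mlp = 3 * hidden * intermediate
    # Two weight-only norms per block
    norms = 2 * hidden
    per_layer = attn + mlp + norms

    embed = vocab_size * hidden
    lm_head = 0 if tied_embeddings else vocab_size * hidden
    final_norm = hidden
    return embed + layers * per_layer + final_norm + lm_head


total = dense_param_count()
print(f"{total:,} parameters ({total / 1e9:.2f}B)")
```

Under these assumptions the total is 360,748,032, which agrees with the 0.36B figure. Roughly 106M of that is the tied embedding matrix, and the 16 Q / 2 KV split means each KV head is shared by 8 query heads.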

Testing

Verified on baidu/ERNIE-4.5-0.3B-PT with GPU serving:

max serve \
  --model-path baidu/ERNIE-4.5-0.3B-PT \
  --custom-architectures architectures/ernie4_5

Smoke test — model generates correctly:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"baidu/ERNIE-4.5-0.3B-PT",
       "messages":[{"role":"user","content":"What is the capital of france?"}],
       "max_tokens":32,"temperature":0}'

Output:

The capital of France is Paris.
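The same smoke test can be scripted with the Python standard library. This is a sketch that mirrors the curl call above; the helper names are hypothetical, and the response parsing assumes an OpenAI-style body with `choices[0].message.content`, which the PR description does not show.

```python
import json
import urllib.request


def build_chat_request(
    prompt: str,
    model: str = "baidu/ERNIE-4.5-0.3B-PT",
    base_url: str = "http://localhost:8000",
    max_tokens: int = 32,
) -> urllib.request.Request:
    """Build the POST request that the curl smoke test sends."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0,  # greedy decoding, so reruns are comparable
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the request to a running server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=60) as resp:
        body = json.load(resp)
    # Assumption: OpenAI-style response shape.
    return body["choices"][0]["message"]["content"]
```

With the server from the `max serve` command running, `ask("What is the capital of france?")` would issue the same request as the curl example.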

GSM8K accuracy vs vLLM reference:

| Model | Task | Accuracy | vs Reference |
| --- | --- | --- | --- |
| baidu/ERNIE-4.5-0.3B-PT | gsm8k_cot_llama | 0.362 | 98.3 |

Checklist

  • The linked issue above has been reviewed by a maintainer and is agreed-upon,
    or this is a trivial fix that does not need prior approval
  • PR is small and focused — single new architecture, no changes to existing
    architectures
  • I ran ./bazelw run format to format my changes
  • I added or updated tests to cover my changes
  • If AI tools assisted with this contribution, I have included an Assisted-by: trailer in my commit message or this PR description

Assisted-by: AI

PMukund219 requested a review from a team as a code owner on June 24, 2026 05:36
@github-actions

github-actions Bot commented Jun 24, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

modular-cla-bot Bot added a commit to modular/cla that referenced this pull request Jun 24, 2026
Development

Successfully merging this pull request may close these issues.

[Feature Request] Native MAX Serving Support for baidu/ERNIE-4.5-0.3B-PT (Ernie4_5ForCausalLM Architecture)
