[Feature Request] Added the baidu/ERNIE-4.5-0.3B-PT model to serve natively in MAX #6717
Open
PMukund219 wants to merge 8 commits into
All contributors have signed the CLA ✍️ ✅

Linked issue
Fixes #6685
Type of change
Motivation
ERNIE-4.5 (Ernie4_5ForCausalLM) is Baidu's family of dense decoder-only LLMs,
released under Apache 2.0.
The dense checkpoints follow a standard Llama-style architecture
with GQA and RoPE.
These checkpoints use GPT-J style interleaved rotary pairing and store `rope_theta` at the top level of `config.json`. The implementation also includes a forward-compatible fallback for checkpoints that may use the newer nested `rope_parameters` dict style.
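The config fallback and the pairing scheme described above can be illustrated with a small pure-Python sketch. This is not the code from this PR: the helper names, the `ASSUMED_DEFAULT_THETA` value, and the assumption that the nested dict carries its own `rope_theta` key are all hypothetical, and the real implementation runs on MAX graph ops rather than Python lists.

```python
import math
from collections.abc import Mapping
from typing import Any, Callable, List, Tuple

# Hypothetical default, used only when neither config location has a value.
ASSUMED_DEFAULT_THETA = 10000.0


def resolve_rope_theta(hf_config: Mapping) -> float:
    """Prefer top-level rope_theta, else fall back to nested rope_parameters."""
    top_level: Any = hf_config.get("rope_theta")
    if top_level is not None:
        return float(top_level)
    nested: Any = hf_config.get("rope_parameters")
    # Assumption: the nested dict stores the base under the key "rope_theta".
    if isinstance(nested, Mapping) and nested.get("rope_theta") is not None:
        return float(nested["rope_theta"])
    return ASSUMED_DEFAULT_THETA


def rope_angles(position: int, head_dim: int, theta: float) -> List[float]:
    """One angle per rotated pair: position * theta^(-2i / head_dim)."""
    return [position * theta ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]


def _rotate(
    x: List[float],
    position: int,
    theta: float,
    pair_index: Callable[[int], Tuple[int, int]],
) -> List[float]:
    """Apply a 2D rotation to each index pair chosen by pair_index."""
    out = list(x)
    for i, angle in enumerate(rope_angles(position, len(x), theta)):
        a, b = pair_index(i)
        c, s = math.cos(angle), math.sin(angle)
        out[a] = x[a] * c - x[b] * s
        out[b] = x[a] * s + x[b] * c
    return out


def rope_interleaved(x: List[float], position: int, theta: float) -> List[float]:
    """GPT-J style: pair adjacent elements (0,1), (2,3), ..."""
    return _rotate(x, position, theta, lambda i: (2 * i, 2 * i + 1))


def rope_half_rotation(x: List[float], position: int, theta: float) -> List[float]:
    """Half-rotation style: pair element i with element i + head_dim/2."""
    half = len(x) // 2
    return _rotate(x, position, theta, lambda i: (i, i + half))
```

For a 4-element vector `[1.0, 0.0, 0.0, 0.0]` at position 1, `rope_interleaved` moves the rotated component into index 1 while `rope_half_rotation` moves it into index 2. That difference is why a checkpoint trained with one pairing produces wrong attention scores if it is served with the other.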
MAX had no native support for the `Ernie4_5ForCausalLM` architecture class.
This PR adds an implementation so `baidu/ERNIE-4.5-0.3B-PT` (and sibling
checkpoints) can be served directly via `max serve` without any custom flags
after registration.
What changed
Added `max/python/max/pipelines/architectures/ernie4_5/`, a new architecture package for the `Ernie4_5ForCausalLM` model family.

New files:

| File | Description |
|---|---|
| `__init__.py` | `ARCHITECTURES = [ernie45_arch]` for MAX loader discovery |
| `arch.py` | `SupportedArchitecture` registration; name matches the `architectures` field in `config.json` |
| `model_config.py` | `ERNIE45Config` dataclass; parses the HuggingFace config, including nested `rope_parameters` |
| `ernie45.py` | `ERNIE45` / `ERNIE45TextModel` |
| `model.py` | `ERNIE45Model`: `PipelineModelWithKVCache` wrapper, input preparation, output unpacking |
| `weight_adapters.py` | `model.*` → `language_model.*`, handles `lm_head.weight` tying edge case, casts dtype |
| `layers/attention.py` | `rope_split_store_ragged` |
| `layers/transformer_block.py` | `ERNIE45TransformerBlock`: standard pre-norm decoder block |
| `layers/rope.py` | `freqs_cis` as `[cos0,sin0,cos1,sin1,...]` interleaved pairs using MAX graph ops |
| `BUILD.bazel` | `ernie4_5` Python library and its dependencies for Bazel builds (modeled on `llama3_modulev3/BUILD.bazel` due to the `experimental/nn` dependency) |

Architecture highlights:
- GQA (16 Q / 2 KV heads, group=8), head_dim=128, vocab_size=103424
- GPT-J style interleaved RoPE (`interleaved=True` in `rope_split_store_ragged`), distinct from the standard half-rotation used by most Llama-family models
- Causal attention mask (`MHAMaskVariant.CAUSAL_MASK`)
- Tied word embeddings (`tie_word_embeddings=true`), with a weight adapter edge case to remap a standalone `lm_head.weight` if present

Testing
Verified on `baidu/ERNIE-4.5-0.3B-PT` with GPU serving:

Smoke test (model generates correctly):
Output:
GSM8K accuracy vs vLLM reference:
Checklist
or this is a trivial fix that does not need prior approval
architectures
`./bazelw run format` to format my changes
`Assisted-by:` trailer in my commit message or this PR description

Assisted-by: AI