{{ message }}
[Feature Request] Native MAX Serving Support for parasail-ai/GritLM-7B-vllm (GRITLM Architecture)#6713
Open
Tharun-Kumar-McW wants to merge 7 commits into
Open
[Feature Request] Native MAX Serving Support for parasail-ai/GritLM-7B-vllm (GRITLM Architecture)#6713Tharun-Kumar-McW wants to merge 7 commits into
Tharun-Kumar-McW wants to merge 7 commits into
Conversation
|
All contributors have signed the CLA ✍️ ✅ |
Author
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Linked issue
Fixes #6684
Type of change
Motivation
GritLM-7B is a Mistral-7B-based model trained with Generative Representational
Instruction Tuning (GRIT), enabling both high-quality text generation and dense
vector embeddings from a single model. It is widely used in retrieval-augmented
generation (RAG) pipelines and semantic search applications.
MAX had no native support for the
GritLMarchitecture class. This PR adds a implementation soparasail-ai/GritLM-7B-vllmcan be served directly viamax servewithout anycustom flags after registration.
What changed
Added
max/python/max/pipelines/architectures/gritlm/— a new ModuleV3architecture package for the
GritLMmodel family.New files:
__init__.pyARCHITECTURES = [gritlm_arch]for MAX loader discoveryarch.pySupportedArchitectureregistration — name matchesarchitecturesfield inconfig.jsonmodel_config.pyGritLMConfigdataclass — parses HuggingFace config includingsliding_windowgritlm.pyGritLM/GritLMTextModelModuleV3 graph (CausalLM path only)model.pyGritLMModel—PipelineModelWithKVCachewrapper, input preparation, output unpackingweight_adapters.pymodel.*→language_model.*, drops pooling head weights, casts dtypelayers/attention.pyGritLMAttention— GQA withSLIDING_WINDOW_CAUSAL_MASKon every layerlayers/transformer_block.pyGritLMTransformerBlock— standard pre-norm decoder blockBUILD.bazelgritlmPython library and its dependencies for Bazel builds.Architecture highlights:
SwiGLU MLP, RMSNorm.
sliding_window=4096) usingflash_attention_raggedwithMHAMaskVariant.SLIDING_WINDOW_CAUSAL_MASK.rope_theta=10000, no scaling).lm_head(tie_word_embeddings=false).gritlm_pooling_head.weightis dropped by theweight adapter since MAX serves text generation alone for this model.
Testing
Verified on
parasail-ai/GritLM-7B-vllmwith GPU serving:Smoke test — model generates correctly :
Output :
GSM8K accuracy vs vLLM reference :
Checklist
or this is a trivial fix that does not need prior approval
architectures
./bazelw run formatto format my changesAssisted-by:trailer in my commit message or this PR descriptionAssisted-by: AI