Tharun-Kumar-McW · 2026-06-23T11:09:43Z

Linked issue

Type of change

New feature or public API

Motivation

GritLM-7B is a Mistral-7B-based model trained with Generative Representational
Instruction Tuning (GRIT), enabling both high-quality text generation and dense
vector embeddings from a single model. It is widely used in retrieval-augmented
generation (RAG) pipelines and semantic search applications.

MAX had no native support for the GritLM architecture class. This PR adds an
implementation so that parasail-ai/GritLM-7B-vllm can be served directly via
max serve, without any custom flags once the architecture is registered.

What changed

Added max/python/max/pipelines/architectures/gritlm/ — a new ModuleV3
architecture package for the GritLM model family.

New files:

Architecture highlights:

Mistral-7B backbone: 32 layers, hidden_size=4096, GQA (32 Q / 8 KV heads),
SwiGLU MLP, RMSNorm.
Sliding window attention on all 32 layers (sliding_window=4096) using
flash_attention_ragged with MHAMaskVariant.SLIDING_WINDOW_CAUSAL_MASK.
Standard Mistral RoPE (rope_theta=10000, no scaling).
Separate lm_head (tie_word_embeddings=false).
CausalLM path only — gritlm_pooling_head.weight is dropped by the
weight adapter since MAX serves text generation alone for this model.
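
The attention settings listed above can be illustrated with a small pure-Python sketch. It is not the PR's code and does not use the MAX API: `AttentionShape`, `kv_head_for`, `can_attend`, and `mask_rows` are hypothetical names. It assumes that query heads are grouped consecutively onto KV heads and that the window counts the current position, which is a common convention but is not confirmed by this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionShape:
    """Hypothetical container for the values quoted in the highlights."""

    hidden_size: int = 4096
    num_q_heads: int = 32
    num_kv_heads: int = 8
    sliding_window: int = 4096

    @property
    def head_dim(self) -> int:
        # Assumes hidden_size is split evenly across the query heads.
        return self.hidden_size // self.num_q_heads

    @property
    def q_per_kv(self) -> int:
        # GQA: each KV head is shared by a group of query heads.
        return self.num_q_heads // self.num_kv_heads


def kv_head_for(q_head: int, shape: AttentionShape) -> int:
    """Map a query head index to the KV head it reads (consecutive grouping assumed)."""
    return q_head // shape.q_per_kv


def can_attend(q_pos: int, k_pos: int, window: int) -> bool:
    """Sliding-window causal rule: no future keys, and at most `window`
    positions back, counting the current position (assumed convention)."""
    return 0 <= q_pos - k_pos < window


def mask_rows(seq_len: int, window: int) -> list[str]:
    """Render the mask as text: 'x' is visible, '.' is masked."""
    return [
        "".join("x" if can_attend(q, k, window) else "." for k in range(seq_len))
        for q in range(seq_len)
    ]


if __name__ == "__main__":
    shape = AttentionShape()
    print(shape.head_dim, shape.q_per_kv)  # 128 4
    # A tiny window makes the band structure visible.
    for row in mask_rows(seq_len=6, window=3):
        print(row)
```

With `sliding_window=4096` and `--max-length 4096` as in the serve command under Testing, this rule masks nothing beyond ordinary causal masking, because no query is ever more than 4095 positions away from a key.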

Testing

Verified on parasail-ai/GritLM-7B-vllm with GPU serving:

max serve \
  --model-path parasail-ai/GritLM-7B-vllm \
  --custom-architectures architectures/gritlm \
  --max-batch-size 256 \
  --max-length 4096 \
  --quantization-encoding bfloat16

Smoke test (model generates correctly):

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"parasail-ai/GritLM-7B-vllm",
       "messages":[{"role":"user","content":"What is 2+2?"}],
       "max_tokens":32,"temperature":0}'

Output:

2+2 is equal to 4.
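
The same smoke test can be scripted from Python using only the standard library. This is a sketch, not part of the PR: `build_chat_request`, `first_reply`, and `send` are hypothetical helpers, and the response layout (`choices[0].message.content`) is assumed to follow the OpenAI-style schema that the `/v1/chat/completions` path suggests.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build the same POST request that the curl command sends."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 32,
        "temperature": 0,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def first_reply(response_body: dict) -> str:
    """Extract the generated text (assumes an OpenAI-style response body)."""
    return response_body["choices"][0]["message"]["content"]


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request to a running server and return the first reply."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return first_reply(json.load(resp))
```

With the server from the command above running, `send(build_chat_request("http://localhost:8000", "parasail-ai/GritLM-7B-vllm", "What is 2+2?"))` should return the generated answer.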

GSM8K accuracy vs vLLM reference:

Model	Task	Accuracy	vs Reference
parasail-ai/GritLM-7B-vllm	gsm8k_cot_llama	0.506	98.2%

Checklist

The linked issue above has been reviewed by a maintainer and is agreed-upon,
or this is a trivial fix that does not need prior approval
PR is small and focused — single new architecture, no changes to existing
architectures
I ran ./bazelw run format to format my changes
I added or updated tests to cover my changes
If AI tools assisted with this contribution, I have included an
Assisted-by: trailer in my commit message or this PR description

Assisted-by: AI

github-actions · 2026-06-23T11:09:54Z

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

Tharun-Kumar-McW · 2026-06-23T13:33:59Z

[Feature] Added the parasail-ai/GritLM-7B-vllm model to serve natively in MAX

7b67bbf

Tharun-Kumar-McW requested a review from a team as a code owner June 23, 2026 11:09

github-actions Bot added the waiting-on-review label Jun 23, 2026

Tharun-Kumar-McW added 3 commits June 23, 2026 18:06

[Fix] resolve type errors and Bazel dependency issues

cfb43ee

[Fix] resolve type errors

e1eda03

[Fix] resolve type errors

8b93142

[Fix] ran bazel format to fix format issues

61e1e00

modular-cla-bot Bot added a commit to modular/cla that referenced this pull request Jun 23, 2026

@Tharun-Kumar-McW has signed the CLA in modular/modular#6713

245dbf1

Tharun-Kumar-McW added 2 commits June 24, 2026 15:58

Merge branch 'modular:main' into my-fix

bf904bf

[Fix] Removed the ARCHITECTURE = [] and synced the repo

d36b793

File	Purpose
`__init__.py`	Exports `ARCHITECTURES = [gritlm_arch]` for MAX loader discovery
`arch.py`	`SupportedArchitecture` registration — name matches `architectures` field in `config.json`
`model_config.py`	`GritLMConfig` dataclass — parses HuggingFace config including `sliding_window`
`gritlm.py`	`GritLM` / `GritLMTextModel` ModuleV3 graph (CausalLM path only)
`model.py`	`GritLMModel` — `PipelineModelWithKVCache` wrapper, input preparation, output unpacking
`weight_adapters.py`	Remaps `model.` → `language_model.`, drops pooling head weights, casts dtype
`layers/attention.py`	`GritLMAttention` — GQA with `SLIDING_WINDOW_CAUSAL_MASK` on every layer
`layers/transformer_block.py`	`GritLMTransformerBlock` — standard pre-norm decoder block
`BUILD.bazel`	Defines the `gritlm` Python library and its dependencies for Bazel builds.
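
The key handling described in the `weight_adapters.py` row can be sketched with plain dictionaries. This is an illustration rather than the PR's adapter: `adapt_state_dict` and the constants are hypothetical, values stand in for tensors, and the dtype cast is left out. How keys without the `model.` prefix (such as `lm_head.weight`) are treated in the actual adapter is not stated in this PR; the sketch assumes they pass through unchanged.

```python
from typing import Any, Mapping

# Names taken from the PR description; everything else here is illustrative.
POOLING_HEAD_MARKER = "gritlm_pooling_head"
SOURCE_PREFIX = "model."
TARGET_PREFIX = "language_model."


def adapt_state_dict(state_dict: Mapping[str, Any]) -> dict[str, Any]:
    """Return a new mapping with pooling-head weights dropped and the
    leading `model.` prefix rewritten to `language_model.`."""
    adapted: dict[str, Any] = {}
    for name, tensor in state_dict.items():
        # The embedding (pooling) head is unused on the CausalLM path.
        if POOLING_HEAD_MARKER in name:
            continue
        if name.startswith(SOURCE_PREFIX):
            name = TARGET_PREFIX + name[len(SOURCE_PREFIX):]
        # Assumption: keys without the prefix are passed through unchanged.
        adapted[name] = tensor
    return adapted


if __name__ == "__main__":
    checkpoint = {
        "model.layers.0.self_attn.q_proj.weight": "q0",
        "lm_head.weight": "head",
        "gritlm_pooling_head.weight": "pool",
    }
    for key in adapt_state_dict(checkpoint):
        print(key)
```

Returning a new dictionary instead of mutating the input keeps the original checkpoint mapping intact, which makes the remapping easy to test in isolation.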

[Feature Request] Native MAX Serving Support for parasail-ai/GritLM-7B-vllm (GRITLM Architecture) #6713

Tharun-Kumar-McW wants to merge 7 commits into modular:main from Tharun-Kumar-McW:my-fix
