
randydl · 2026-06-24T11:21:33Z

Fix: MiniCPM-V 4.6 training hangs on text-only samples with DeepSpeed

Problem

When training MiniCPM-V 4.6 with DeepSpeed ZeRO, the training process hangs if the dataset contains pure-text samples (no image/video). This is because:

For text-only samples, MiniCPMV4_6Model.forward() skips the vision encoder (vision_tower + merger) entirely since pixel_values and pixel_values_videos are both None.
Under DeepSpeed ZeRO, parameters are sharded across GPUs and only gathered on-demand via all-gather when a computation touches them.
When one GPU processes a text-only sample (no vision compute) while another processes an image sample (needs vision parameters), the all-gather synchronization deadlocks — one side never triggers the gather that the other side is waiting for.
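The mismatch described above can be illustrated with a toy simulation. This is only a sketch and does not use DeepSpeed or NCCL: each `threading.Barrier` stands in for the all-gather of one parameter group, two threads stand in for two ranks, and the names `run_step` and `rank0_runs_dummy_vision` are hypothetical. A timeout replaces the real indefinite hang so the script terminates.

```python
import threading


def run_step(rank0_runs_dummy_vision: bool, timeout: float = 0.5) -> dict:
    """Simulate one step on two ranks and report whether each rank completes."""
    # One barrier per parameter group models that group's all-gather:
    # no rank can proceed until every rank has arrived.
    gathers = {
        "vision": threading.Barrier(2),
        "language": threading.Barrier(2),
    }
    status = {}

    def rank(rank_id: int, touches_vision: bool) -> None:
        try:
            if touches_vision:
                gathers["vision"].wait(timeout=timeout)
            gathers["language"].wait(timeout=timeout)
            status[rank_id] = "ok"
        except threading.BrokenBarrierError:
            # The peer never joined this collective; in real training
            # this would be an indefinite hang rather than an exception.
            status[rank_id] = "hang"

    threads = [
        # Rank 0 holds a text-only sample; it touches vision parameters
        # only if the dummy forward pass is enabled.
        threading.Thread(target=rank, args=(0, rank0_runs_dummy_vision)),
        # Rank 1 holds an image sample and always needs vision parameters.
        threading.Thread(target=rank, args=(1, True)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return status


without_fix = run_step(rank0_runs_dummy_vision=False)
with_fix = run_step(rank0_runs_dummy_vision=True)
print(without_fix, with_fix)
```

In the first call both simulated ranks stall, because each waits on a collective the other never enters. In the second call both ranks enter the same collectives in the same order and the step completes.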

Solution

Add a _post_encode method to MiniCPMV4_6Template that detects text-only samples under DeepSpeed and runs a minimal dummy image through the full vision pipeline (vision_tower → merger). The dummy features are then zeroed out via image_embeds.mean() * 0. and added to the text embeddings, which:

Forces DeepSpeed to all-gather all vision model parameters, preventing the deadlock
Is mathematically a no-op (adds zero), so it does not affect training results

The dummy image uses the smallest valid patch grid (target_sizes=[[4, 4]]), which works for both 16x and 4x downsample modes, producing only 1 visual token in 16x mode — negligible compute overhead.
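The zeroing trick can be sketched in isolation with plain PyTorch. The modules below are small hypothetical stand-ins (a single `nn.Linear` for the vision pipeline, an `nn.Embedding` for the text embeddings), not the MiniCPM-V 4.6 classes, and the shapes are arbitrary. The sketch only checks two properties claimed above: the text embeddings are unchanged, and the vision parameters still take part in the autograd graph.

```python
import torch
from torch import nn

torch.manual_seed(0)

hidden = 8
# Hypothetical stand-ins for the real modules.
vision_tower = nn.Linear(3 * 4 * 4, hidden)  # "encodes" a flattened dummy image
embed_tokens = nn.Embedding(100, hidden)
lm_head = nn.Linear(hidden, 100)

input_ids = torch.tensor([[1, 2, 3, 4]])
inputs_embeds = embed_tokens(input_ids)

# Run a dummy all-zero image through the vision stand-in, then scale the
# result to zero before adding it to the text embeddings.
dummy_pixels = torch.zeros(1, 3 * 4 * 4)
image_embeds = vision_tower(dummy_pixels)
patched_embeds = inputs_embeds + image_embeds.mean() * 0.

# The addition does not change any embedding value.
embeds_unchanged = torch.equal(patched_embeds, inputs_embeds)

loss = lm_head(patched_embeds).sum()
loss.backward()

# The vision parameters are part of the graph (grad is populated),
# but the gradient they receive is exactly zero.
vision_in_graph = vision_tower.weight.grad is not None
vision_grad_is_zero = bool((vision_tower.weight.grad == 0).all())
print(embeds_unchanged, vision_in_graph, vision_grad_is_zero)
```

The no-op property relies on the dummy features being finite: multiplying a NaN or infinite mean by zero yields NaN, which would corrupt the embeddings.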

Changes

swift/template/templates/minicpm.py: Add _post_encode to MiniCPMV4_6Template, add is_deepspeed_enabled import


gemini-code-assist

Code Review

This pull request introduces a _post_encode method in the MiniCPM template to handle cases during training where multimodal inputs (images/videos) are absent while DeepSpeed is enabled. It generates dummy vision embeddings to prevent DeepSpeed training issues. The review feedback suggests improving robustness by safely retrieving input_ids using .get() to avoid potential KeyErrors, and safely accessing the dtype of the vision tower to prevent AttributeErrors.


gemini-code-assist · 2026-06-24T11:22:43Z

+            input_ids = inputs['input_ids']
+            base_model = self.get_base_model(model)
+            inputs_embeds = base_model.get_input_embeddings()(input_ids)
+            patch_size = base_model.config.vision_config.patch_size
+            dummy_pv = torch.zeros(
+                1, 3, 4 * patch_size, 4 * patch_size,
+                device=inputs_embeds.device, dtype=base_model.vision_tower.dtype)


To improve robustness and prevent potential runtime errors:

Use inputs.get('input_ids') instead of direct key access to avoid a KeyError if input_ids is missing or None.

Standard PyTorch nn.Module objects do not have a dtype attribute. To avoid an AttributeError if vision_tower is wrapped or does not inherit from PreTrainedModel, safely retrieve the dtype using getattr(base_model.vision_tower, 'dtype', inputs_embeds.dtype).

input_ids = inputs.get('input_ids')
if input_ids is None:
    return inputs
base_model = self.get_base_model(model)
inputs_embeds = base_model.get_input_embeddings()(input_ids)
patch_size = base_model.config.vision_config.patch_size
vision_dtype = getattr(base_model.vision_tower, 'dtype', inputs_embeds.dtype)
dummy_pv = torch.zeros(
    1, 3, 4 * patch_size, 4 * patch_size,
    device=inputs_embeds.device, dtype=vision_dtype)

[bugfix] Fix MiniCPM-V 4.6 training hangs on text-only samples with DeepSpeed (42db347)


Fix: MiniCPM-V 4.6 training hangs on text-only samples with DeepSpeed #9639
randydl wants to merge 1 commit into modelscope:main from randydl:dev
