lmm
Here are 119 public repositories matching this topic...
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle enables agents to ace any computer task through strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
Updated Nov 7, 2024 - Python
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
Updated Aug 5, 2025 - Python
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Updated Oct 14, 2025 - Python
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV2025
Updated May 26, 2025 - Python
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
Updated Aug 5, 2025 - Python
Official code for the paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024 Best Paper]
Updated Jan 3, 2026 - Python
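🤖 Discord AI assistant with OpenAI, Gemini, Claude & DeepSeek integration, multilingual support, multimodal chat, image generation, web search, and deep thinking | A powerful Discord AI assistant that integrates multiple top-tier AI models and supports multilingual and multimodal communication, image generation, web search, and deep thinking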
Updated May 13, 2026 - JavaScript
😎 A curated list of awesome LMM hallucination papers, methods & resources.
Updated Mar 23, 2024
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning
Updated Oct 10, 2025 - Python
[ICLR 2025] What do we expect from LMMs as AIGI evaluators and how do they perform?
Updated Feb 3, 2025
🌋👵🏻 Yo'LLaVA: Your Personalized Language and Vision Assistant (NeurIPS 2024)
Updated Mar 26, 2025 - Python
[CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
Updated Apr 14, 2025 - Python
[ICLR'26] Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology
Updated Jan 26, 2026 - Python
Make Large Multimodal Models excel in object detection, ICCV 2025
Updated Aug 1, 2025 - Python
