lmm · GitHub Topics · GitHub

#

lmm

Here are 119 public repositories matching this topic...

NVlabs / Eagle

Eagle: Frontier Vision-Language Models with Data-Centric Strategies

demo eagle llama lmm nvdia huggingface gpt4 large-language-models llm mllm llava lvlm llama3

Updated Jun 24, 2026
Python

BAAI-Agents / Cradle

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle helps agents ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation in a standardized general environment with minimal requirements.

ai gcc multimodality vlm cradle computer-control lmm grounding ai-agent large-language-models llm generative-ai vision-language-model ai-agents-framework general-computer-control personoid foundation-agent

Updated Nov 7, 2024
Python

mbzuai-oryx / groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

vision-and-language lmm foundation-models vision-language-model llm-agent

Updated Aug 5, 2025
Python

LLaVA-VL / LLaVA-Interactive-Demo

LLaVA-Interactive-Demo

multimodal lmm

Updated Jul 25, 2024
Python

tianyi-lab / HallusionBench

[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

benchmark benchmarks lmm hallucination gpt-4 large-language-models llm llava large-vision-language-models vlms gpt-4v

Updated Oct 14, 2025
Python

CircleRadon / TokenPacker

The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV2025

connector lmm mllm token-reduction visual-projector tokenpacker

Updated May 26, 2025
Python

mbzuai-oryx / Video-LLaVA

PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models

video transcription lmm grounding video-grounding llm video-conversation

Updated Aug 5, 2025
Python

TIGER-AI-Lab / Mantis

Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024 Best Paper]

language video vision mantis vlm multimodal lmm fuyu mllm llava-llama3 multi-image-understanding

Updated Jan 3, 2026
Python

Javis603 / Discord-AIBot

🤖 Discord AI assistant with OpenAI, Gemini, Claude & DeepSeek integration, multilingual support, multimodal chat, image generation, web search, and deep thinking | A powerful Discord AI assistant that integrates multiple top AI models and supports multilingual and multimodal chat, image generation, web search, and deep thinking

nodejs ai discord chatbot discord-bot gemini openai discord-js claude xai lmm llm chatgpt deepseek

Updated May 13, 2026
JavaScript

TideDra / VL-RLHF

An RLHF Infrastructure for Vision-Language Models

vlm lmm dpo llm rlhf mllm

Updated Nov 15, 2024
Python

xieyuquanxx / awesome-Large-MultiModal-Hallucination

😎 A curated list of awesome LMM hallucination papers, methods & resources.

multi-modal multimodal lmm hallucination

Updated Mar 23, 2024

Chenyu-Wang567 / MLLM-Tool

MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning

lmm gpt4 llm tool-agent

Updated Oct 10, 2025
Python

graphic-design-ai / graphist

Official Repo of Graphist

graphic-design hlg lmm llm mllm layout-generation

Updated Apr 23, 2024

Q-Future / A-Bench

[ICLR 2025] What do we expect from LMMs as AIGI evaluators and how do they perform?

evaluation lmm ai-generated-images

Updated Feb 3, 2025

WisconsinAIVision / YoLLaVA

🌋👵🏻 Yo'LLaVA: Your Personalized Language and Vision Assistant (NeurIPS 2024)

personalization lmms personalized lmm neurips llm llms llava multi-modal-models neurips2024

Updated Mar 26, 2025
Python

mbzuai-oryx / VideoGLaMM

[CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos

vision-and-language lmm foundation-models vision-language-model llm-agent cvpr2025

Updated Apr 14, 2025
Python

Haochen-Wang409 / TreeVGR

[ICLR'26] Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology

rl lmm grounding o3 llm mllm grounding-llms grpo thinking-with-image

Updated Jan 26, 2026
Python

uni-medical / GMAI-MMBench

GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI.

benchmark medical vlm lmm gmai llm medagi

Updated Dec 17, 2024

Haochen-Wang409 / ross3d

[ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness

iccv lmm llm mllm 3d-llms iccv2025

Updated Jul 22, 2025
Python

360CVGroup / LMM-Det

Make Large Multimodal Models excel in object detection, ICCV 2025

object-detection ovd lmm open-vocabulary-detection

Updated Aug 1, 2025
Python
