Sunbelt Computer Software

Runhao Li

Building intelligent multi-agent LLM systems

LLM Agent Engineer · MS CS @ University of Southern California

About

I design and post-train multi-agent LLM systems — from learned inter-agent communication protocols to data-centric alignment pipelines to production vLLM serving stacks. My focus is on agent teams that are reliable, interpretable, and cheap to deploy end-to-end.

Current interests: multi-agent orchestration · post-training (DPO / KTO / GRPO) · vLLM multi-LoRA serving · agent evaluation harnesses · learned latent communication protocols.

Featured Projects

latent-agent-team · Budgeted Multi-Agent Communication

Five-agent team (Planner · Retriever · Browser · Verifier · Memory) that replaces natural-language inter-agent messages with learned latent channels — continuous embeddings or VQ codes with an adaptive bitrate scheduler.

Results · Mind2Web 81.5% ElemAcc · WebShop 72.4% SR · AgentBench 66.8% SR
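The idea of sending VQ codes under a bit budget, instead of natural-language messages, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `nearest_code`, `encode_message`, and `decode_message` helpers, the four-entry codebook, and the fixed truncation rule are hypothetical and stand in for the learned codebook and adaptive bitrate scheduler, which the project description does not specify.

```python
import math

def nearest_code(vec, codebook):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))

def encode_message(vectors, codebook, bit_budget):
    """Quantize vectors in order, stopping once the bit budget is used up.

    Assumption: each code costs ceil(log2(codebook size)) bits and the
    budget is enforced by simple truncation, not by a learned scheduler.
    """
    bits_per_code = max(1, math.ceil(math.log2(len(codebook))))
    max_codes = bit_budget // bits_per_code
    return [nearest_code(v, codebook) for v in vectors[:max_codes]]

def decode_message(codes, codebook):
    """Map received code indices back to their codebook vectors."""
    return [codebook[i] for i in codes]

# Hypothetical 2-D codebook with 4 entries, so each code costs 2 bits.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
message = [(0.9, 0.1), (0.2, 0.8), (0.6, 0.9)]

codes = encode_message(message, codebook, bit_budget=4)  # room for 2 codes
decoded = decode_message(codes, codebook)
```

With a 4-bit budget only the first two vectors are sent, each as a 2-bit index. Raising `bit_budget` to 6 would let the third vector through as well, which is the trade-off a bitrate scheduler would be managing.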

acm-icl · Autonomy-Calibrated Multi-Agent In-Context Learning

Four-stage inference pipeline (Solver → Skeptic → Verifier → Calibrated Judge) with DD-CoT structured reasoning and per-peer reliability scoring for epistemic robustness under adversarial peer pressure.

Results · 73.9% average across 5 peer-pressure benchmarks · +13.7 pp over strongest multi-agent-debate baseline (MAD)
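Per-peer reliability scoring can be pictured as weighting each peer's answer before a final verdict. The sketch below is a minimal illustration under stated assumptions: the `weighted_verdict` function, the reliability values, and the additive aggregation rule are hypothetical and are not taken from the acm-icl repository.

```python
from collections import defaultdict

def weighted_verdict(votes):
    """Pick the answer with the highest total reliability weight.

    votes: list of (answer, reliability) pairs, reliability in [0, 1].
    Assumption: weights are simply summed per answer.
    """
    totals = defaultdict(float)
    for answer, reliability in votes:
        totals[answer] += reliability
    return max(totals, key=totals.get)

# Hypothetical scenario: a well-calibrated solver says "A" while three
# low-reliability peers push "B" (adversarial peer pressure).
votes = [("A", 0.9), ("B", 0.2), ("B", 0.2), ("B", 0.2)]

verdict = weighted_verdict(votes)

# An unweighted majority vote, for comparison, follows the peers.
majority = weighted_verdict([(answer, 1.0) for answer, _ in votes])
```

In this toy case the weighted verdict stays with the solver while a plain majority vote flips to the peers' answer, which is the failure mode that reliability scoring is meant to resist.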

dmapo · Data-centric Multi-Agent Preference Optimization

Six-stage data-centric alignment pipeline — prompts → on-policy generation → three-judge multi-agent scoring (Qwen3-8B) → process critic → confidence gating → KTO. Unified trainer supporting DPO / KTO / ORPO / SimPO / SFT.

Results · Mistral-7B trained on only 1,871 gated examples (3.45% accept rate) beats every baseline trained on 10–20k examples: MT-Bench 7.62 · AlpacaEval 96.3% · win-rate 85.3% vs. 68.2% for the best baseline
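The confidence-gating step between three-judge scoring and KTO can be sketched as a filter over judge scores. This is a hedged illustration only: the `gate` function, the 1 to 10 score scale, and the `min_mean` and `max_spread` thresholds are assumptions, since the description does not state the actual gating criteria.

```python
from statistics import mean

def gate(scores, min_mean=8.0, max_spread=1.0):
    """Accept a candidate only if the judges score it highly AND agree.

    Assumption: scores are on a 1-10 scale; both thresholds are
    hypothetical values chosen for this example.
    """
    high_enough = mean(scores) >= min_mean
    judges_agree = (max(scores) - min(scores)) <= max_spread
    return high_enough and judges_agree

# Hypothetical scores from three judges for three candidate responses.
candidates = {
    "resp_a": [9, 9, 8],  # high and consistent
    "resp_b": [9, 5, 9],  # judges disagree
    "resp_c": [6, 6, 7],  # consistent but low
}

accepted = [name for name, scores in candidates.items() if gate(scores)]
accept_rate = len(accepted) / len(candidates)
```

Only `resp_a` passes both conditions here. Tightening either threshold lowers the accept rate, trading dataset size for label confidence.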

More Agent Research

Tech Stack

LLM / Agent Frameworks

Models

Retrieval / RAG

Training / Post-Training

Deployment / Serving

Evaluation / MLOps

General

Open to full-time LLM Agent Engineer roles · 2026

runhaoli@usc.edu · LinkedIn · GitHub

Repo	Summary
updr-reasoning	Uncertainty-Prompted Debate and Repair — adaptive-compute multi-persona reasoning with uncertainty-gated self-repair
RAMTL	Role-Adaptive Multi-Tool Learning — single-backbone multi-role agent framework for tool use and function calling
DEAMS	Decentralized Epistemic Alignment for Multimodal Swarms — MA-GRPO across heterogeneous Qwen-VL / InternVL agents
PAGC	Partner-Adaptive Grounded Communication — cooperative MARL with emergent text-grounded communication
KTM-WM	Training-free kernel-trick world models for LLM agent planning (beam / MPC / CEM planners)
