LLM Agent Engineer · MS CS @ University of Southern California
I design and post-train multi-agent LLM systems — from learned inter-agent communication protocols to data-centric alignment pipelines to production vLLM serving stacks. My focus is on agent teams that are reliable, interpretable, and cheap to deploy end-to-end.
Current interests: multi-agent orchestration · post-training (DPO / KTO / GRPO) · vLLM multi-LoRA serving · agent evaluation harnesses · learned latent communication protocols.
latent-agent-team · Budgeted Multi-Agent Communication
Five-agent team (Planner · Retriever · Browser · Verifier · Memory) that replaces natural-language inter-agent messages with learned latent channels — continuous embeddings or VQ codes with an adaptive bitrate scheduler.
Results · Mind2Web 81.5% ElemAcc · WebShop 72.4% SR · AgentBench 66.8% SR
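To make the "VQ codes under a bit budget" idea concrete, here is a toy sketch of generic vector quantization. It is not the project's implementation: the function names, the tiny codebook, and the `max_codes` budget (a stand-in for whatever the adaptive bitrate scheduler decides) are all assumptions for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_code(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to vec (squared L2)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def encode_message(chunks: Sequence[Vector],
                   codebook: Sequence[Vector],
                   max_codes: int) -> List[int]:
    """Quantize at most max_codes chunks; the rest are dropped (hypothetical budget rule)."""
    return [nearest_code(c, codebook) for c in chunks[:max_codes]]


def bits_used(num_codes: int, codebook_size: int) -> int:
    """Bits needed to transmit num_codes indices into a codebook of the given size."""
    return num_codes * math.ceil(math.log2(codebook_size))


# Hypothetical 4-entry codebook and a 3-chunk message truncated to 2 codes.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
chunks = [[0.1, 0.0], [0.9, 1.2], [0.2, 0.8]]
codes = encode_message(chunks, codebook, max_codes=2)
```

With a 4-entry codebook each index costs 2 bits, so lowering `max_codes` trades message fidelity for bandwidth.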
acm-icl · Autonomy-Calibrated Multi-Agent In-Context Learning
Four-stage inference pipeline (Solver → Skeptic → Verifier → Calibrated Judge) with DD-CoT structured reasoning and per-peer reliability scoring for epistemic robustness under adversarial peer pressure.
Results · 73.9% average across 5 peer-pressure benchmarks · +13.7 pp over the strongest multi-agent debate (MAD) baseline
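One simple way to picture per-peer reliability scoring is a reliability-weighted vote. The sketch below is an illustrative assumption, not the acm-icl method: the peer names, the scores, and the `default` weight for unknown peers are all made up.

```python
from collections import defaultdict
from typing import Dict


def weighted_vote(answers: Dict[str, str],
                  reliability: Dict[str, float],
                  default: float = 0.5) -> str:
    """Pick the answer with the largest total reliability mass behind it."""
    totals: Dict[str, float] = defaultdict(float)
    for peer, answer in answers.items():
        totals[answer] += reliability.get(peer, default)
    return max(totals, key=totals.get)


# Two low-reliability peers push "A"; one high-reliability agent says "B".
answers = {"solver": "B", "peer1": "A", "peer2": "A"}
reliability = {"solver": 0.9, "peer1": 0.3, "peer2": 0.4}
winner = weighted_vote(answers, reliability)
```

A plain majority vote would return "A" here; weighting by reliability lets a single trusted agent outvote a larger group of unreliable peers.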
dmapo · Data-centric Multi-Agent Preference Optimization
Six-stage data-centric alignment pipeline — prompts → on-policy generation → three-judge multi-agent scoring (Qwen3-8B) → process critic → confidence gating → KTO. Unified trainer supporting DPO / KTO / ORPO / SimPO / SFT.
Results · Mistral-7B trained on only 1,871 gated examples (3.45% accept rate) beats every baseline trained on 10–20k examples: MT-Bench 7.62 · AlpacaEval 96.3% · win rate 85.3% vs. 68.2% for the best baseline
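The confidence-gating step can be sketched as a filter over multi-judge scores. This is a minimal illustration under assumed rules, not the dmapo code: the 0–10 score scale, the `min_mean` and `max_spread` thresholds, and the record layout are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence


def passes_gate(judge_scores: Sequence[float],
                min_mean: float = 8.0,
                max_spread: float = 1.0) -> bool:
    """Accept only when judges score high on average AND agree with each other."""
    spread = max(judge_scores) - min(judge_scores)
    return mean(judge_scores) >= min_mean and spread <= max_spread


def gate(candidates: List[Dict], **thresholds) -> List[Dict]:
    """Keep the candidates whose three judge scores pass the gate."""
    return [c for c in candidates if passes_gate(c["scores"], **thresholds)]


# Hypothetical candidates, each scored by three judges.
candidates = [
    {"id": "a", "scores": [9.0, 9.0, 8.5]},  # high and consistent
    {"id": "b", "scores": [9.0, 5.0, 9.0]},  # judges disagree
    {"id": "c", "scores": [7.0, 7.0, 7.0]},  # consistent but too low
]
kept = gate(candidates)

# Arithmetic from the reported numbers: 1,871 accepted at a 3.45% accept rate
# implies a candidate pool of roughly 54k before gating.
implied_pool = round(1871 / 0.0345)
```

A strict gate like this discards most generations, which is consistent with the small accepted set reported above.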
LLM / Agent Frameworks
Models
Retrieval / RAG
Training / Post-Training
Deployment / Serving
Evaluation / MLOps
General
Open to full-time LLM Agent Engineer roles · 2026
