preference-optimization

Here are 27 public repositories matching this topic...

general-preference / general-preference-model

[ICML 2025] Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment (https://arxiv.org/abs/2410.02197)

alignment large-language-models rlhf preference-modeling preference-optimization

Updated Jun 15, 2026
Python

iBacklight / PipelineLLM

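PipelineLLM is a systematic learning project on large language model (LLM) post-training, covering the complete technology stack from supervised fine-tuning (SFT) through preference optimization (DPO) and reinforcement learning (RLHF/PPO/GRPO) to continual learning.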

reinforcement-learning lora fine-tuning post-training continual-learning sft rlhf llm-reasoning preference-optimization llm-infrastructure llm-processing

Updated Jan 16, 2026
Python

sahsaeedi / TPO

[TMLR] Triple Preference Optimization

alignment large-language-models rlhf preference-optimization

Updated Feb 19, 2025
Python

s-vco / s-vco

Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images

alignment-algorithms vision-language-models preference-optimization

Updated Jun 4, 2025
Python

warlockee / oxRL

A lightweight post-training framework for LLMs and VLMs. 51 algorithms, 38 verified models. Scales with DeepSpeed, vLLM, and Ray.

reinforcement-learning alignment post-training dpo deepspeed rlhf vllm llm-training preference-optimization grpo

Updated May 6, 2026
Python

RUCKBReasoning / DPO_Text2SQL

[ACL 2025] Uncovering the Impact of Chain-of-Thought Reasoning for Direct Preference Optimization: Lessons from Text-to-SQL

text-to-sql nl2sql dpo text2sql preference-optimization

Updated Oct 9, 2025
Python

Sreyan88 / Synthio

Code for ICLR 2025 Paper: Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data

audio audio-classification synthetic-data audio-generation large-language-models preference-optimization

Updated Mar 31, 2025
Python

JIA-Lab-research / TGDPO

[ICML 2025] TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization

alignment preference-learning large-language-models llm rlhf preference-alignment direct-preference-optimization preference-optimization

Updated Jul 15, 2025
Python

sahsaeedi / DCPO-T2I

[TMLR] Dual Caption Preference Optimization

alignment diffusion-models rlhf preference-optimization

Updated Feb 12, 2025
Python

DtYXs / Pre-DPO

Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model

alignment large-language-models preference-optimization

Updated Apr 23, 2025
Python

JingbiaoMei / ExPO-HM

🔬 Official implementation of ExPO-HM: Learning to Explain-then-Detect for Hateful Meme Detection (ICLR 2026). Novel multimodal RL approach for interpretable and explainable content moderation.

multimodal-learning explainable-ai content-moderation vision-language-models preference-optimization grpo iclr-2026 hateful-meme-detection multimodal-rl

Updated Mar 1, 2026
Python

pilancilab / COALA

Convex Optimization for Alignment and Preference Learning on a Single GPU

convex-optimization convex preference-learning llms preference-optimization

Updated May 28, 2026
Python

yuvhaim-gif / LLM_InSight

This is my personal home rig for serious LLM experimentation. I built it to test models head-to-head, create custom evaluation rubrics, automatically improve prompts based on the previous run's results, and generate high-quality synthetic training data. Everything runs locally first (Ollama by default), with optional cloud support, and everything is logged locally.

frontend-web ab-testing evaluation-metrics human-in-the-loop evaluation-framework grading-system local-first synthetic-data-generation dataset-curation ollama llm-evaluation prompt-optimization preference-optimization rubric-based-evaluation

Updated Jun 15, 2026
Python

shaheennabi / open-posttraining-system

Open-source research engineering project for building the end-to-end post-training stack for reasoning language models, including SFT, preference learning, RLHF/RLVR, evaluation, inference-time scaling, and scalable systems for frontier-level reasoning.

open-source evaluation inference text-generation benchmarks post-training rlhf reward-modeling preference-optimization supervised-fine-tuning inference-time-scaling open-post-training-system

Updated Jun 21, 2026
Jupyter Notebook

Yellow4Submarine7 / LLMDoctor

🩺 Token-Level Flow-Guided Preference Optimization for Efficient Test-Time Alignment (AAAI 2026)

transformers pytorch alignment lora llm qwen preference-optimization aaai2026 test-time-alignment

Updated Jan 17, 2026
Python

YuanaHao / Awesome-Diffusion-RL

A weekly-updated awesome list of RL, RLHF, DPO, GRPO, reward models, and preference optimization for image and video diffusion generation.

reinforcement-learning image-generation awesome-list video-generation diffusion-models rlhf preference-optimization grpo

Updated May 19, 2026

martimfasantos / CustomPOs-for-SLMs

Novel Preference Optimization Algorithms for state-of-the-art small LMs, enhancing performance in GenAI and NLP tasks

nlp evaluation preference-learning human-preferences llms gen-ai preference-optimization

Updated Jan 5, 2025
Python

jmagly / aiwg-training

AIWG training-complete framework — corpus-to-dataset pipeline with SKILL.md agentic surface and optional Python runtime backend. Marketplace plugin for AIWG.

provenance synthetic-data training-data fine-tuning dpo model-cards decontamination dataset-curation sharegpt llm-training preference-optimization alpaca-format aiwg benchmark-contamination datasheets-for-datasets

Updated Apr 16, 2026
Python

runhaoli-creator / dmapo

Direct multi-agent policy optimization — unified DPO/KTO/ORPO/SimPO framework.

multi-agent post-training dpo trl llm rlhf preference-optimization

Updated Apr 14, 2026
Python

Sonnet-Code / awesome-human-in-the-loop

Curated list of papers, tools, datasets and resources for human-in-the-loop AI training — SFT, RLHF, preference optimization, red-teaming, evaluation.

awesome awesome-list model-evaluation human-in-the-loop red-teaming dpo ai-training llm rlhf preference-optimization

Updated Apr 21, 2026
