Turn your webcam into a real-time AI art studio.
FastFlux2 Realtime Editor is a blazing-fast, browser-based real-time image-to-image generation tool powered by FLUX.2-klein-4B and optimized with FlashAttention-3 / SageAttention.
Whether you're livestreaming, creating content, or just having fun with AI art, this tool transforms your webcam or screen capture into stylized artwork in real-time with just 2 inference steps.
- ⚡ Ultra-low latency: ~66-75ms per frame on a single H100 with 2-step inference
- 🚀 Single-GPU H100 throughput: 15.6 FPS with FA3 (latest measured)
- 🎨 21 built-in presets: Anime, Pixar, Ghibli, LEGO, Neon, Accessories & more
- 🖥️ Webcam & Screen support: Stream your camera or entire desktop
- 🧠 Smart caching: Prompt embeddings cached for repeated use
- 🧠 Auto attention backend selection: FA3 > Sage > Native
- 🚀 FA3 recommended: best measured throughput in the current setup
- 🌐 Zero client install: Runs entirely in your browser
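The latency and throughput figures above are two views of the same number: FPS is simply 1000 divided by per-frame latency in milliseconds, so the ~66-75ms range corresponds to roughly 13-15 FPS. A quick sanity check (`latency_to_fps` is a hypothetical helper name, not part of the codebase):

```python
# Convert per-frame latency (ms) to frames per second.
def latency_to_fps(latency_ms: float) -> float:
    return 1000.0 / latency_ms

# The ~66-75 ms per-frame range maps to roughly 13-15 FPS:
print(round(latency_to_fps(66), 1))  # 15.2
print(round(latency_to_fps(75), 1))  # 13.3
```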
Tian Ye
PhD Student @ HKUST(Guangzhou)
🐙 About Me
- NVIDIA GPU with CUDA 12.6+ (RTX 4090/H100 recommended)
- Python 3.10+
- 24GB VRAM (for FP16 inference)
```bash
# Clone the repository
git clone https://github.com/Owen718/flux-stream-editor.git
cd flux-stream-editor

# Install dependencies
pip install torch==2.6.0+cu126 torchvision==0.21.0+cu126 --index-url https://download.pytorch.org/whl/cu126
pip install diffusers transformers accelerate pillow fastapi uvicorn
pip install flash_attn_3==3.0.0b1                      # Recommended (FA3)
pip install sageattention==2.2.0 --no-build-isolation  # Fallback backend
pip install cache-dit                                  # Transformer caching optimization
```

```bash
# Recommended: auto backend selection (FA3 > Sage > Native), single GPU #1
CUDA_VISIBLE_DEVICES=1 python -m realtime_editing_fast.realtime_img2img_server \
  --host 127.0.0.1 \
  --port 6006 \
  --num-inference-steps 2
```
```bash
# Optional: force FA3
CUDA_VISIBLE_DEVICES=1 python -m realtime_editing_fast.realtime_img2img_server \
  --host 127.0.0.1 \
  --port 6006 \
  --num-inference-steps 2 \
  --attention-backend fa3
```

Navigate to http://localhost:6006 and click "Load Model" → "Start".
```bash
# Flag reference (comments cannot follow a line-continuation backslash):
# --host                 Server host
# --port                 Server port
# --num-inference-steps  Number of denoising steps (1-4 recommended)
# --attention-backend    Attention backend: auto, fa3, sage, native, none
# --compile-transformer  Enable torch.compile (faster inference, slower startup)
# --width / --height     Output resolution
python -m realtime_editing_fast.realtime_img2img_server \
  --host 0.0.0.0 \
  --port 6006 \
  --num-inference-steps 2 \
  --attention-backend auto \
  --compile-transformer \
  --width 512 \
  --height 512
```

| Backend | Speed | Quality | Notes |
|---|---|---|---|
| `auto` | ⭐⭐⭐⭐ Recommended | Excellent | Auto-selects FA3 > Sage > Native |
| `fa3` | ⭐⭐⭐⭐ Fastest (current) | Excellent | Requires `flash_attn_3` |
| `sage` | ⭐⭐⭐ Fast | Excellent | Requires SageAttention 2.2.0+ |
| `native` | ⭐⭐ Compatible | Excellent | PyTorch native SDPA |
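The `auto` backend can be pictured as a simple import-based fallback chain. This is only a minimal sketch (the function name `select_attention_backend` is illustrative; the server's actual selection logic may differ):

```python
def select_attention_backend() -> str:
    """Pick the fastest available attention backend: FA3 > Sage > Native."""
    try:
        import flash_attn_3  # noqa: F401  -- FA3 kernels, if installed
        return "fa3"
    except ImportError:
        pass
    try:
        import sageattention  # noqa: F401  -- SageAttention fallback
        return "sage"
    except ImportError:
        pass
    return "native"  # PyTorch's built-in SDPA always works

print(select_attention_backend())
```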
```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  Browser UI     │────▶│  FastAPI Server  │────▶│  FLUX.2 Model   │
│ (Webcam/Screen) │     │ (GPU Optimized)  │     │ (2-Step Infer)  │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                                         │
                                                         ▼
                                                 ┌──────────────┐
                                                 │ SageAttention│ ◀── 30% Speedup
                                                 │ Cache-DiT    │ ◀── Skip redundant blocks
                                                 │ Torch.Compile│ ◀── Graph optimization
                                                 └──────────────┘
```
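The "skip redundant blocks" idea can be sketched in miniature: cache a block's output and reuse it whenever the input has barely changed since the last frame. This is purely illustrative (scalar inputs, a made-up `CachedBlock` class); Cache-DiT's real skipping criteria operate on transformer hidden states:

```python
class CachedBlock:
    """Reuse a block's cached output when its input barely changed (illustrative)."""
    def __init__(self, fn, rel_tol=1e-2):
        self.fn, self.rel_tol = fn, rel_tol
        self.last_in = self.last_out = None
        self.calls = 0  # how many times fn actually ran

    def __call__(self, x: float) -> float:
        if self.last_in is not None and \
                abs(x - self.last_in) <= self.rel_tol * max(abs(self.last_in), 1e-8):
            return self.last_out  # input ~unchanged: skip the compute
        self.calls += 1
        self.last_in, self.last_out = x, self.fn(x)
        return self.last_out

block = CachedBlock(lambda x: x * x)
block(1.000); block(1.001); block(2.000)  # the middle call reuses the cache
print(block.calls)  # 2
```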
- Use `auto` backend (default): it tries FA3 > Sage > Native.
- Install FA3: `flash_attn_3==3.0.0b1` currently gives the best throughput.
- Enable torch.compile: essential for reaching the latest H100 throughput targets (the RTX 4090 figures are older reference values)
- Prompt Caching: Same prompts reuse cached embeddings (0ms overhead)
- 2-Step Inference: Perfect balance of speed & quality for real-time stylization
Tip: In practice, 3-step inference is much better than 2-step in both visual quality and instruction following, but the FPS drops noticeably.
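The prompt-caching behaviour above can be sketched as a simple memo around the text encoder. The helper name `get_prompt_embeddings` is hypothetical and the server's actual cache keys may differ; this only shows why a repeated prompt costs ~0ms:

```python
_embedding_cache: dict = {}

def get_prompt_embeddings(prompt: str, encode):
    """Return cached embeddings for a prompt, encoding only on first use."""
    if prompt not in _embedding_cache:
        _embedding_cache[prompt] = encode(prompt)  # expensive text-encoder pass
    return _embedding_cache[prompt]

calls = []
emb1 = get_prompt_embeddings("anime style", lambda p: calls.append(p) or [0.1, 0.2])
emb2 = get_prompt_embeddings("anime style", lambda p: calls.append(p) or [0.1, 0.2])
print(len(calls), emb1 is emb2)  # 1 True
```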
🚀 Latest: Single H100 measured 15.6 FPS with FA3 at 2-step inference.
🕘 Old reference: RTX 4090 numbers are kept for historical comparison.
Note: First inference includes model loading (~10s) and torch.compile warmup (~5-10s). Subsequent requests achieve full speed.
Contributions are welcome! Areas we'd love help with:
- Mobile UI optimization
- Better processing mode
- Quant-based acceleration
- WebRTC streaming support
This project is released under a Non-Commercial License. Commercial use is prohibited without explicit permission from the author. See LICENSE.
- Black Forest Labs for FLUX.2 models
- SageAttention team for the optimized attention kernel
- Diffusers team for the inference pipeline
- Cache-DiT for transformer block caching
Made with ❤️ for the AI art community

