Turn your webcam into a real-time AI art studio.
FastFlux2 Realtime Editor is a blazing-fast, browser-based real-time image-to-image generation tool powered by FLUX.2-klein-4B and optimized with FlashAttention-3 / SageAttention.
Whether you're livestreaming, creating content, or just having fun with AI art, this tool transforms your webcam or screen capture into stylized artwork in real-time with just 2 inference steps.
- ⚡ Ultra-low latency: ~66-75ms per frame on a single H100 with 2-step inference
- 🚀 Single-GPU H100 throughput: 15.6 FPS with FA3 (latest measured)
- 🎨 21 built-in presets: Anime, Pixar, Ghibli, LEGO, Neon, Accessories & more
- 🖥️ Webcam & Screen support: Stream your camera or entire desktop
- 🧠 Smart caching: Prompt embeddings cached for repeated use
- 🧠 Auto attention backend selection: FA3 > Sage > Native
- 🚀 FA3 recommended: best measured throughput in the current setup
- 🌐 Zero client install: Runs entirely in your browser
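The latency and throughput figures above are two views of the same number: FPS is simply 1000 divided by per-frame latency in milliseconds, so the ~66-75ms range corresponds to roughly 13-15 FPS. A quick sanity check (`latency_to_fps` is a hypothetical helper name, not part of the codebase):

```python
# Convert per-frame latency (ms) to frames per second.
def latency_to_fps(latency_ms: float) -> float:
    return 1000.0 / latency_ms

# The ~66-75 ms per-frame range maps to roughly 13-15 FPS:
print(round(latency_to_fps(66), 1))  # 15.2
print(round(latency_to_fps(75), 1))  # 13.3
```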
Tian Ye
PhD Student @ HKUST(Guangzhou)
🐙 About Me
- NVIDIA GPU with CUDA 12.6+ (RTX 4090/H100 recommended)
- Python 3.10+
- 24GB VRAM (for FP16 inference)
```bash
# Clone the repository
git clone https://github.com/Owen718/flux-stream-editor.git
cd flux-stream-editor

# Install dependencies
pip install torch==2.6.0+cu126 torchvision==0.21.0+cu126 --index-url https://download.pytorch.org/whl/cu126
pip install diffusers transformers accelerate pillow fastapi uvicorn
pip install flash_attn_3==3.0.0b1                      # Recommended (FA3)
pip install sageattention==2.2.0 --no-build-isolation  # Fallback backend
pip install cache-dit                                  # Transformer caching optimization
```

```bash
# Recommended: auto backend selection (FA3 > Sage > Native), single GPU #1
CUDA_VISIBLE_DEVICES=1 python -m realtime_editing_fast.realtime_img2img_server \
  --host 127.0.0.1 \
  --port 6006 \
  --num-inference-steps 2
```
```bash
# Optional: force FA3
CUDA_VISIBLE_DEVICES=1 python -m realtime_editing_fast.realtime_img2img_server \
  --host 127.0.0.1 \
  --port 6006 \
  --num-inference-steps 2 \
  --attention-backend fa3
```

Navigate to http://localhost:6006 and click "Load Model" → "Start".
```bash
# Flag reference (comments cannot follow a line-continuation backslash):
# --host                 Server host
# --port                 Server port
# --num-inference-steps  Number of denoising steps (1-4 recommended)
# --attention-backend    Attention backend: auto, fa3, sage, native, none
# --compile-transformer  Enable torch.compile (faster inference, slower startup)
# --width / --height     Output resolution
python -m realtime_editing_fast.realtime_img2img_server \
  --host 0.0.0.0 \
  --port 6006 \
  --num-inference-steps 2 \
  --attention-backend auto \
  --compile-transformer \
  --width 512 \
  --height 512
```

| Backend | Speed | Quality | Notes |
|---|---|---|---|
| `auto` | ⭐⭐⭐⭐ Recommended | Excellent | Auto-selects FA3 > Sage > Native |
| `fa3` | ⭐⭐⭐⭐ Fastest (current) | Excellent | Requires `flash_attn_3` |
| `sage` | ⭐⭐⭐ Fast | Excellent | Requires SageAttention 2.2.0+ |
| `native` | ⭐⭐ Compatible | Excellent | PyTorch native SDPA |
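The `auto` backend can be pictured as a simple import-based fallback chain. This is only a minimal sketch (the function name `select_attention_backend` is illustrative; the server's actual selection logic may differ):

```python
def select_attention_backend() -> str:
    """Pick the fastest available attention backend: FA3 > Sage > Native."""
    try:
        import flash_attn_3  # noqa: F401  -- FA3 kernels, if installed
        return "fa3"
    except ImportError:
        pass
    try:
        import sageattention  # noqa: F401  -- SageAttention fallback
        return "sage"
    except ImportError:
        pass
    return "native"  # PyTorch's built-in SDPA always works

print(select_attention_backend())
```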
```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  Browser UI     │────▶│  FastAPI Server  │────▶│  FLUX.2 Model   │
│ (Webcam/Screen) │     │ (GPU Optimized)  │     │ (2-Step Infer)  │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                                         │
                                                         ▼
                                                 ┌──────────────┐
                                                 │ SageAttention│ ◀── 30% Speedup
                                                 │ Cache-DiT    │ ◀── Skip redundant blocks
                                                 │ Torch.Compile│ ◀── Graph optimization
                                                 └──────────────┘
```
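The "skip redundant blocks" idea can be sketched in miniature: cache a block's output and reuse it whenever the input has barely changed since the last frame. This is purely illustrative (scalar inputs, a made-up `CachedBlock` class); Cache-DiT's real skipping criteria operate on transformer hidden states:

```python
class CachedBlock:
    """Reuse a block's cached output when its input barely changed (illustrative)."""
    def __init__(self, fn, rel_tol=1e-2):
        self.fn, self.rel_tol = fn, rel_tol
        self.last_in = self.last_out = None
        self.calls = 0  # how many times fn actually ran

    def __call__(self, x: float) -> float:
        if self.last_in is not None and \
                abs(x - self.last_in) <= self.rel_tol * max(abs(self.last_in), 1e-8):
            return self.last_out  # input ~unchanged: skip the compute
        self.calls += 1
        self.last_in, self.last_out = x, self.fn(x)
        return self.last_out

block = CachedBlock(lambda x: x * x)
block(1.000); block(1.001); block(2.000)  # the middle call reuses the cache
print(block.calls)  # 2
```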
- Use `auto` backend (default): it tries FA3 > Sage > Native.
- Install FA3: `flash_attn_3==3.0.0b1` currently gives the best throughput.
- Enable torch.compile: essential for reaching the latest H100 throughput targets (the RTX 4090 figures are older reference values)
- Prompt Caching: Same prompts reuse cached embeddings (0ms overhead)
- 2-Step Inference: Perfect balance of speed & quality for real-time stylization
Tip: In practice, 3-step inference is much better than 2-step in both visual quality and instruction following, but the FPS drops noticeably.
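The prompt-caching behaviour above can be sketched as a simple memo around the text encoder. The helper name `get_prompt_embeddings` is hypothetical and the server's actual cache keys may differ; this only shows why a repeated prompt costs ~0ms:

```python
_embedding_cache: dict = {}

def get_prompt_embeddings(prompt: str, encode):
    """Return cached embeddings for a prompt, encoding only on first use."""
    if prompt not in _embedding_cache:
        _embedding_cache[prompt] = encode(prompt)  # expensive text-encoder pass
    return _embedding_cache[prompt]

calls = []
emb1 = get_prompt_embeddings("anime style", lambda p: calls.append(p) or [0.1, 0.2])
emb2 = get_prompt_embeddings("anime style", lambda p: calls.append(p) or [0.1, 0.2])
print(len(calls), emb1 is emb2)  # 1 True
```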
🚀 Latest: Single H100 measured 15.6 FPS with FA3 at 2-step inference.
🕘 Old reference: RTX 4090 numbers are kept for historical comparison.
Note: First inference includes model loading (~10s) and torch.compile warmup (~5-10s). Subsequent requests achieve full speed.
Contributions are welcome! Areas we'd love help with:
- Mobile UI optimization
- Better processing mode
- Quant-based acceleration
- WebRTC streaming support
This project is released under a Non-Commercial License. Commercial use is prohibited without explicit permission from the author. See LICENSE.
- Black Forest Labs for FLUX.2 models
- SageAttention team for the optimized attention kernel
- Diffusers team for the inference pipeline
- Cache-DiT for transformer block caching
Made with ❤️ for the AI art community

