cpuimage/cpuimage

Hey 👋🏽, I'm cpuimage

AI engineer working on AIGC, inference optimization, and audio/video/image algorithms.
I build real-world AI systems, accelerate models, and share open-source work here on GitHub.

If my projects help you, feel free to buy me a coffee. ☕️


⚡ What I Do

  • AIGC engineering (Stable Diffusion, FLUX, SDXL, high-res synthesis)
  • Inference optimization (TensorRT, FP16, Flash Attention, async pipelines)
  • Audio/video/image algorithms (TTS, matting, OpenGL effects)
  • Training stability & numerical optimization
  • Multi-time CTO experience in AI companies

🧠 Professional Experience

  • 👨🏽‍💻 Worked at leading tech companies including Baidu, KingSoft, and others.
  • 🧩 Multi-time CTO for AI companies (AIGC, image generation, inference optimization).
  • 📱 Developed algorithms for multiple applications: ToolWiz Photos, Mypic, DOUPAI.
  • 💡 Delivered production-level AI technical customization and consulting services.

🚀 Research Progress & Achievements

I work across Stable Diffusion, inference acceleration, training stability, and audio/video algorithms.

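Main research areas: large language models (LLMs), generative AI, training stability, inference acceleration, and audio/video algorithms.

🧠 LLMs & Reasoning

  • Efficient Semi-supervised Learning via Structural Regularization for Consistent Reasoning in LLMs
    • Semi-supervised structural regularization that constrains features and logits to evolve consistently, suppressing rote memorization and improving reasoning stability.
  • One-Pass LLM: From-Scratch Pre-training and SFT with Adaptive Gradient Modulation
    • Introduces an adaptive gradient modulation mechanism for an efficient single-pass scheme that runs from-scratch pre-training and SFT simultaneously.
  • Memory-Efficient LLM Training
    • A GPU-memory-optimized training scheme for large-scale language models.
  • MozzyTokenizer: Adaptive Byte-Level Tokenizer
    • An adaptive byte-level tokenizer that improves encoding efficiency on the input side.
  • LLM from Scratch with PyTorch
    • Builds an LLM architecture from scratch in PyTorch.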

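🎨 Generative AI & Architecture

  • Training-Free Universal High-Resolution Synthesis for Any Vision Model
    • A training-free, general-purpose super-resolution synthesis technique applicable to a wide range of vision models.
  • FLUX.1 FP16 Inference Deployment + Low-Memory LoRA Training
    • End-to-end deployment of the FLUX.1 model and LoRA training optimized for low GPU memory.
  • Stable Diffusion Architectural Distillation
    • Architectural distillation and lightweighting for the Stable Diffusion model family.
  • Image Synthesis and Semantic Manipulation Using Stable Diffusion Networks
    • Image synthesis and deep semantic manipulation using SD networks.
  • Super-Resolution / Video Editing Solutions based on Stable Diffusion
    • Diffusion-based super-resolution reconstruction and video editing solutions.
  • Porting SDXL 1.0, SD X4 Upscaler, PromptGen to TensorFlow/ONNX (FP16 Support)
    • Cross-framework ports of core SD models with FP16-targeted performance optimization.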

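⚡ Optimization & Stability

  • Robustness and Speed: An Adaptive, Efficient Optimizer for Stable Training
    • An all-purpose efficient optimizer that needs no learning-rate tuning or warmup and adds gradient accumulation correction and long-tail gradient mitigation.
  • Numerical Stability via Scalable Parallel Compensated Reductions
    • An improved approach to numerical stability in large-scale parallel computation.
  • Adaptive Moving-Average BatchNorm Stabilization
    • Stabilizes BatchNorm using an adaptive moving average.
  • Loss Regularization / Parameter-Free Weight Regularization
    • Loss terms and parameter-free weight regularization that improve generalization.
  • Dynamic Loss Weighting for Multi-Task Learning
    • A dynamic loss-weight allocation strategy for multi-task learning.
  • Chunked Flash Attention in Keras
    • Implements chunked Flash Attention in Keras to support long-sequence processing.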
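
For readers unfamiliar with the idea behind "Numerical Stability via Scalable Parallel Compensated Reductions", here is a minimal sketch of a compensated reduction in plain Python. It is not the implementation from that project: it assumes the well-known Neumaier variant of Kahan summation, and the function names (`neumaier_parts`, `neumaier_sum`, `chunked_compensated_sum`) and the `chunk_size` parameter are hypothetical, chosen only for illustration. Each chunk is reduced independently, so the chunks could be handed to parallel workers, and the per-chunk sums and error terms are then combined with the same compensated sum.

```python
def neumaier_parts(values):
    """Return (total, comp): a running sum plus the accumulated rounding error."""
    total = 0.0
    comp = 0.0  # low-order bits lost by the additions so far
    for x in values:
        t = total + x
        if abs(total) >= abs(x):
            comp += (total - t) + x   # low-order bits of x were lost
        else:
            comp += (x - t) + total   # low-order bits of total were lost
        total = t
    return total, comp


def neumaier_sum(values):
    """Compensated sum of an iterable of floats."""
    total, comp = neumaier_parts(values)
    return total + comp


def chunked_compensated_sum(values, chunk_size=1024):
    """Reduce each chunk independently, then combine sums and error terms.

    Illustrative only: the chunk loop is sequential here, but each chunk
    depends on no other chunk, so it could run on a separate worker.
    """
    pieces = []
    for start in range(0, len(values), chunk_size):
        total, comp = neumaier_parts(values[start:start + chunk_size])
        pieces.extend((total, comp))  # keep the error term, do not fold it in yet
    return neumaier_sum(pieces)
```

On the input `[1.0, 1e100, 1.0, -1e100]`, Python's built-in `sum` returns `0.0` because both `1.0` terms are absorbed by `1e100`, while `neumaier_sum` recovers `2.0`. Keeping each chunk's error term separate until the final combine is the design choice that lets the chunked version recover it too.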

🚀 Inference & Mobile Deployment

  • Accelerate Stable Diffusion FP16 Inference Deployment with TensorRT
    • TensorRT-based inference acceleration for diffusion models.
  • Stable Diffusion Architecture Optimization and Deployment on Mobile Devices
    • SD architecture optimization and deployment for on-device mobile environments.
  • A Plug-And-Play Algorithm for Asynchronous Inference with Frequency-Domain Reconstruction
    • A pluggable asynchronous inference algorithm based on frequency-domain reconstruction.
  • A Trimap-Free Solution for Real-Time Automatic Portrait Matting on Mobile Devices
    • A real-time, trimap-free automatic portrait matting algorithm for mobile devices.

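🖼️ Computer Vision & Multimedia

  • Enhanced FaceFusion: Decoupled Modules & Optimized Inference
    • An enhanced FaceFusion with decoupled modules and an optimized inference pipeline.
  • Ultra High-Resolution Portrait Retouching
    • Ultra-high-definition portrait retouching and texture enhancement algorithms.
  • Arbitrary Resolution Super-Resolution for Real-World Images
    • Arbitrary-resolution super-resolution for real-world images.
  • Content-aware 3-view Synthesis for Game Art
    • Content-aware three-view synthesis for game art asset development.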

📊 Statistical Algorithms

  • Real-time MMSE-STSA speech enhancement (embedded implementation)

🤝 Collaboration & Contact

I’m open to collaboration on AIGC, inference optimization, and audio/image algorithms.

Reach me on:

  • Telegram
  • WeChat
  • QQ

For paid technical services or consulting:

  • Email
