Got a technical article but wish it were a fun cartoon? This AI pipeline reads any markdown file and automatically produces a fully narrated, animated video — with consistent characters, voice acting, and scene transitions. No editing software needed. Just one command.
Design Doc: docs/design.md, Flow Source Code: flow.py
The pipeline was run on a short article about Neural Networks. It produced 7 scenes — each with a generated image, voice narration, and animated video.
| Scene | Image | Description & Dialogue | Media |
|---|---|---|---|
| 1 — Mia | ![]() | Mia is sitting at her desk looking frustrated, surrounded by crumpled homework papers about neural networks. "Ugh, my brain hurts! How am I supposed to understand these 'neural networks' for my computer science homework? It just looks like a bunch of confusing dots and lines!" | Video · Audio |
| 2 — Ding Ding Dog | ![]() | Ding Ding Dog walks in, wags his stubby tail, and pulls a glowing brain gadget from his belly pocket that expands into a floating hologram of interconnected nodes. "Arf! Don't let the dots and lines trick you, Mia! Neural networks are actually inspired by the human brain, using interconnected nodes to process information just like you do." | Video · Audio |
| 3 — Mia | ![]() | Mia pushes up her round glasses and points at the glowing nodes with a curious expression. "Okay, I get the brain part... but how do a bunch of glowing dots actually learn to recognize patterns from the data? It still looks like magic!" | Video · Audio |
| 4 — Ding Ding Dog | ![]() | Ding Ding Dog's golden bell jingles as he pulls miniature turning dials out of his pocket and attaches them to the glowing lines between the nodes. "Arf! It's not magic, it's math! See these dials on the lines? They represent 'weights' that automatically turn and adjust during training to help the network figure out exactly which paths are most important!" | Video · Audio |
| 5 — Mia | ![]() | Mia scratches her head in confusion and asks how the network knows which way to turn the dials when it makes a mistake. "Wait, I'm still confused. If the network makes a mistake, how does it actually know which way to turn those dials to fix it?" | Video · Audio |
| 6 — Ding Ding Dog | ![]() | Ding Ding Dog projects a tiny holographic robot that walks backward along the glowing connections, explaining backpropagation step by step. "Arf! It uses a trick called 'backpropagation'! Think of it like a little helper walking backward through the network, checking mistakes step-by-step to tell the dials exactly how to fix the errors." | Video · Audio |
| 7 — Mia | ![]() | Mia jumps up from her desk with a giant smile, throwing her hands in the air to celebrate understanding. "I get it now! So by repeating that backward step over and over, the network learns from its mistakes and finally makes accurate predictions! Thanks, Ding Ding Dog!" | Video · Audio |
All 7 scenes stitched into a single 73-second narrated cartoon: final.mp4
```mermaid
flowchart TD
    A["1. Plan Scenes (LLM)"] --> B["2. Write Scripts (LLM, self-loop)"]
    B --> C["3. Generate Images (Wan 2.7)"]
    C --> D["4. Generate Audio (CosyVoice)"]
    D --> E["5. Animate Video (Wan 2.7 I2V)"]
    E --> F["6. Combine (ffmpeg)"]
```
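The final combine step can be sketched with Python's `subprocess` driving ffmpeg's concat demuxer. This is a minimal sketch, not the pipeline's actual implementation: the file names and the assumption that narration is already muxed into each scene clip are illustrative.

```python
import subprocess
from pathlib import Path

def build_concat_file(scene_videos, list_path="scenes.txt"):
    """Write an ffmpeg concat-demuxer list: one `file '...'` line per clip."""
    lines = [f"file '{Path(v).as_posix()}'" for v in scene_videos]
    Path(list_path).write_text("\n".join(lines) + "\n")
    return list_path

def combine(scene_videos, output="final.mp4"):
    """Stitch per-scene clips (narration already muxed in) into one video."""
    list_path = build_concat_file(scene_videos)
    cmd = ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
           "-i", list_path, "-c", "copy", output]  # stream copy: no re-encode
    subprocess.run(cmd, check=True)
    return output
```

Stream-copying (`-c copy`) keeps stitching fast; if the per-scene clips differ in codec or resolution, a re-encode would be needed instead.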
Three layers ensure the same characters look identical across all independently generated images:

- Text description — Full character description embedded in every image prompt
- Reference image — Same `assets/ref.png` passed to every generation call
- Scene chaining — Previous scene's output used as style reference for the next
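A minimal sketch of how the first two layers might be wired together. The `generate_image` call and its `reference_images` parameter are hypothetical, and the character traits are taken from the scenes above; this is not the actual Wan API.

```python
# Character traits taken from the scene descriptions above.
CHARACTER_SHEET = {
    "Mia": "a girl with round glasses",
    "Ding Ding Dog": "a dog with a stubby tail, a belly pocket, and a golden bell",
}

def build_image_prompt(scene_description, characters):
    """Layer 1: embed the full character descriptions in every prompt."""
    sheet = "; ".join(f"{name}: {desc}" for name, desc in characters.items())
    return f"Cartoon style. Characters: {sheet}. Scene: {scene_description}"

def render_scene(scene_description, prev_output=None, ref_image="assets/ref.png"):
    prompt = build_image_prompt(scene_description, CHARACTER_SHEET)
    # Layers 2 and 3: always pass the fixed reference image, plus the
    # previous scene's output as an extra style reference when available.
    refs = [ref_image] + ([prev_output] if prev_output else [])
    return generate_image(prompt, reference_images=refs)  # hypothetical API
```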
- Set up LLM in `utils/call_llm.py` by providing credentials. You can refer to LLM Wrappers for example implementations.

  You can verify that it is correctly set up by running:

  ```bash
  python utils/call_llm.py
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up environment variables (copy `.env.example` to `.env`):

  ```bash
  cp .env.example .env
  ```

  Then fill in your API keys:

  - `GEMINI_API_KEY` — Get from Google AI Studio
  - `DASHSCOPE_API_KEY` — Get from Alibaba Model Studio

- Make sure `ffmpeg` is installed on your system:

  ```bash
  # macOS
  brew install ffmpeg

  # Ubuntu/Debian
  sudo apt install ffmpeg
  ```

- Run the pipeline:

  ```bash
  python main.py examples/neural_networks.md
  ```

  Options:

  - `-o, --output` — Output directory (default: `./output`)
  - `--ref-image` — Custom character reference image (default: `assets/ref.png`)
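The options above map directly onto a small `argparse` front end. A minimal sketch, assuming the actual pipeline entry point (called `run_pipeline` here for illustration) is defined elsewhere in `main.py`:

```python
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Turn a markdown article into a narrated cartoon.")
    parser.add_argument("article", help="Path to the input markdown file")
    parser.add_argument("-o", "--output", default="./output",
                        help="Output directory (default: ./output)")
    parser.add_argument("--ref-image", default="assets/ref.png",
                        help="Character reference image (default: assets/ref.png)")
    return parser.parse_args(argv)
```

With this, `python main.py examples/neural_networks.md -o ./out` would place all generated scenes and the final video under `./out`.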
- Built with Pocket Flow, a 100-line LLM framework that lets LLM agents (e.g., Claude Code) build apps for you