
Turn Markdown Articles into Animated Cartoon Videos

License: MIT

Got a technical article but wish it were a fun cartoon? This AI pipeline reads any markdown file and automatically produces a fully narrated, animated video — with consistent characters, voice acting, and scene transitions. No editing software needed. Just one command.

Design Doc: docs/design.md, Flow Source Code: flow.py

Example Output

The pipeline was run on a short article about Neural Networks. It produced 7 scenes — each with a generated image, voice narration, and animated video.

| Scene | Description & Dialogue | Media |
| --- | --- | --- |
| 1 — Mia | Mia is sitting at her desk looking frustrated, surrounded by crumpled homework papers about neural networks.<br>*"Ugh, my brain hurts! How am I supposed to understand these 'neural networks' for my computer science homework? It just looks like a bunch of confusing dots and lines!"* | Video · Audio |
| 2 — Ding Ding Dog | Ding Ding Dog walks in, wags his stubby tail, and pulls a glowing brain gadget from his belly pocket that expands into a floating hologram of interconnected nodes.<br>*"Arf! Don't let the dots and lines trick you, Mia! Neural networks are actually inspired by the human brain, using interconnected nodes to process information just like you do."* | Video · Audio |
| 3 — Mia | Mia pushes up her round glasses and points at the glowing nodes with a curious expression.<br>*"Okay, I get the brain part... but how do a bunch of glowing dots actually learn to recognize patterns from the data? It still looks like magic!"* | Video · Audio |
| 4 — Ding Ding Dog | Ding Ding Dog's golden bell jingles as he pulls miniature turning dials out of his pocket and attaches them to the glowing lines between the nodes.<br>*"Arf! It's not magic, it's math! See these dials on the lines? They represent 'weights' that automatically turn and adjust during training to help the network figure out exactly which paths are most important!"* | Video · Audio |
| 5 — Mia | Mia scratches her head in confusion and asks how the network knows which way to turn the dials when it makes a mistake.<br>*"Wait, I'm still confused. If the network makes a mistake, how does it actually know which way to turn those dials to fix it?"* | Video · Audio |
| 6 — Ding Ding Dog | Ding Ding Dog projects a tiny holographic robot that walks backward along the glowing connections, explaining backpropagation step by step.<br>*"Arf! It uses a trick called 'backpropagation'! Think of it like a little helper walking backward through the network, checking mistakes step-by-step to tell the dials exactly how to fix the errors."* | Video · Audio |
| 7 — Mia | Mia jumps up from her desk with a giant smile, throwing her hands in the air to celebrate understanding.<br>*"I get it now! So by repeating that backward step over and over, the network learns from its mistakes and finally makes accurate predictions! Thanks, Ding Ding Dog!"* | Video · Audio |

Final Video

All 7 scenes stitched into a single 73-second narrated cartoon: final.mp4
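The stitching step can be reproduced by hand with ffmpeg. A minimal sketch of the commands involved, built as argument lists rather than executed (file names here are hypothetical examples, and the pipeline's actual flags may differ):

```python
# Sketch: build ffmpeg commands that mux each scene's narration onto its
# video clip, then concatenate all clips via ffmpeg's concat demuxer.
# File names are hypothetical, not the pipeline's real output paths.

def mux_command(video: str, audio: str, out: str) -> list[str]:
    """Merge one scene's video and narration into a single clip."""
    return ["ffmpeg", "-y", "-i", video, "-i", audio,
            "-c:v", "copy", "-c:a", "aac", "-shortest", out]

def concat_list(scene_clips: list[str]) -> str:
    """Contents of the list file the concat demuxer reads."""
    return "\n".join(f"file '{c}'" for c in scene_clips)

def concat_command(list_file: str, out: str) -> list[str]:
    """Concatenate the clips named in list_file into one video."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", out]

# Usage: write concat_list(...) to scenes.txt, then run the commands
# with subprocess.run(cmd, check=True).
```

Passing the commands as argument lists (instead of one shell string) avoids quoting issues with paths containing spaces.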

How It Works

```mermaid
flowchart TD
    A["1. Plan Scenes (LLM)"] --> B["2. Write Scripts (LLM, self-loop)"]
    B --> C["3. Generate Images (Wan 2.7)"]
    C --> D["4. Generate Audio (CosyVoice)"]
    D --> E["5. Animate Video (Wan 2.7 I2V)"]
    E --> F["6. Combine (ffmpeg)"]
```
| Step | Model | What it does |
| --- | --- | --- |
| Plan Scenes | Gemini | Reads the article and plans 4-8 cartoon scenes |
| Write Scripts | Gemini | Generates dialogue, image prompts, and animation prompts per scene |
| Generate Images | Wan 2.7 Image | Creates illustrations with character reference for consistency |
| Generate Audio | CosyVoice v3+ | Text-to-speech with per-character voice profiles |
| Animate Video | Wan 2.7 I2V | Turns static images into animated clips matching audio duration |
| Combine | ffmpeg | Merges audio and video per scene, then concatenates into the final video |
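In code, the table above amounts to a linear pipeline. A minimal sketch with the model calls stubbed out (function names are illustrative; the real node wiring lives in flow.py):

```python
# Sketch of the six-step pipeline as plain functions. Every model call is
# a stub here; in the project these are PocketFlow nodes (see flow.py).

def plan_scenes(article: str) -> list[dict]:
    # LLM call: split the article into 4-8 scene outlines (stubbed as
    # one scene per non-empty line).
    return [{"outline": line} for line in article.splitlines() if line]

def write_script(scene: dict) -> dict:
    # LLM call: add dialogue, an image prompt, and an animation prompt.
    scene.update(dialogue="...", image_prompt="...", anim_prompt="...")
    return scene

def generate_image(scene: dict) -> dict:
    scene["image"] = "img.png"   # image-model call (stubbed)
    return scene

def generate_audio(scene: dict) -> dict:
    scene["audio"] = "voice.wav"  # TTS call (stubbed)
    return scene

def animate(scene: dict) -> dict:
    scene["video"] = "clip.mp4"   # image-to-video call (stubbed)
    return scene

def run(article: str) -> list[dict]:
    """Run every scene through the four per-scene steps in order."""
    return [animate(generate_audio(generate_image(write_script(s))))
            for s in plan_scenes(article)]
```

The final combine step (ffmpeg) would then take each scene's `audio` and `video` and stitch them into one file.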

Character Consistency

Three layers ensure the same characters look identical across all independently generated images:

  1. Text description — Full character description embedded in every image prompt
  2. Reference image — Same assets/ref.png passed to every generation call
  3. Scene chaining — Previous scene's output used as style reference for the next
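As an illustration, the first two layers could look like this in code (the character sheet text, function name, and request shape are assumptions for this sketch, not the project's actual interface):

```python
# Sketch: embed a fixed character description in every image prompt and
# pass the same reference image to every generation call.

CHARACTER_SHEET = (
    "Mia: a girl with round glasses; "
    "Ding Ding Dog: a dog with a golden bell and a belly pocket."
)  # hypothetical description text, repeated verbatim in every prompt

REF_IMAGE = "assets/ref.png"  # the same reference asset for every call

def build_image_request(scene_prompt: str) -> dict:
    """Request payload for one scene's image (hypothetical API shape)."""
    return {
        "prompt": f"{CHARACTER_SHEET}\n\nScene: {scene_prompt}",
        "reference_image": REF_IMAGE,
    }
```

Because every call receives both the same text description and the same reference image, drift between independently generated scenes is reduced; scene chaining (layer 3) would additionally pass the previous scene's output as a style reference.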

How to Run

  1. Set up LLM in utils/call_llm.py by providing credentials.

    You can refer to LLM Wrappers for example implementations.

    You can verify that it is correctly set up by running:

    python utils/call_llm.py
  2. Install dependencies:

    pip install -r requirements.txt
  3. Set up environment variables (copy .env.example to .env):

    cp .env.example .env

    Then fill in your API keys.

  4. Make sure ffmpeg is installed on your system:

    # macOS
    brew install ffmpeg
    
    # Ubuntu/Debian
    sudo apt install ffmpeg
  5. Run the pipeline:

    python main.py examples/neural_networks.md

    Options:

    - `-o, --output` — Output directory (default: `./output`)
    - `--ref-image` — Custom character reference image (default: `assets/ref.png`)
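A sketch of how these flags might be wired up with argparse (main.py's actual parsing may differ):

```python
import argparse

def make_parser() -> argparse.ArgumentParser:
    """CLI mirroring the options above (a sketch, not main.py verbatim)."""
    p = argparse.ArgumentParser(
        description="Turn a markdown article into an animated cartoon video")
    p.add_argument("article", help="path to the input markdown file")
    p.add_argument("-o", "--output", default="./output",
                   help="output directory (default: ./output)")
    p.add_argument("--ref-image", default="assets/ref.png",
                   help="character reference image (default: assets/ref.png)")
    return p
```

Note that argparse exposes `--ref-image` as `args.ref_image` (dashes become underscores).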

Built with PocketFlow

  - Built with Pocket Flow, a 100-line LLM framework that lets LLM Agents (e.g., Claude Code) build Apps for you
