Intro
The idea was to animate a Cinema4D hand without rigging by transferring real motion through a video-to-video pipeline in ComfyUI. The workflow: record my hand → extract pose with DW Pose → generate the final animation with WAN 2.2 VACE, using the first frame as a style reference.
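The style reference is simply the first frame of the styled sequence. If that frame lives in a short clip rather than a still image, it can be pulled out with a few lines of OpenCV before the ComfyUI stage. A minimal sketch, with placeholder file names:

```python
# Minimal sketch: grab the first frame of a styled clip to use as the
# style reference for WAN 2.2 VACE. File names are placeholders.
import cv2

cap = cv2.VideoCapture("c4d_hand_styleframe.mp4")  # clip containing the yellow C4D hand
ok, first_frame = cap.read()
cap.release()

if not ok:
    raise RuntimeError("Could not read the first frame of the clip")

cv2.imwrite("style_reference.png", first_frame)  # later fed to the VACE reference input
```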
Motion Capture
Pose Extraction
The source video is processed in ComfyUI with the ControlNet DW Pose Estimator to get frame-accurate skeletal cues for the wrist, palm and fingers. This gives me a clean motion signal—timing and gesture—without committing to any rig.
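For illustration, here is a stand-alone sketch of the same per-frame pose extraction. The actual workflow runs the DWPose Estimator node inside ComfyUI; this sketch swaps in controlnet_aux's OpenposeDetector as a stand-in, since both produce skeletal keypoint maps with hand detail, and the file names are placeholders.

```python
# Stand-alone sketch of per-frame pose extraction.
# The real workflow uses the DWPose Estimator node inside ComfyUI;
# controlnet_aux's OpenposeDetector serves as a stand-in here.
import cv2
from pathlib import Path
from PIL import Image
from controlnet_aux import OpenposeDetector

detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")

src = cv2.VideoCapture("hand_recording.mp4")   # placeholder file name
out_dir = Path("pose_frames")
out_dir.mkdir(exist_ok=True)

idx = 0
while True:
    ok, frame_bgr = src.read()
    if not ok:
        break
    # OpenCV delivers BGR; the detector expects RGB.
    frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    pose = detector(Image.fromarray(frame_rgb), include_hand=True)
    pose.save(out_dir / f"pose_{idx:04d}.png")
    idx += 1

src.release()
print(f"Wrote {idx} pose frames to {out_dir}/")
```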
Comfy
Video-to-video generation with WAN 2.2 VACE.
WAN 2.2 VACE (All-in-one Video Creation and Editing) is designed for frame-stable, reference-guided generation: it enforces temporal consistency across frames so the style stays locked while the motion evolves smoothly. In my workflow, the first frame of the sequence served as the style reference, keeping the yellow Cinema4D hand’s appearance consistent throughout. The DW Pose sequence then guided the motion, so what you see is my real hand movement faithfully translated onto the stylized model.
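To give a sense of how the pieces are wired together, here is a hedged sketch of patching a ComfyUI workflow exported in API format so it points at the pose sequence and the style-reference frame. The node IDs and input names below are placeholders from an imaginary export, not the actual graph; they depend entirely on how your own WAN 2.2 VACE workflow is built.

```python
# Hedged sketch: override inputs in a ComfyUI workflow exported in API format.
# Node IDs ("7", "12", "31") and input names are PLACEHOLDERS; look up the
# real ones in your own exported workflow JSON.
import json

with open("wan22_vace_api.json") as f:   # exported via "Save (API Format)"
    workflow = json.load(f)

# Point the pose-sequence loader at the DW Pose frames.
workflow["7"]["inputs"]["directory"] = "pose_frames"

# Point the reference-image loader at the first (styled) frame.
workflow["12"]["inputs"]["image"] = "style_reference.png"

# Describe the target look in the text prompt.
workflow["31"]["inputs"]["text"] = "a smooth yellow 3D-rendered hand, studio lighting"

with open("wan22_vace_patched.json", "w") as f:
    json.dump(workflow, f, indent=2)
```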
Result
Outro
Everything is generated end-to-end in ComfyUI. Thanks to WAN 2.2 VACE’s frame stability, the output avoids jitter and drifting style, while the real-world motion input keeps the animation expressive. The result is a natural hand animation built entirely from a lightweight AI pipeline—no traditional rigging required.
What is ComfyUI?
ComfyUI is an open-source, node-based interface for diffusion models such as Stable Diffusion and WAN. It lets you build custom image and video generation workflows by connecting modular nodes, giving full control and flexibility.
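Workflows built in the node editor can also be queued programmatically: a locally running ComfyUI instance exposes an HTTP endpoint (by default on port 8188) that accepts a workflow exported in API format. A minimal sketch, assuming a local instance and the patched workflow file from the sketch above:

```python
# Minimal sketch: queue an API-format workflow on a locally running ComfyUI.
# Assumes the default address http://127.0.0.1:8188; file name is a placeholder.
import json
import urllib.request

with open("wan22_vace_patched.json") as f:
    workflow = json.load(f)

payload = json.dumps({"prompt": workflow}).encode("utf-8")
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))   # response includes a prompt_id for the queued job
```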
What is WAN 2.2?
WAN 2.2 is an open-source text-to-video and video-to-video diffusion model. Its VACE variant (All-in-one Video Creation and Editing) is designed to keep style stable and motion coherent across frames, making it especially suited for workflows that combine visual consistency with external motion guidance.
Helpful Resources
- ComfyUI Documentation — Node-based workflow interface for building and experimenting with AI pipelines.
- WAN 2.2 VACE — Video generation model with the VACE framework for stable, coherent frame-to-frame output.
- ControlNet: DW Pose — Extracts skeletal keypoints from video for precise motion guidance.
- YouTube: Purz and Machine Delusions explain Wan2.2 VACE