Replaced a premium proprietary video AI API ($3–5 per minute) with an open-source ComfyUI workflow. Same quality, costs measured in cents.
A motion design agency creating advertising creatives needed lipsync video generation at scale. They were paying premium pricing for the leading proprietary API, approximately $0.05–0.08 per second of video, which works out to roughly $3–5 per minute of output.
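The per-second rate converts to a per-minute figure with simple arithmetic. A minimal sketch in Python; the function name is illustrative, and the only inputs are the approximate rates quoted above.

```python
def per_minute_cost(rate_per_second_usd: float) -> float:
    """Convert a per-second video price into a per-minute price (USD)."""
    return round(rate_per_second_usd * 60, 2)


# Approximate API rates quoted above, in USD per second of video.
low, high = per_minute_cost(0.05), per_minute_cost(0.08)
print(f"${low:.2f} to ${high:.2f} per minute of video")
```

This gives $3.00 to $4.80 per minute, in line with the $3–5 per minute headline figure.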
Beyond cost, they faced API rate limits, quality ceilings, and lack of customization that constrained their creative output. They needed a solution that was significantly cheaper, removed external API dependencies, and could be customized to their specific use cases.
Most teams facing this problem would either accept the premium pricing or attempt to build a proprietary model. I took a third path: building production infrastructure around best-in-class open-source AI models, with cost-optimized GPU orchestration.
After evaluating available options, I selected Infinity Talk (built on Wan 2.1) as the lipsync foundation. Critical reasoning:
The challenge wasn't running the model — it was making it production-grade.
I built a containerized deployment infrastructure that handles:
The infrastructure design is modular and replicable — I've since used this same Docker template foundation to deploy similar AI pipelines for other clients with minimal modification.
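As an illustration of the orchestration layer, a Python service can queue a job by POSTing a workflow graph to ComfyUI's HTTP `/prompt` endpoint. This is a minimal sketch, not the deployed code: the host, the `build_queue_request` helper, and the placeholder workflow node are assumptions, and a real Infinity Talk graph would be exported from ComfyUI in API format.

```python
import json
import uuid
import urllib.request


def build_queue_request(workflow: dict, host: str = "127.0.0.1:8188") -> urllib.request.Request:
    """Build (but do not send) a POST request that queues a workflow in ComfyUI."""
    payload = {
        "prompt": workflow,              # node graph in ComfyUI API format
        "client_id": str(uuid.uuid4()),  # lets the caller match progress events
    }
    return urllib.request.Request(
        f"http://{host}/prompt",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder graph (assumption): a real export contains the full lipsync node chain.
example_workflow = {"1": {"class_type": "LoadImage", "inputs": {"image": "face.png"}}}
request = build_queue_request(example_workflow)
print(request.get_method(), request.full_url)
# Sending is left to the caller, e.g. urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the payload easy to inspect and test without a running ComfyUI instance.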
This is where the economics get interesting.
The optimization journey itself demonstrates a key consulting principle: continuous iteration on infrastructure choice. VAST AI was the right answer initially, but when their pricing changed and better alternatives emerged, switching to RunningHub delivered another step-change in economics.
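Comparing GPU providers comes down to one formula: the cost per minute of output equals the hourly GPU price times the GPU seconds needed per second of output, divided by 60. A minimal sketch follows; the hourly price and render ratio in the example are placeholder values chosen for illustration, not measurements from this project or quotes from either provider.

```python
def cost_per_output_minute(hourly_price_usd: float, gpu_seconds_per_output_second: float) -> float:
    """Cost in USD to render one minute of video on a rented GPU."""
    gpu_seconds = 60 * gpu_seconds_per_output_second  # GPU time for 60 s of output
    return hourly_price_usd * gpu_seconds / 3600      # hourly price prorated per second


# Placeholder inputs (assumptions): a $0.60/hour GPU needing 30 GPU-seconds per output second.
example = cost_per_output_minute(hourly_price_usd=0.60, gpu_seconds_per_output_second=30)
print(f"${example:.2f} per minute of output")
```

Re-running the same function with each provider's current hourly price is enough to see when a switch pays off.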
I implemented both modes, with deliberate use case separation:
To my knowledge, no one else in the open-source community had V2V working at the time, and that led my next client to find me directly through a technical article I published on the Infinity Talk implementation.
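The two modes differ in their driving input: I2V (image-to-video) starts from a still image, while V2V (video-to-video) starts from an existing clip. One simple way to route jobs, sketched here with hypothetical names (`Mode`, `choose_mode`) that are not from the original project, is to pick the mode from the input file's media type. Real routing would also reflect the use case, which this sketch ignores.

```python
import mimetypes
from enum import Enum


class Mode(Enum):
    I2V = "image-to-video"
    V2V = "video-to-video"


def choose_mode(source_path: str) -> Mode:
    """Pick the lipsync mode from the source file's guessed MIME type."""
    mime, _ = mimetypes.guess_type(source_path)
    if mime and mime.startswith("image/"):
        return Mode.I2V
    if mime and mime.startswith("video/"):
        return Mode.V2V
    raise ValueError(f"Unsupported source media: {source_path!r}")


print(choose_mode("portrait.png").name)
print(choose_mode("take_03.mp4").name)
```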
Published an in-depth technical article on the Infinity Talk implementation on a major Russian-language tech forum, receiving editor's recognition (authorship status) and a strongly positive community response. The article became a primary reference for others entering this space and led to direct client acquisition.
For the original client: the same volume of lipsync output at a fraction of the original cost. No API rate limits. Customizable workflow for specific creative needs. 6+ months in continuous production use.
Broader commercial impact: 3 paid implementations across different clients with different needs. Each customized through workflow modifications (V2V for some, I2V for others). Infrastructure foundation reused across multiple AI projects.
| Component | Technology |
| --- | --- |
| AI Models | Infinity Talk (Wan 2.1 base) |
| Workflow Engine | ComfyUI |
| GPU Compute | VAST AI · RunningHub |
| Interface | Telegram Bot API (local server) |
| Infrastructure | Docker · Python orchestration |