I made a 23-second cinematic ad for $18
Watch it first.
That's a paid-social spot for the Amex Platinum card. A guy gets delayed at the airport, endures every misery of modern travel — security lines, a crying baby on the shuttle bus — then taps his Platinum card at the Centurion Lounge door and exhales into a leather couch.
Every scene, character, and voice was AI-generated: $18 in model credits and one afternoon. Here's exactly how I built it.
The workflow in 7 steps
I run this inside Claude Code, which acts as the orchestrator: I describe what I want, and it picks the right tool at each step and calls it. I'm steering, not typing commands.

Step 1 · Brief the orchestrator
I start with one sentence: "23-second airport ad for Amex Platinum. Stress arc that resolves at the lounge entry. Cinematic, 9:16." Claude scaffolds the project folder, writes an idea brief, and asks clarifying questions. From there it knows which tools to use.
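To make the scaffolding step concrete, here is a minimal Python sketch of what it amounts to, assuming a plain folder layout. Only `brand-assets/` is named in this post. The `stills/`, `clips/` and `audio/` folders, the `idea-brief.md` filename and the `scaffold_project` helper are hypothetical.

```python
from pathlib import Path

# Hypothetical layout: brand-assets/ is named in the post, the other folders are illustrative.
SUBDIRS = ["brand-assets", "stills", "clips", "audio"]

def scaffold_project(root: Path, brief: str) -> Path:
    """Create the project folder tree and save the one-sentence brief."""
    root.mkdir(parents=True, exist_ok=True)
    for name in SUBDIRS:
        (root / name).mkdir(exist_ok=True)
    brief_path = root / "idea-brief.md"  # hypothetical filename
    brief_path.write_text(brief.strip() + "\n", encoding="utf-8")
    return brief_path
```

Calling it with the one-sentence brief produces the empty skeleton that the later steps fill in.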
Step 2 · Gather brand assets
Before generating anything, Claude runs a source-brand-assets skill. It web-searches for high-resolution product photos, downloads them, and drops them into brand-assets/. For this ad it pulled clean card shots from Wikimedia and editorial CDNs.
This matters because every AI generation that follows uses these images as a reference. Without a real anchor, the card art drifts.
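The download half of that skill can be approximated with the standard library. This is a sketch, not the actual `source-brand-assets` implementation: it assumes the web search has already produced a list of image URLs, and `local_name` and `fetch_assets` are hypothetical helpers.

```python
import re
import urllib.request
from pathlib import Path
from urllib.parse import unquote, urlparse

def local_name(url: str) -> str:
    """Turn the last path segment of a URL into a filesystem-safe filename."""
    name = Path(unquote(urlparse(url).path)).name or "asset"
    return re.sub(r"[^A-Za-z0-9._-]+", "_", name)

def fetch_assets(urls: list[str], dest: Path = Path("brand-assets")) -> list[Path]:
    """Download each reference image into dest/ and return the local paths."""
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        target = dest / local_name(url)
        urllib.request.urlretrieve(url, target)
        saved.append(target)
    return saved
```

A production version would also send a descriptive User-Agent and check image dimensions, since some hosts reject the default urllib client and a low-resolution reference defeats the purpose.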
Step 3 · Build the storyboard
Claude generates a single storyboard.html — the project dashboard. It has the idea brief at the top, a timeline ribbon, and a grid of scene cards. Every revision to the project lands here. The state of the project is the state of one HTML file.
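A single-file dashboard like this is mostly string templating. The sketch below assumes a hypothetical `Scene` record (number, title, duration, still path) and renders the brief, a timeline ribbon whose segments are proportional to scene length, and a grid of scene cards. It illustrates the idea; it is not the file Claude actually generates.

```python
from dataclasses import dataclass
from html import escape

@dataclass
class Scene:
    number: int
    title: str
    seconds: float
    still: str  # path to the scene's still image, relative to storyboard.html

def render_storyboard(brief: str, scenes: list[Scene]) -> str:
    """Render the brief, a proportional timeline ribbon and a grid of scene cards as one page."""
    total = sum(s.seconds for s in scenes)
    ribbon = "".join(
        f'<div class="seg" style="flex:{s.seconds}">{s.number}</div>' for s in scenes
    )
    cards = "".join(
        f'<figure class="card"><img src="{escape(s.still)}" alt="">'
        f"<figcaption>{s.number}. {escape(s.title)} ({s.seconds:g}s)</figcaption></figure>"
        for s in scenes
    )
    return (
        "<!doctype html><html><head><meta charset='utf-8'><title>Storyboard</title></head><body>"
        f"<section class='brief'><p>{escape(brief)}</p><p>Total: {total:g}s</p></section>"
        f"<div class='ribbon' style='display:flex'>{ribbon}</div>"
        f"<main class='grid'>{cards}</main>"
        "</body></html>"
    )
```

Regenerating the whole page from the scene list on every revision is what keeps the project state in one file: there is no separate store to fall out of sync.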
Step 4 · Generate scene stills (Nano Banana 2)
For each scene, Claude calls Nano Banana 2 through the Higgsfield API. Each request is vertical 9:16 at 2K resolution, with the brand asset attached whenever the card needs to appear. These are still images, not video yet.

Claude embeds each still in the storyboard for review before spending on motion. Catching a wrong expression at this stage costs $0.02. Catching it after generating a 4-second video clip costs $0.80.
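The review gate is simple to express in code. This sketch uses the per-unit costs quoted above ($0.02 per still, $0.80 per clip). `scenes_ready_for_motion` and `wasted_spend` are hypothetical names, and approvals are assumed to be a map from scene number to a yes/no decision recorded in the storyboard.

```python
STILL_COST = 0.02  # USD per Nano Banana 2 still (figure from the post)
CLIP_COST = 0.80   # USD per 4-second Veo 3.1 clip (figure from the post)

def scenes_ready_for_motion(approvals: dict[int, bool]) -> list[int]:
    """Only scenes whose still was approved in the storyboard go on to Veo."""
    return sorted(scene for scene, approved in approvals.items() if approved)

def wasted_spend(caught_before_motion: bool) -> float:
    """Money thrown away on one shot with a wrong expression."""
    return STILL_COST if caught_before_motion else CLIP_COST
```

At these prices, an error caught at the still stage costs 1/40 of one caught after animation, which is why approval happens in the storyboard first.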
Step 5 · Animate the stills (Veo 3.1)
Once the stills are approved, Claude uploads each one as a start_image to Veo 3.1 (also via Higgsfield) and generates a 4-second motion clip. Veo locks onto the start frame and animates from there. All 14 Veo calls run in parallel, so the full scene set takes three minutes.
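Running the Veo jobs in parallel is a standard fan-out. The sketch assumes a hypothetical `generate_clip(start_image)` function wrapping the Higgsfield call (upload, request a 4-second clip, wait for the result); a fake stands in for it so the example runs offline. Threads are enough here because each job spends almost all its time waiting on the remote model, so wall-clock time is roughly that of the slowest single clip rather than the sum of all 14.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def animate_all(stills: list[str], generate_clip: Callable[[str], str], max_workers: int = 14) -> list[str]:
    """Submit every approved still at once; pool.map returns clips in scene order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(generate_clip, stills))

def fake_generate_clip(start_image: str) -> str:
    """Hypothetical stand-in for the Veo 3.1 call through Higgsfield.

    A real version would upload start_image, request a 4-second clip and poll until
    it is ready; here we just sleep briefly and return a matching clip path.
    """
    time.sleep(0.1)
    return start_image.replace("stills/", "clips/").replace(".png", ".mp4")
```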
Step 6 · Voice and sound (ElevenLabs)
The VO is two lines, both delivered by "Austin" — a calm Texan voice from ElevenLabs TTS. Sound design is three layers: airport ambience bed, a crying baby layered over the shuttle bus scene, and a card-touch chime.

The key trick: the music is withheld for the entire chaos arc and comes in only at the moment the card touches the reader. Six seconds of silence, then music. That asymmetry is what makes the relief feel earned.
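The music cue can be described as a gain envelope over the timeline: zero for the whole chaos arc, then a ramp at the card touch. A minimal sketch, assuming a linear fade. The card-touch timestamp is a parameter because the post does not give it, and the 0.5-second default fade is an assumption.

```python
def music_gain(t: float, card_touch: float, fade_in: float = 0.5) -> float:
    """Music gain (0.0 to 1.0) at time t seconds into the spot.

    Silent for the whole chaos arc, then a linear ramp starting at the card touch.
    card_touch is supplied by the caller; fade_in is an assumed value.
    """
    if t < card_touch:
        return 0.0
    if fade_in <= 0:
        return 1.0  # hard cut: full level from the card touch onward
    return min(1.0, (t - card_touch) / fade_in)
```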
Step 7 · Assemble and publish
Claude writes a single _assemble.sh: one ffmpeg invocation that stitches 14 clips, layers five audio streams, applies fades and ducking, and renders the master. The end card is a product shot with a slow zoom-in and a PIL text overlay, so no video generation is needed and the typography stays clean.
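The post doesn't reproduce `_assemble.sh`, so here is a hedged Python sketch that builds a comparable single ffmpeg command. Assumptions: each 4-second clip is trimmed to a per-shot duration (14 × 4 s of footage has to fit into 23 s); the five audio streams are the VO, music, ambience, crying baby and chime; the baby track is already positioned on its own timeline; and the music and chime are delayed to a card-touch timestamp you supply. The filters (`trim`, `concat`, `fade`, `asplit`, `adelay`, `afade`, `sidechaincompress`, `amix`) are standard ffmpeg filters. The function names, fade lengths and compressor settings are illustrative.

```python
import shlex

# Hypothetical role names for the five audio inputs, in the order they are passed to ffmpeg.
AUDIO_ROLES = ["vo", "music", "ambience", "baby", "chime"]

def build_assemble_cmd(shots, audio, card_touch, out="master.mp4"):
    """Build one ffmpeg command that cuts the clips together and mixes five audio streams.

    shots: list of (clip_path, seconds_to_keep) in edit order.
    audio: dict mapping each name in AUDIO_ROLES to a file path.
    card_touch: time in seconds where the music and chime should enter.
    """
    cmd = ["ffmpeg", "-y"]
    for path, _ in shots:
        cmd += ["-i", path]
    for role in AUDIO_ROLES:
        cmd += ["-i", audio[role]]
    n = len(shots)
    a = {role: f"[{n + i}:a]" for i, role in enumerate(AUDIO_ROLES)}
    total = sum(seconds for _, seconds in shots)
    delay_ms = int(card_touch * 1000)

    parts = []
    # Trim each 4-second clip to its slot and reset timestamps so concat lines them up.
    for i, (_, seconds) in enumerate(shots):
        parts.append(f"[{i}:v]trim=duration={seconds},setpts=PTS-STARTPTS[v{i}]")
    parts.append("".join(f"[v{i}]" for i in range(n)) + f"concat=n={n}:v=1:a=0[vcat]")
    # Fade up from black at the start and down to black at the end.
    parts.append(f"[vcat]fade=t=in:st=0:d=0.5,fade=t=out:st={total - 0.5:.3f}:d=0.5[vout]")
    # The VO is used twice: once in the mix, once as the key that ducks the music.
    parts.append(f"{a['vo']}asplit=2[vo_mix][vo_key]")
    # Music is silent until the card touch, fades in, then is compressed whenever the VO speaks.
    parts.append(
        f"{a['music']}adelay=delays={delay_ms}:all=1,afade=t=in:st={card_touch}:d=0.5[music_late]"
    )
    parts.append(
        "[music_late][vo_key]sidechaincompress=threshold=0.05:ratio=8:attack=20:release=300[music_ducked]"
    )
    # The chime lands on the card touch too; the baby track is assumed to be pre-positioned.
    parts.append(f"{a['chime']}adelay=delays={delay_ms}:all=1[chime_late]")
    parts.append(
        f"[vo_mix][music_ducked]{a['ambience']}{a['baby']}[chime_late]"
        "amix=inputs=5:duration=longest[aout]"
    )
    cmd += [
        "-filter_complex", ";".join(parts),
        "-map", "[vout]", "-map", "[aout]",
        "-t", f"{total:.3f}",  # cap the output at the edited video length
        "-c:v", "libx264", "-pix_fmt", "yuv420p", "-c:a", "aac",
        out,
    ]
    return cmd

def to_shell(cmd):
    """Render the command as a one-line shell script, quoting filter labels safely."""
    return "#!/bin/sh\n" + shlex.join(cmd) + "\n"
```

Writing the output of `to_shell` to `_assemble.sh` gives a script that can be rerun after any clip or audio change, which is the main benefit of keeping assembly in one invocation.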
What it cost
~$18 total in model credits: Nano Banana 2 stills (~$3), Veo 3.1 clips (14 × 4 seconds, high quality, ~$11), and ElevenLabs VO + SFX (~$4). One afternoon of work, mostly spent on review: approving stills, checking the storyboard, and locking the music level.
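The budget lines up with the per-unit prices quoted in Step 4. A quick check in Python, using the post's approximate figures:

```python
# Approximate figures from the post, in USD.
COSTS = {"nano_banana_stills": 3.0, "veo_clips": 11.0, "elevenlabs_vo_sfx": 4.0}
CLIPS, SECONDS_PER_CLIP = 14, 4

total = sum(COSTS.values())                                   # 18.0
per_clip = COSTS["veo_clips"] / CLIPS                         # ~0.79
per_second = COSTS["veo_clips"] / (CLIPS * SECONDS_PER_CLIP)  # ~0.20 per generated second
print(f"total ${total:.2f}, ${per_clip:.2f}/clip, ${per_second:.2f}/s of Veo footage")
```

Roughly $0.79 per clip matches the ~$0.80 figure used to justify reviewing stills before animating.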
A traditionally produced 23-second cinematic spot at the same delivery quality typically costs $20,000–$50,000. The whole pipeline is Claude Code as conductor, Higgsfield for visuals, ElevenLabs for sound, and ffmpeg for assembly.