Medium: Side Project · Role: Solo Builder

Coach Solomon Reed

2026 · Live

Twenty-one short-form videos a week across four formats, produced by a single pipeline that I review for forty-five minutes and otherwise leave alone. The interesting work was not the model picks. It was the seams between them: voice that matched the script's intent, captions that survived a composite, a review surface that I would actually use on a Friday.

Script editor showing four stacked script blocks for an episode, each tagged with personality, avatar shot type, and beat. The third block is active with an indigo glow border. A preview frame on the right shows burned-in caption text reading 'STOP TREATING SAVINGS LIKE A CHORE.'
The script editor that made the rest of the pipeline tolerable. Sectioned, regenerable, voice-tagged.
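The script blocks in the editor can be sketched as plain data. The field names and personality labels below are my stand-ins, not the project's actual schema; the beat order is the hook/problem/point/payoff sheet described in the Approach section:

```python
from dataclasses import dataclass

BEAT_ORDER = ["hook", "problem", "point", "payoff"]  # the enforced beat sheet

@dataclass
class ScriptBlock:
    beat: str         # one of BEAT_ORDER
    personality: str  # personality mode tag -- hypothetical names
    shot: str         # avatar shot type, e.g. "close", "desk"
    text: str         # the line the avatar delivers

def validate_episode(blocks: list[ScriptBlock]) -> bool:
    """An episode is valid when its blocks follow the beat sheet in order."""
    return [b.beat for b in blocks] == BEAT_ORDER

blocks = [
    ScriptBlock("hook", "stern", "close", "Stop treating savings like a chore."),
    ScriptBlock("problem", "warm", "desk", "You budget once, then never look again."),
    ScriptBlock("point", "stern", "close", "Automate the transfer, not the guilt."),
    ScriptBlock("payoff", "warm", "close", "One rule, zero willpower."),
]
assert validate_episode(blocks)
```

Because each block carries its own tags, regenerating one section never touches the voice or shot choices of the others.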
Pipeline diagram: discovery scrapes Reddit and TikTok, Claude writes a structured script, Voicebox synthesizes voice, HeyGen renders the avatar, FFmpeg composites, Whisper burns captions, a review UI gates the publish step that produces 21 videos per week split across four formats.
The whole pipeline on one page. Each stage is replaceable; the seams between them were the actual project.
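The "each stage is replaceable" claim amounts to a list of named callables that each take an artifact dict and return an enriched one. This is a minimal sketch, not the real code; the stage names follow the diagram, and the bodies here are stubs standing in for Claude, Voicebox, and HeyGen calls:

```python
from typing import Callable, Dict

Artifact = Dict[str, object]
Stage = Callable[[Artifact], Artifact]

def write_script(a: Artifact) -> Artifact:      # Claude in the real pipeline
    return {**a, "script": f"script for {a['topic']}"}

def synthesize_voice(a: Artifact) -> Artifact:  # Voicebox / Qwen3-TTS
    return {**a, "audio": "voice.wav"}

def render_avatar(a: Artifact) -> Artifact:     # HeyGen
    return {**a, "video": "avatar.mp4"}

# Swapping a vendor means replacing one entry, not rewiring the pipeline.
PIPELINE: list[tuple[str, Stage]] = [
    ("script", write_script),
    ("voice", synthesize_voice),
    ("avatar", render_avatar),
]

def run(topic: str) -> Artifact:
    artifact: Artifact = {"topic": topic}
    for name, stage in PIPELINE:
        artifact = stage(artifact)
    return artifact
```

The seams the write-up keeps pointing at live in the artifact dict: every stage has to agree on what the previous one left behind.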

Overview

I wanted to know whether one person, leaning hard on current AI tools, could run a content operation that used to take a small team. Coach Solomon Reed is the test. He's a financial coaching persona posting in four formats: monologue tips, Reddit response videos, a fake call-in show with two voices, and reaction overlays on existing public videos. Twenty-one videos a week. About forty-five minutes of human review. The rest is the pipeline.

Approach

Scripts come from Claude through a custom skill that enforces a hook/problem/point/payoff beat sheet and tags each beat with a personality mode and an avatar shot. Voice is Voicebox running Qwen3-TTS locally so I iterate on tone without a billing meter. Avatar render is HeyGen; the lip sync at this fidelity isn't worth rebuilding. Compositing is FFmpeg by hand for each format (Reddit overlay, split-screen call-in, picture-in-picture reaction) because templates collapse the moment a format wants to change. Captions are Whisper plus a burn-in step. Discovery for Reddit and TikTok runs nightly. Everything lives behind a small web UI for review.
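The caption step reads like this sketch: Whisper's `transcribe()` returns timed segments with `start`, `end`, and `text` fields, which get written out as SRT before FFmpeg burns them in. The file paths in the comment are placeholders, and this is my reconstruction of the shape, not the project's actual code:

```python
def srt_timestamp(seconds: float) -> str:
    """Seconds -> SRT's HH:MM:SS,mmm timestamp format."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Whisper-style segments (start, end, text) -> SRT file contents."""
    lines = []
    for i, seg in enumerate(segments, start=1):
        lines += [
            str(i),
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}",
            seg["text"].strip(),
            "",  # blank line separates SRT cues
        ]
    return "\n".join(lines)

# Burn-in, roughly:
#   ffmpeg -i episode.mp4 -vf subtitles=captions.srt -c:a copy captioned.mp4
```

Burning captions after the composite, rather than before, is what keeps them from being cropped or covered when a format's layout changes.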

Outcome

Production time went from a real, measured twenty hours a week to a forty-five-minute review session three times a week. Output volume tripled. The persona has been live for three months. The engagement-per-format breakdown I can't publish, but it reordered which formats I bother with. What I can publish: the call-in show, the most expensive per video, is the only format where retention curves resemble a real show.

Reflections

Two surprises. Discovery matters more than generation. The model can write a good script all day; what makes a video land is the topic choice, and that choice is fragile when it's automated. I now review the topic queue more carefully than the script output. The other one: voice is the load-bearing piece. I burned the first month on avatar quality, and viewers don't care past a low bar. Voice tone is what makes a take feel earned or hollow.