How I Create Consistent Hero Images, And Why I Haven't Switched to NanoBanana
The complete system behind my brand images, from base setup to 30-second automation
Creating consistent AI-generated brand images shouldn’t take 20 minutes of prompt iteration every time. Here’s the complete system I use — ChatGPT 4o project setup, reference image pool, Cursor slash commands for 30-second prompt generation — plus honest comparisons with Gemini and newer ChatGPT models, and why I haven’t switched despite the hype.
How do you create your hero images with consistent character?
Every once in a while, I get a message like this. Each time I explain it:
Create a series of images of that character: front, side, back, smiling, not smiling
Each time I need to create something, I attach the image, add my description, and ask ChatGPT to give me a thorough prompt
I then send that prompt together with the image to ChatGPT to generate the final image
Then people followed up with “but how exactly?”
Even with Gemini being so good and so fast now, I still got the same question yesterday from Giuseppe Santoro 🚢, at a time when most of my AI friends are having fun creating images.
That’s when I asked myself: would I switch directly to the faster Gemini model? My answer was... maybe. And that pause made me realize: this isn’t just about one-shot consistent image generation. People are curious about the system behind it.
Before we dive in, let’s talk about why this matters at all.
Consistent hero images create brand recognition. When someone sees that 3D cartoon character in their feed, they know it’s from you before they even read the title.
It also signals professionalism. Wildly inconsistent images make it look like you’re grabbing random stock photos.
Most importantly, it builds trust through coherence. Your visual identity reinforces your written voice.
What you’ll go through with me:
How I Found the Best AI Procedure — from cute 3D animals to a transferable system
The Step-by-Step ChatGPT Workflow — the exact setup, prompts, and real example
The Comparisons: ChatGPT vs Gemini — honest trade-offs and when each tool wins
How to Automate the Entire Workflow — from 20 minutes to 30 seconds with Cursor slash commands
Hi, I’m Jenny 👋
I teach non-technical people how to vibe code complete products and launch successfully. I’m the AI builder behind VibeCoding.Builders and other products with hundreds of paying customers.
If you’re new to Build to Launch, welcome!
So with that context, let me take you through how I discovered what works, starting from the very beginning.
1. How I Found the Best AI Procedure
First off, Pixar-style 3D cartoon images are my favorite style. I love watching 3D cartoons, and they delight me.
In my early days, I’d use ChatGPT’s DALL·E model to create cute 3D animal images, because they are just sooo adorable.

As I wrote more and started to pick up traction, the branding idea naturally slipped in. Without much debate, I chose my profile picture as the source of truth.
I uploaded my profile picture and literally asked ChatGPT to create front, side, back, sit, stand, talk, walk, happy, confused... all sorts of postures. That became my cartoon image pool. Not yet styled or tailored to fit into specific stories, but serving as the baseline for everything that came next.
I have a specific project in ChatGPT called “Images,” with a system prompt and those base cartoon files in it. Each time I need to create a 3D image for an article, I go to that project and start prompting.
Through experimentation, I discovered ChatGPT 4o worked best for me. It captured that subtle feel I wanted, a little mystic, a little fun, not the rigid light blue and paper white look that other approaches gave me. I’ll walk you through the exact process in a moment.
Of course, I tried newer models and Gemini too. They had strengths, but each had deal-breaker issues for my specific needs. These are things that might not matter to you but were crucial for my brand. I’ll explain those trade-offs in detail later.
One validation moment stands out: I published an article about voice transcription with a hero image of me talking magic to a tree that grows as I talk. A reader commented asking for the image prompt. I was surprised anyone would be curious about it. I shared the prompt and he came back with his version, nearly identical to mine. That’s when I knew this wasn’t just my quirky workflow anymore. It had become transferable.
2. The Step-by-Step ChatGPT Workflow
Here’s how it actually works day-to-day.
My Setup
I have a ChatGPT project folder called “Images” with:
A system prompt that defines my style preferences (3D Pixar, 16:9 ratio, etc.)
Base character images (front, side, back, sit, stand, various expressions)
These reference images maintain character consistency
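To make the setup concrete, here is a minimal Python sketch of how those style preferences could be written down once and rendered into a project system prompt. The class name, field names, wording, and file names are all assumptions for illustration; the actual system prompt is not shown in this article.

```python
from dataclasses import dataclass, field


@dataclass
class StyleConfig:
    """Hypothetical container for the preferences a project prompt would state."""
    art_style: str = "3D Pixar-style cartoon"
    aspect_ratio: str = "16:9"
    mood: str = "a little mystic, a little fun"
    # Poses come from the article; the file names themselves are made up.
    reference_files: list[str] = field(
        default_factory=lambda: ["front.png", "side.png", "back.png"]
    )


def build_system_prompt(cfg: StyleConfig) -> str:
    """Render the preferences as plain-text instructions for the project."""
    lines = [
        f"Always render images in a {cfg.art_style} style.",
        f"Always use a {cfg.aspect_ratio} aspect ratio.",
        f"Overall mood: {cfg.mood}.",
        "Keep the character identical to the attached reference images: "
        + ", ".join(cfg.reference_files) + ".",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_system_prompt(StyleConfig()))
```

Keeping the preferences in one place means a change to the style (say, a different mood) only has to be made once before pasting the result into the project.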
The Workflow
Step 1: When I need a hero image for an article, I go to that ChatGPT project folder.
Step 2: I describe what I want for the image: the scenario, the mood, the action.
Step 3: I attach the relevant reference images that match the pose or expression I’m going for.
Step 4: ChatGPT generates a detailed, thorough prompt that incorporates my style preferences, the reference character, and the scenario I described.
Step 5: I use that generated prompt together with the reference image to have ChatGPT create the actual hero image.
Step 6: Usually it takes 1-3 shots to get it right. The system prompt and reference images provide enough context that iterations are minimal.
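Steps 2 to 4 can be sketched in code. The following is a hedged Python illustration, not my actual tooling: the pool entries, tags, and function names are hypothetical. It shows one way to pick the reference images that match a pose or expression and to assemble the message that asks for a detailed prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceImage:
    path: str             # hypothetical file name
    tags: frozenset[str]  # pose/expression labels, e.g. {"sit", "confused"}


# A made-up pool; a real one would cover front, side, back, sit, stand, and so on.
POOL = [
    ReferenceImage("front_smile.png", frozenset({"front", "stand", "happy"})),
    ReferenceImage("side_walk.png", frozenset({"side", "walk"})),
    ReferenceImage("sit_confused.png", frozenset({"front", "sit", "confused"})),
]


def pick_references(wanted: set[str], pool=POOL, limit: int = 2) -> list[ReferenceImage]:
    """Step 3: rank references by shared tags and drop those with no match."""
    scored = [(len(img.tags & wanted), img) for img in pool]
    matches = [pair for pair in scored if pair[0] > 0]
    # sort() is stable, so ties keep their original pool order.
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [img for _, img in matches[:limit]]


def build_prompt_request(scenario: str, mood: str, action: str) -> str:
    """Steps 2 and 4: the message asking for a detailed prompt, not yet an image."""
    return (
        "Write a thorough image prompt for my character in the attached references.\n"
        f"Scenario: {scenario}\nMood: {mood}\nAction: {action}"
    )


if __name__ == "__main__":
    refs = pick_references({"sit", "confused"})
    print([r.path for r in refs])
    print(build_prompt_request("messy room", "calm focus", "tidying papers on the floor"))
```

The split into two functions mirrors the two-pass idea: first ask for a prompt, then send that prompt back with the same references to generate the image.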
A Real Example
Let me show you how this worked for the article about learning Claude Code in 30 minutes.
Article topic: Claude Code for Everyone in 30 Minutes
What I needed: Character tidying the mess on the floor, with calm focus, magical cloud (representing Claude) transforming chaos into order
The prompt ChatGPT generated:
A 3D cartoon-style young woman with smooth skin (shared in the file), large expressive eyes, and long straight black hair (inspired by Pixar and Disney), is kneeling on the floor with a calm, focused expression. To her left, a glowing magical cloud (representing Claude) hovers above a neat, organized stack of papers and folders, radiating soft golden light and magical sparkles. To her right, a chaotic pile of cluttered papers and messy objects represents unresolved problems. The magical light from the cloud appears to be gently transforming the mess into order. The scene is set in a cozy room with soft, cinematic lighting and a blurred background. The mood is enchanted and hopeful, emphasizing problem-solving and calm focus.
Result: One-shot success.
This workflow is reliable and relatively quick once you have the foundation set up. But it wasn’t always this smooth. Next, let me explain why I landed on this specific approach instead of the alternatives everyone’s talking about.
3. The Comparisons: Why Not Newer ChatGPT Models? Why Not Gemini?
This workflow works beautifully with ChatGPT 4o. But you might be wondering:
Why not the newer ChatGPT models?
Why not Gemini, especially with all the hype around its image generation capabilities?
Fair questions. I’ll show you what happened when I tried them.
Why Not Newer ChatGPT Models?
I started noticing that newer ChatGPT models weren’t working as well for me.
There were times when I accidentally ran a prompt without specifying 4o, and the default newer model produced disastrous results. Look at this image for my call for AI builders article: the left side used the unspecified default model, and the right side used 4o.
Suddenly short hair? Why is the character wearing glasses now? Why is the skin tanned?
Are you judging based on stereotypes? Are you making assumptions about what it means to do knowledge work, or what a “healthy person” should look like?
I wanted that little mystic, that little fun look. Not the rigid, overly polished style the newer models kept producing. The subtle feel was off.
Why Not Gemini (NanoBanana)?
That’s a totally fair question. I have tried Gemini for various projects, and it was mostly amazing. Fast, powerful, and often impressive.
Except... the feel of the cartoon person looks a little off to me.
Let me show you a specific example. I needed an image for my article about first hitting the Substack rising board: my character in a mysterious forest, picking up a gem. I wanted that curious, wonder-struck feeling. You know, that moment of “what is this thing?”
What I wanted: Mysterious look, forest setting, picking up gem, curious and genuinely surprised feel
What Gemini generated: A very futuristic, mature, confident woman. Attractive, sure. But she looked like she was about to take charge of the metaverse, not humble, not encountering something fantastic and new.
Don’t get me wrong, ChatGPT’s image generation isn’t perfect either. But it’s those subtle differences: the size of the head, that facial expression, the background color and scene. Gemini’s outputs feel a bit mechanical to me. And other times, they look… too modern to be me.
And if you’re Asian, you’ll immediately spot the drift. The face shape, the features, the subtle proportions. The specs might say “consistent,” but cultural context reveals what algorithms miss. What looks “close enough” to some people isn’t consistent when you know what to look for.
There are also technical issues: Gemini still doesn’t get the size and ratios right. Whenever I share a square reference image and request 16:9 ratio for newsletter headers, it fails. For newsletter formatting, wrong aspect ratios break layouts. This is functional, not aesthetic preference.
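A wrong aspect ratio is easy to detect after the fact. The sketch below is an assumption-laden helper rather than part of my workflow: it checks whether an image's pixel dimensions are close to 16:9 and, if not, computes the largest centered crop box with that ratio. The function names and the 1% tolerance are hypothetical choices.

```python
from fractions import Fraction


def is_ratio(width: int, height: int, target: Fraction = Fraction(16, 9),
             tolerance: float = 0.01) -> bool:
    """True if width/height is within a relative `tolerance` of the target ratio."""
    actual = Fraction(width, height)
    return abs(actual - target) / target <= tolerance


def center_crop_box(width: int, height: int,
                    target: Fraction = Fraction(16, 9)) -> tuple[int, int, int, int]:
    """Largest centered (left, top, right, bottom) box with the target ratio."""
    if Fraction(width, height) > target:
        # Too wide: keep the full height and trim the sides.
        new_w, new_h = int(height * target), height
    else:
        # Too tall (or already exact): keep the full width, trim top and bottom.
        new_w, new_h = width, int(width / target)
    left = (width - new_w) // 2
    top = (height - new_h) // 2
    return (left, top, left + new_w, top + new_h)


if __name__ == "__main__":
    print(is_ratio(1024, 1024))         # a square output fails the check
    print(center_crop_box(1024, 1024))  # box that would make it 16:9
```

The box uses the same (left, upper, right, lower) ordering that Pillow's `Image.crop` accepts, so it could be passed there directly. Cropping a square image this way throws away a lot of the frame, which is why getting the ratio right at generation time matters.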
What Gemini DOES Work For
This comes down entirely to my personal preferences for hero images. I do use Gemini for many other scenarios, such as:
Small location swaps of image parts
Changing clothes
Changing text or words in images
Gemini handled all of these use cases perfectly.
I’m not a paid Gemini member, and it still generates consistent images for these tasks. If I were starting fresh, I’d probably just go with Gemini because it’s so fast, cheap, and pretty consistent.
The fact that Kim Doyal & Daria Cupareanu have been using Gemini for their images consistently already shows its superpower. I have historical reasons for sticking with my current setup: I’ve built a system around ChatGPT 4o that works. But that doesn’t mean it’s the only viable approach.
4. How to Automate the Entire Image Creation Workflow
Up to this point, you already have the complete story. But if you’re like me and hate repetitive friction, you’ll see the annoyance.
The Friction Without Automation
Every time I needed an image:
Think of ideas and concepts for the image
Write up the prompt from scratch
Tweak the prompt to include all my style preferences
Adjust positioning, gesture, background color preferences
Iterate until it matches what I expect
Many times the attempt simply failed, or I didn’t like the result at all. I have particular preferences about what positions the character should be in, what gestures work, what background colors fit my brand.
Time per image: 15-20 minutes
Satisfaction rate: About 40% (yes, 60% of the time I was dissatisfied or had to start over)
Because if you don’t constrain those details, the results are just... really really unsophisticated.
The Solution: Slash Commands in Cursor
You know I love doing everything inside Cursor, including writing. So I created a particular slash command that helps me generate those prompts with minimal repetitive intervention.
Sometimes I’m particular about what the image should be like, and I’ll specify details. Other times, I let Cursor free-form the prompt based on the article topic.
How it works:
Type /create-hero-image-prompt [article topic/description] in Cursor
The command generates a brand-consistent, detailed prompt in a few seconds
Copy-paste the prompt to ChatGPT’s Image project
Generate the image
Result: Usually 1-3 shots to get a satisfying image, with about a 90% satisfaction rate.
Time per image: Under 30 seconds for prompt generation, then standard image generation time
And it’s not only for standard single-character articles. I have another slash command, /btlf-guide, specifically for my Build to Launch Friday series, where I need two people interacting with each other. Different scenarios, different conversations, different dynamics. The same systematic approach applies: describe the interaction, the command generates the prompt, paste and generate.
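Conceptually, a command like this combines fixed brand rules with the part that changes per article. Here is a minimal Python sketch of that idea. It is not the actual Cursor command: the rule wording and the function name are hypothetical, and in Cursor the equivalent instructions would live in the command's own prompt text.

```python
from typing import Optional

# Hypothetical brand constraints; the real command's wording is not shown here.
BRAND_RULES = [
    "3D Pixar-style cartoon character, matching the attached reference images",
    "16:9 aspect ratio for a newsletter header",
    "soft, cinematic lighting with a slightly mystical, playful mood",
]


def create_hero_image_prompt(topic: str, details: Optional[str] = None) -> str:
    """Merge the fixed brand rules with the per-article topic and scene."""
    topic = topic.strip()
    if not topic:
        raise ValueError("topic must not be empty")
    # With no specifics given, leave the scene open for the model to propose.
    if details and details.strip():
        scene = details.strip()
    else:
        scene = "Propose a scene that visualizes the article topic."
    parts = [f"Article topic: {topic}", f"Scene: {scene}", "Brand rules:"]
    parts += [f"- {rule}" for rule in BRAND_RULES]
    return "\n".join(parts)


if __name__ == "__main__":
    print(create_hero_image_prompt("Claude Code for Everyone in 30 Minutes"))
```

The optional `details` argument covers both modes described above: pass specifics when you are particular about the image, or omit them to let the model free-form the scene from the topic.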
What Actually Changed
The metrics tell part of the story (20 minutes down to seconds, 40% satisfaction up to 90%). But the real transformation is cognitive.
I’m no longer thinking “what exact words describe my style?” or “did I remember to specify the background color preference?” The system remembers. I only think about what the image needs to convey for this specific article, and the automation handles the rest.
The friction disappeared completely. This kind of workflow optimization is exactly what I talk about in my 10x productivity workflow guide — removing repetitive cognitive load so you can focus on the creative work.
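For a rough sense of what the jump from 40% to 90% means in practice, here is a back-of-the-envelope calculation in Python. It rests on a simplifying assumption that is mine, not something I measured: each attempt is treated as independent, with the satisfaction rate as its chance of success, so the average number of tries is 1 divided by that rate.

```python
def expected_attempts(success_rate: float) -> float:
    """Mean number of tries until the first success (geometric distribution)."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return 1 / success_rate


# Assumed per-attempt success rates, taken from the satisfaction figures above.
before = expected_attempts(0.40)  # manual prompting
after = expected_attempts(0.90)   # prompts generated by the slash command
print(f"before: {before:.2f} attempts, after: {after:.2f} attempts")
# prints "before: 2.50 attempts, after: 1.11 attempts"
```

Under that assumption, the manual approach needed about two and a half tries per image on average, while the automated one needs barely more than one.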
Next Steps
Beginner: Set up your character image pool
Upload your profile picture to ChatGPT and ask it to generate front, side, back, sit, stand, and various expression poses in your chosen style. Save these as your reference images. This takes 15 minutes and is the foundation everything else builds on.
Intermediate: Create a ChatGPT project folder
Set up a dedicated “Images” project in ChatGPT with a system prompt defining your style preferences and your reference images attached. Test it with your next article’s hero image — you should see a noticeable consistency improvement immediately.
Advanced: Automate with slash commands
If you write in Cursor, create a slash command that generates brand-consistent prompts from article descriptions. This is the step that takes you from 20 minutes to 30 seconds. If you want my exact setup, it’s included in the paid resources below.
If you’re a paid member, I’m sharing the exact ChatGPT project setup with system prompt, 3 Cursor slash commands (/hero-image-prompt, /btlf-guide, /scene-builder), and an import-ready .md file for Claude, ChatGPT, or Gemini. Access the consistent image creation resource here.
If any of this is useful, share it with one person who’s still manually crafting image prompts from scratch every time.
If you’re turning your expertise into products, building with AI, or helping others do the same, you belong here. Join the vibe coding builders community and get featured on Build to Launch Friday.
What’s your current hero image workflow — and what’s the part that takes the longest?
— Jenny

Great article folks need to share!
You're right about Gemini, though I have to say Nano is so helpful in refashioning a scene or character pose to then go back to vid prompt where I'm using Grok to create the fillers to mix with HeyGen voicings. Heck, even Playground now uses Nano!
I have a group of over 10 individually "personalitied" AI Actors with one of the mains being blind. A major challenge (back during Kling/Luma, even HeyGen) was trying to achieve the blind character with the white cane... thankfully both Veo and Grok will do it right!
Above I brag about Veo/Grok, and then fell waist deep into bias as I introduced a new character, female Brazilian with a darker complexion. Doing scene of her getting on yacht, then going back to computer lab, all of a sudden the tools were trying to turn Yara white, in fact to look like Tinkerbell (and Yara represents the villain)...
Trust me readers, Jenny has to face that often and advice regarding the group of prompts to that end is vital!
Advice from me would be if you're creating a new personality, not using a caricature of yourself, you need to share it on the world stage off/on enough to have a record of that look/voice is (name of character). Are folks going to copy? Yes. You just have to keep posting.
I love this! I love learning how other people create/generate images. I used to use Gemini for images and it was great! Or so I thought lol and then I discovered Blunge.ai image generator and it created what I'd been striving for but didn't realize it. As soon as it did I was like OMG YES THIS IS IT!!
I use Gemini to generate image prompts (with my standard physical description of my character) and then i paste that into blunge and it gives me exactly what I want! And I'm able to add in actual images of my digital products seamlessly into the images with Blunge too, which I love! (I just figured that out recently)