I’m about to launch version 2.0 of Right-Click Prompt. This is going to be a commercial release and I’m obviously a little worried because I completely vibe coded it and I don’t know what I’m doing. 😂 Would love to get your take on this fiasco when it happens
Sounds like a solid masterpiece 🤩 Is it open to the public yet?
No, not yet.
Let me know when it’s ready and I’d be very happy to roast it 😆
Thanks for the mention
They were truly amazing!
I'm glad I read this before launching my beta test next Monday 😅.
This is awesome! Let me know how it goes, and if you need more visibility, feel free to list yourself as a vibe coding builder on vibecoding.builders :)
Your answer means a lot to me. I feel supported. Thanks a lot!
Pro tip:
Whenever you have personal workflows like this, you should endeavor to capture them into a Claude skill (sorry, I guess I'm Anthropic-centric, and I don't think OpenAI or Gemini have the equivalent, which might be an argument for others to use Claude instead... late breaking: OpenAI is now adding skills too, see Simon Willison's Dec 17 Substack). To start, just prompt Claude with "Make the following into a skill" and paste this whole blog post in, then take the resulting skill file and put it into your ~/.claude/skills/ so you have a permanent, repeatable workflow. Then update that file over time so you accumulate your learnings.
Maybe this is a small startup idea: sell these skill files for various efforts (code reviews, deployments, different kinds of testing, ...). Anthropic/Claude actually has a "plugin marketplace" feature so users can easily import skills and such from URLs; see the /plugin slash command in Claude Code.
Pro pro tip:
If you have Python (backend) code, look into using the Hypothesis package for testing. Hypothesis is a property-based testing library for Python. Instead of writing specific test cases, you describe properties your code should satisfy, and Hypothesis automatically generates hundreds of random inputs to find edge cases that break those properties.
https://hypothesis.readthedocs.io/
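To make that concrete, here's a minimal sketch of a property test (the JSON round-trip is just an illustration I'm adding, not from the post; swap in your own functions):

```python
import json

from hypothesis import given, strategies as st

# Property: serializing then deserializing returns the original value.
# Hypothesis generates hundreds of adversarial dictionaries (empty strings,
# odd unicode, huge integers, ...) trying to falsify this assertion.
@given(st.dictionaries(st.text(), st.one_of(st.integers(), st.text())))
def test_json_round_trip(data):
    assert json.loads(json.dumps(data)) == data
```

Run it with pytest, and if anything fails, Hypothesis shrinks the failing input down to a minimal counterexample for you.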
Furthermore, you can use Schemathesis. Schemathesis automatically generates API tests from your OpenAPI/Swagger spec. It uses Hypothesis under the hood to create randomized requests that probe edge cases, invalid inputs, and unexpected combinations your API should handle.
https://schemathesis.readthedocs.io/en/stable/
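As a rough sketch, the pytest integration can be as small as this (assuming your app serves its spec at /openapi.json; exact helper names vary a bit across Schemathesis versions):

```python
import schemathesis

# Load the OpenAPI spec from a locally running instance of the API.
schema = schemathesis.from_uri("http://localhost:8000/openapi.json")

# Generates one test per operation in the spec; each call sends randomized,
# spec-aware inputs and validates the response against the declared schema.
@schema.parametrize()
def test_api(case):
    case.call_and_validate()
```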
These are low-cost, high-yield efforts and are probably things that should be added to that skill file.
David, this is such a thorough way to think about automating workflows! I’m with you, Claude is my favorite too, whether it’s the model, the app, or the CLI 😄 I really should brush up my own workflows and publish more of them into Claude’s plugin marketplace.
Your pro pro tip intrigued me. Hypothesis sounds perfect for enterprise-scale codebases. Schemathesis seems even more powerful, though I wonder what kind of resource costs come with that level of automated fuzzing. Do you actively use either of them in your stack?
As I understand it, Claude's marketplace, at this point, isn't really an "app store" kind of place, just a way to publish skills and such so that others can pull them in. I think it answers the question of how an enterprise could have a distribution mechanism so everyone can pull in, say, some skill.md file and the organization stays in sync on a workflow. But I think we might be getting the first glimpses into where Anthropic is going here, because they could establish their own "app store"-like server host that people publish to, with a web front end for discovery (or, alternately, someone could bootstrap this effort and beat them to the punch).
I learned about Hypothesis from a PyBay video recently and was like, "oh wow, I should have been doing this all along." I've tried it out but haven't deployed it; it's one of those things that, once you see and grok it, you put in your back pocket so it's there when you need it. And "fuzz" was a good description, but think of it as more than just random fuzzing: it's fuzzing with appropriate data that stresses the system. For numbers, that means testing at upper and lower bounds and certain break points, plus NaN/infinity (I think).
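A quick way to see that bias toward nasty values (a tiny sketch of my own, with a clamp standing in for whatever numeric code you actually have):

```python
import math

from hypothesis import given, strategies as st

# By default st.floats() mixes in the stress values: 0.0 and -0.0, huge and
# tiny magnitudes, +/-infinity, and NaN, rather than uniform random numbers.
@given(st.floats())
def test_clamp_stays_in_range(x):
    clamped = min(max(x, 0.0), 1.0)
    # NaN slips through the comparisons unchanged, so allow it explicitly.
    assert math.isnan(clamped) or 0.0 <= clamped <= 1.0
```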
The "worked on my machine" trap is so real, especially when you're moving fast with AI. Our team shipped features that passed every logical check but completely fell apart when someone used an older phone or clicked twice instead of once.
Auugh yes! It’s wild how many different ways there are to break a perfect app 😂
Your breakdown is gold. Smoke testing isn’t glamorous, but it’s the difference between a launch day nightmare and a confident rollout. Every AI-built app benefits from this structured, human-first approach.
Thank you Suhrab! Much appreciated!
Brilliant breakdown of systematic testing without over-engineering. The "three times" rule for happy-path testing is spot on; state bugs always hide past the first run. DevTools mobile view vs. actual device testing is probably the most underrated gap in most people's workflows; I've seen too many "works on my machine" disasters from skipping real hardware. The fresh-eyes testing phase is gold for catching UX blind spots.
Thank you so much!!
I like that you mention the importance of smoke tests, which many overlook. Smoke tests are an essential part of the SDLC and the best way to ensure a successful launch (well, most of the time). This is especially important when managing many teams that are building different things; otherwise, it would be chaotic!
Thanks for mentioning it, Jenny! I can't believe we are halfway through the aiadventchallenge.com, and so much is going on! 🎄
Thanks for reading Elena! I look forward to interviewing you about this AI advent challenge!
Can’t wait for that interview to happen! 🫶🏻
This checklist is going straight into my workflow!
Thank you Hodman 🙌
Thanks Jennifer, this is such a great post. I hadn't heard the term smoke test before, but it makes complete sense that this is what we should be doing before we launch our products.
I wonder, though: is there something we should be doing with other testers, rather than just ourselves, so that we can get beyond our own blinkered view of how different users might use the product?
For me, this is really reminiscent of when I design tabletop games and invite different users, or rather players, to play-test the game until it breaks. They find ways to play the game that I would never have imagined. In a way, I guess this is an analogue version of the smoke test.
Thanks Sam! Wow, your example is so interesting, it really is an analogue version of smoke testing, but it actually goes even further. It sounds like your play-testers are reshaping the product itself. That goes beyond surface testing into something more transformational.
And from that stance, I think there’s a blurry line between testing for breakage and listening for insight. In many cases, what starts as “let’s make sure it works” quickly turns into “wait, maybe it shouldn’t work that way at all.” I think that’s what’s implied in your story too, it’s not just validation, it’s creative redirection.
For me, that’s the most revealing part of handing a product to someone else with zero context. The way they pause, click the “wrong” thing, or hesitate, that confusion is a usability bug. Not in the technical sense, but in the behavioral sense: it’s where system assumptions and human expectations collide. And that’s where the real learning begins.
Smoke testing is the fastest way to protect trust before you invite real users in.
Yeah… I’m so grateful that my users are still with me after all those breaks :)
Now that we’re in the vibe coding era, this is more important than ever
Definitely! Thanks for reading Richard!
What I appreciate here isn’t the tactic, but the restraint behind it.
This isn’t really about smoke testing. It’s about refusing to confuse momentum with signal.
Most people use “vibe” as permission to avoid reality. This reframes it as something that still has to answer to it.
That distinction alone can save months of misdirected effort.
🙌🤝🙌
So true... thank you for pointing that out. Yes, most people use "vibe" as an excuse to ignore reality, but it doesn't have to be that way.
Extremely detailed as usual!! Full of useful tips and guidance. It comes right on time for me too, to make sure I'm not missing anything before shipping my first app in the first week of 2026.
This is awesome! I would love to learn more about your first app!
I already do most of what you describe, but it’s great to see it organized and packaged like this! Especially love the checklist! ❤️🙏
Haha I totally had your testing rituals in mind while writing this one 😄 glad the checklist landed!