CHAPTER FIVE
4:44
There is a specific kind of moment that only happens when you have been building long enough to stop being careful.
It was 4:44 in the morning. Not planned. Not symbolic. Just the time on the clock when the birds outside started before the city did, and something in that sound pulled me out of sleep before the alarm had the chance.
I came to the computer the way you come to a fire you left burning – not in a panic, but with the quiet understanding that something is either fine or it isn’t, and only looking will tell you which.
The build counter had reset. After weeks of running out at the worst possible moment, the number was back to zero and the window was open. This was the moment we had been waiting for.
I ran the first build.
It was wrong. Not catastrophically wrong – just wrong enough to matter. A small piece of code we had added hours earlier, a single useEffect block designed to verify a login token, was throwing logged-in users back to the visitor screen the moment they tried to enter their Lab. We had written it together, tested it in theory, assumed it was clean, and shipped it without a single real run-through. Three lines. Two builds burned before the sun came up.
This is the thing nobody writes about in the tutorials. Not the builds lost to bad code, but the builds lost to confidence. The assumption that something small couldn’t possibly break anything. The decision to skip the test because the logic seemed obvious. The moment you realize that obvious and working are not the same thing.
The fix was simple. Revert to what worked. One command, one file, one line restored to its original state. The third build ran clean.
By the time the Android AAB was uploaded to Google Play Console and the iOS build was submitted to Apple for review, it was morning in full. Bitrupt had already sent 18 tester emails before I had the chance to ask for them. The input sheets were written and sent. The engines were live. The testers were added to the console. The clock had started.
Somewhere between the first wrong build and the last right one, the project crossed a line it had been approaching for months. Not a finish line – there is no finish line in this kind of work. More like a threshold. The kind you only notice after you’ve already stepped through it.
4:44. The birds knew before I did.
Day One
The first thing the test revealed wasn’t found by a tester. It was found by the backend. I spotted a pattern: the same report appearing multiple times, with identical timestamps and the same input. The culprit turned out to be the development environment itself. Not the app, not the users, not the button, but Expo Go’s tunnel layer, resending slow requests it mistook for failures. We built a lock into the backend anyway – three engines, three guards, one rule: if the same request is already running, the second one doesn’t get through. Clean, quiet, invisible to the user.
Then the indentation war began.
Python is unforgiving about whitespace. A single wrong indent crashes the whole backend. What started as a simple addition – a request lock wrapped in a try/finally block across three engine functions – became an hour of back and forth, partial fixes creating new problems, surgical edits making things worse before they got better. The lesson wrote itself in real time: when the change is structural, rewrite the whole thing. Every token saved on a partial edit costs three times as much on the repair.
By the time the correct file was deployed and the backend came back online, Bitrupt’s team had registered 13 accounts and was generating reports. Intelligence reports. Idea validations. One tester entered “Bzhdibddusks”. The validator took it seriously. Saved it to their Lab like it was the next Airbnb. Another ran Lovable five times. The QA team was doing exactly what QA teams do – finding the edges, pressing the buttons, testing the walls.
The Archives were filling up. The images were showing up on every new report. The PDF buttons were working. The Make Public flow was instant.
Day one of the test was not clean. It was not smooth. It was exactly what a first day is supposed to be – full of things you didn’t expect, things you fixed, and things you watched work for the first time in the hands of strangers.
The app was alive.
The Stars Speak
The rating widget went live on the homepage that morning. Fifty-seven stars in full lemon-green, sitting under the hero next to the app icon like it had always been meant to be there. Not a mockup. Not a placeholder. Real ratings from real hands, counted live from the database and printed on the front page of the platform.
I kept checking the Archives that day, waiting for new reports to appear. Nothing. The list sat unchanged for hours, and the old instinct started whispering – the test has stalled, the testers have lost interest, the activity is gone.
Then I looked at the counter. Eighty-two.
Twenty-five new ratings since the morning, on a day when the Archives showed nothing at all. The testers were there the whole time – running the engines, hitting the walls, trying inputs, saving everything privately to their Labs where no public eye could see it. The platform looked asleep from the outside. Inside, it was working harder than ever.
That was the day I understood what the Star Gate actually was. We built it as a small toll booth – rate your last experience, then run your next report. A way to collect feedback without asking for it. But it turned out to be something better: the only honest counter in the system. Drafts can hide in private Labs. Published reports depend on someone choosing to share. But every rating means one thing that cannot be faked – a person sat with the app and tried to run something. Success or failure, gibberish or gold, the gate counted it.
Ten drafts. Fifteen more ratings with no drafts behind them – rejected inputs, duplicate hits, walls doing their job. The testers weren’t just using the app. They were pressing on its edges, exactly the way QA is supposed to.
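The toll booth idea is simple enough to model in a few lines. This is a hypothetical sketch, not the app’s implementation: it assumes one pending rating per user, a 1 to 5 star scale, and a counter that is just the number of ratings collected. A rejected input still counts as a run, so it still has to be rated, which is why ratings can outnumber saved drafts.

```python
from dataclasses import dataclass, field


@dataclass
class StarGate:
    """Hypothetical model of 'rate your last run, then run the next one'."""
    ratings: list[int] = field(default_factory=list)
    # Users whose last run is still waiting for a rating.
    awaiting_rating: set[str] = field(default_factory=set)

    def start_run(self, user_id: str) -> bool:
        if user_id in self.awaiting_rating:
            return False  # must pay the gate before the next run
        self.awaiting_rating.add(user_id)
        return True

    def rate(self, user_id: str, stars: int) -> None:
        if user_id not in self.awaiting_rating:
            raise ValueError("nothing to rate yet")
        if not 1 <= stars <= 5:  # assumed star scale
            raise ValueError("stars must be between 1 and 5")
        self.ratings.append(stars)
        self.awaiting_rating.discard(user_id)

    @property
    def counter(self) -> int:
        # The homepage number: one entry per rated run, success or failure.
        return len(self.ratings)
```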
That evening I sent Bitrupt a message that was not a task and not a reminder. Twenty-two reports yesterday, twenty-seven today, and a simple thank you. Some messages you send because the process requires them. This one I sent because it was true.
Eighty-two ratings from a test that had barely started. By the end of the window, the number could pass two hundred. And it looked great on the homepage.
The Engine That Refused to Be Fooled
Somewhere in the middle of the test, one of the testers tried to break the Concept Generator. Not with code, not with some clever exploit – with emptiness. He pushed the literal text “INPUT_INVALID” into the generator’s input window: nothing dressed up as something, the kind of input designed to make an AI hallucinate a business plan out of thin air.
The engine didn’t take the bait.
Instead of inventing a fake concept, it wrote him a report about the emptiness itself. “There’s nothing here to build on – and that’s the real problem,” it opened. And then, instead of stopping at the rejection, it kept going – a full structured analysis of why blank inputs produce blank outputs. “Generic landing pages built on nothing don’t attract customers. They waste time, dilute focus, and create a false sense of progress while the actual idea remains unexamined.”
Then, buried in the opportunity section, it gave him advice. Submit a real concept, it said. It doesn’t need to be polished – one sentence, a rough description, a problem you’ve noticed. “A messy real idea will always outperform a clean empty one.” And: “The gap between ‘I have an idea’ and ‘I have a positioned concept’ is where most early-stage projects stall.”
I read that line three times. The engine had produced, by accident, the best one-sentence summary of everything CGEN stands for. A tester tried to confuse the system and instead got the company’s philosophy back, fully formed, in the system’s own voice.
It even suggested names for the non-concept – ConceptSeed, DraftZero – branding for the idea of starting with something real. The guard rail had turned into a mirror, and the mirror was on message.
That’s the strange thing about building with AI long enough. You spend months teaching it your principles through prompts and corrections, and then one day, under attack, it states them back more clearly than you ever did.
The tester tried to break it and the engine responded with philosophy. That’s a good engine.
Eighteen People, One Server
Somewhere in those days a question started bothering me. Eighteen testers, three engines, one backend on Render. What happens when five of them hit the button at the same time? How many before the whole thing crashes or gets confused?
I asked. The answer was a small education in how the machine actually breathes.
The backend itself was never the worry – FastAPI processes requests in parallel, each one on its own track, like a bank with many tellers instead of one. The real bottleneck sits further up: every generation is a call to Claude with web search, two to three minutes of thinking per report, and Anthropic’s rate limits standing at the gate. Five to ten simultaneous generations – nothing. Twenty at once – timeouts begin.
But here was the part that made me relax: eighteen people never press a button at the same second. One is still reading the form. Another is typing their industry. A third is staring at the loading steps. Human rhythm spreads the load by itself – the testers’ natural pace was the best traffic management we had.
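The "many tellers" picture can be seen in a toy simulation. This sketch only illustrates concurrency in general and is not CGEN code: `asyncio.sleep` stands in for a two to three minute Claude call, scaled down to 0.2 seconds. Because each request spends its time waiting on the API, five overlapping requests finish in roughly the time of one. What it does not model is the real bottleneck described above, Anthropic’s rate limits.

```python
import asyncio
import time

SIMULATED_GENERATION_SECONDS = 0.2  # stand-in for a 2-3 minute AI call


async def fake_generation(report_id: int) -> int:
    # Awaiting hands control back to the event loop while "waiting on the API",
    # so other requests make progress in the meantime.
    await asyncio.sleep(SIMULATED_GENERATION_SECONDS)
    return report_id


async def serve(n_requests: int) -> float:
    start = time.perf_counter()
    results = await asyncio.gather(*(fake_generation(i) for i in range(n_requests)))
    assert results == list(range(n_requests))
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = asyncio.run(serve(5))
    # Roughly 0.2s, not the 1.0s five back-to-back calls would take.
    print(f"5 overlapping generations finished in {elapsed:.2f}s")
```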
The duplicate lock we had built earlier was quietly helping too, blocking the double-fires before they ever reached the API.
“You’re safe,” the answer ended. And we were. The server never blinked once through the entire test.
The Currency
By day three, the counter read 158.
A hundred and fifty-eight ratings from a team of eighteen, in three days. These were not testers going through the motions – the inputs told the story. Dropshipping platforms. Print-on-demand companies. AI tools. Business ideas they were clearly carrying around in their own heads, finally run through a machine built to examine them. They were testing the product and using it at the same time, and the second part was not in the contract.
Most of what they generated stayed private. The Archives barely moved while the Labs filled up – and somewhere in those quiet numbers I finally said out loud what the Star Gate had become: a currency.
People were running real research on their own ideas, things they didn’t want public. That is exactly the use case CGEN was built for. The Archives is the public face. The private Lab is the real product. The stars are the proof that both are working.
That morning, Asif’s reply to my thank-you message came in: glad to see things growing this fast, hitting above the goals, keeping the energy going. A QA contract on Fiverr was quietly becoming something that looked more like a partnership.
By the end of fourteen days, the homepage could be showing five hundred stars – social proof earned before the platform ever officially launched, paid one generation at a time, in CGEN currency.
The Filter Catches the Boss
Then the gibberish filter caught Bitrupt himself.
Not a tester this time – the man running the operation. Setting up his team’s day, he copy-pasted from the input sheet we had sent him earlier and submitted the table’s column header to the input window – the literal words “first engine inputs” – as an idea description, with T-Pop and Print on Demand dutifully filled into the industry fields.
The engine read it exactly right. Not a real idea – placeholder text. INPUT_INVALID, with a clean explanation: the submission “reads as placeholder/template text rather than a real concept.” Not rude. Not robotic. Just clear.
I laughed when I saw it in the logs. Earlier in the test, a tester had tried to fool this filter on purpose and failed. Now the QA owner had triggered it himself, and the filter caught that too. The deliberate attack and the honest mistake, both handled with the same calm answer.
If there was a moment the input protection earned its keep, this was it. Real-world invalid input doesn’t look like keyboard smashing – it looks like a tired professional pasting the wrong cell from a spreadsheet at the start of a workday. The filter knew the difference between that and an idea. And as a bonus, the man whose team would spend two weeks pressing every button in the app now knew firsthand exactly what his testers would see when they pressed the wrong one.
The First Fruit
There is a difference between a test and an exam.
A test is something you run in a controlled environment. You know the variables. You set the conditions. You design the inputs and you wait for the outputs you expected.
An exam is what happens when you hand something real to people you have never met and watch what they do with it.
The closed testing period Google Play requires is framed as a compliance checkbox. Fourteen days. Twelve testers. Daily activity. Pass the requirement, unlock production access, move on. That’s the test.
What happened in the first week was the exam.
It started on a morning commute. The star rating widget on the homepage – the one sitting quietly under the main image, pulling live data from the database every time someone runs one of the CGEN engines – was sitting at 182 when I left for work. A number I had watched climb slowly over the first few days, one rating at a time, each one a real person who had opened an engine, filled in the fields, and paid the gate before running their report.
A coworker glanced at the screen and said 293.
Not 193. 293.
In the hours I had been away from the computer, 111 new ratings had come in. Not because anyone was asked to rate again. Because testers came back, ran another engine, and paid the gate again. The rating counter doesn’t lie and doesn’t repeat. Every number is a unique moment of engagement.
That was the first fruit. Not a metric. Not a vanity number. A signal that something real was happening on the other side of the screen.
Then came the QA report.
Bitrupt’s team delivered eleven documented bugs. Not vague complaints – a full professional report with severity ratings, reproduction steps, expected versus actual behaviour, screenshots, and priority levels. A Bug Summary Dashboard. The kind of deliverable that most indie developers never receive before launch.
BUG-011 was the most revealing. Registration shows “Failed” – account is actually created. Every tester on Bitrupt’s team had registered successfully on the first attempt. They just saw an error message, assumed it had failed, and figured out they could log in with the credentials they had just created. They pushed through a broken registration screen and found their way in anyway. That’s not a user behavior you design for. That’s a user who wants to be there.
One day of work fixed all eleven bugs. Keyboard handling on every screen. Email validation. The false registration error. Terms and Conditions linked. Duplicate publish protection with a clear message. And the registration JWT checkbox – one setting in a plugin, unchecked, invisible, silently breaking the experience for every new user since launch.
One checkbox.
The builds went out. iOS resubmitted to Apple with a reply explaining the payment situation. Android updated on Google Play. Bitrupt’s team notified with the full fix list.
By the end of that same day the upgrade page had been cleaned – PayPal removed, new pricing structured, the payment section replaced with an honest coming soon message. The token limits on the Intelligence and Validator engines were tightened. A no-repetition instruction was added to both prompts.
The app that went into day seven of the test was meaningfully better than the app that started day one.
That’s what a real exam does. It doesn’t just check if something works. It shows you exactly where it doesn’t – and gives you the chance to fix it before the world arrives.
293 ratings. Eleven bugs found and fixed. One checkbox that changed everything.
The test was never just about Google Play.
It was about finding out if what was built could survive contact with real people.
The first week said yes.
4:27
The day before the build, Windows decided to update itself overnight and took the session with it. Weeks earlier this would have been a disaster. Now it was an inconvenience – the state file was waiting on GitHub, written for exactly this kind of morning. One link pasted, and the work continued from the precise sentence where it had stopped. The system we built to remember had passed its own exam.
What followed was the longest day of the project.
Apple had rejected the app three times on the same guideline – 3.1.1, payments. The message between the lines was simple: if you sell on our store, you sell through our system. No PayPal links, no web checkouts, no workarounds. The fix was never going to be cosmetic. It meant building a real payment system, the way Apple demands it, from nothing.
So we built it. A RevenueCat account in the morning – the service that stands between an app and Apple’s machinery. Products created and connected: Pro, Premium Monthly, Premium Annual. Entitlements, offerings, a signing key downloaded from Apple that you only get to download once. By afternoon, the purchase flow was being written into all three engines – live prices pulled straight from Apple, a Restore Purchases button, walls that knew the difference between an invitation and a receipt.
And then, somewhere near midnight, I asked a question that changed the product.
Do we have a wall that limits Premium members per day? We didn’t. The plan said “unlimited,” and unlimited was a trap – I had watched the test period prove it, ten dollars of AI costs burning every day against a seven-dollar monthly subscription. Unlimited doesn’t mean as many as you wish. So, at the hour when most people stop making decisions, we made the most important one: the full ladder. Visitors get a taste. Registered members get one more. Pro – three per engine per month, forever, for one payment. Premium – three per engine per day. Every number chosen to keep the lights on.
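The two paid rungs of the ladder translate directly into a quota check. This is a hypothetical sketch, not the production wall: it takes only the numbers stated here (Pro: three runs per engine per calendar month; Premium: three per engine per day), leaves out visitors and registered members because their exact allowances aren’t given, and assumes each user’s usage is stored as a dictionary keyed by engine and time window.

```python
from datetime import date

# From the text: Pro = 3 runs per engine per month, Premium = 3 per engine per day.
# Visitor and registered allowances are not stated, so they are omitted here.
LADDER = {
    "pro": (3, "month"),
    "premium": (3, "day"),
}

Usage = dict[tuple[str, str], int]  # (engine, window label) -> runs, for one user


def window(period: str, today: date) -> str:
    """Label of the quota window that `today` falls into."""
    if period == "day":
        return today.isoformat()
    if period == "month":
        return today.strftime("%Y-%m")
    raise ValueError(f"unknown period: {period}")


def can_run(tier: str, engine: str, usage: Usage, today: date) -> bool:
    limit, period = LADDER[tier]
    return usage.get((engine, window(period, today)), 0) < limit


def record_run(tier: str, engine: str, usage: Usage, today: date) -> None:
    _, period = LADDER[tier]
    key = (engine, window(period, today))
    usage[key] = usage.get(key, 0) + 1
```

With three engines, the Premium cap works out to at most 3 × 3 = 9 generations per member per day, a bounded cost where "unlimited" had none. The windows here reset on calendar boundaries of whatever date is passed in; the real app’s time zone handling is not specified.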
Then the three engines were rewritten around the ladder, top to bottom, while the night got deeper.
At three in the morning I was filling out a United States tax form – a W-8BEN, the document that tells the IRS an Israeli artist owes them nothing. There is a particular absurdity to certifying, under penalty of perjury, that you have no employees in America, while the birds outside are starting the same song that opened this chapter. The form went Active instantly. The bank was already connected. Apple’s entire business machinery – agreement, bank, tax – stood complete.
The build compiled. The version number rolled from 1.0.1 to 1.1.0 – not a fix this time, a feature. The submission went out with all three purchases attached and a note to the reviewer explaining, politely, that everything they had rejected three times was now built exactly the way they wanted it.
At 4:27 in the morning, App Store Connect said: Waiting for Review.
This chapter opened at 4:44, before dawn, with a broken build and three lines of overconfident code. It closes at 4:27, before dawn, with the strongest submission this project has ever sent – seventeen minutes earlier on the clock and several days further down the road.
The same birds. Same app. Same AI & I.
A different chapter.
The next morning, the test stood at day twelve. The star counter read 483 – seventeen short of the five hundred I had predicted when the number was 158 and the prediction felt like optimism.
The first fruit had been picked. Whatever Apple decides, the tree is real.








