Part three of our Koffeejam series.

The numbers are better than they had any right to be.

The results from the KOFFEEJAM are finally in, and frankly, the scoreboard deserves a quiet moment of appreciation.

Twenty games were submitted. Twenty.

To put that into context, this is a studio full of highly capable professionals, but it’s also a studio where most people are used to staying firmly in their own specialized lanes. Before this experiment, the idea of one person building an entire multiplayer game from scratch (handling the logic, the game loops, the UI, and the deployment alone) was generally viewed as "a job for a whole department," or at the very least, a massive production detour.

We didn't predict what happened next. To be completely transparent, we were quietly hoping for maybe ten prototypes that mostly booted without catching fire.

Instead, we got twice that. Twenty-four voters across the studio cast 301 votes to determine the final ranking.

Behind the Scores: How We Judged

To keep the ranking fair, every voter evaluated the submissions against four distinct criteria. Each criterion was scored from 1 to 10, for a maximum possible score of 40 points.

Our benchmark scale was simple:

1–2: Poor / Not present
3–4: Below expectations
5–6: Meets expectations
7–8: Above expectations
9–10: Exceptional

The voters had to judge the games based on four pillars:

Clarity: Did you understand the game and its mechanics? Could you quickly figure out what to do?
Gameplay: How fun was it? Did the core mechanics feel satisfying? Was there skill, tension, or excitement?
Technical Execution: How well was it built? Did it run smoothly? Were there bugs or rough edges?
Visual Polish: How well did it look? Did the visual design feel intentional and cohesive?

The Final Ecosystem

The sheer volume caught us off guard, but the real story lies in the state these games ended up in. Out of those twenty submissions, the breakdown looks like this:

9 are fully playable. We aren’t talking about simple grey-box tests here. These are games with complete round systems, functional multiplayer lobbies, scoring mechanics, win conditions, and proper aesthetic polish.
7 have minor issues. These are complete, cohesive games that just ran into one stubborn, late-stage edge case: a player physics interaction that never quite resolved, or an animation layer the AI couldn't generate, which had to be left rough. The kind of thing you’d easily fix in a normal production cycle, but couldn't quite squeeze into a strict two-week window.
4 are prototypes. These aren't failures; they are ambitious explorations. They got far enough to be entirely legible (you can see exactly what they were trying to achieve), but they simply ran out of two-week road before hitting the finish line. Anyone who has ever tried to ship a game under a tight deadline will recognize that feeling.

Sixteen of the twenty games, built in a fortnight by designers, producers, QA engineers, sound designers, and ops managers, are in a state where you can sit down with eight colleagues and play them right now. That is remarkable. It turns out that when you give people the keys to the entire pipeline rather than just their usual corner, they tend to run with it.

The Shape of Twenty Games

When you look at these twenty games collectively, a clear pattern emerges in what AI-assisted development actually looks like in practice.

The successful games didn't just lean on the AI to generate code snippets. One of the playable entries features a companion character with dozens of personality-driven, contextual reactions to in-game events. Several others shipped with day-one mobile support. Another features an ambitious multiplayer survival RPG loop with rotating themed island worlds, a five-stage progression system, and a legendary final boss.

But the experiment also exposed the clear boundaries of the current tech. Animation, complex spatial physics, and deep architectural taste still heavily resist automation. The participants who succeeded weren't the ones who treated the AI like a magic wand; they were the ones who treated it like a highly enthusiastic, incredibly fast, but slightly erratic assistant.

What cuts across all twenty projects, regardless of their final score or level of polish, is a permanent shift in perspective. Every single participant in the cohort we’re tracking noted that this experience has fundamentally changed how they approach their day-to-day work going forward. Not "maybe in the future," and not "only for specific tasks." The tool is firmly in the room now, and the workflow has evolved.

Coming Next

The full picture (who built what, what it revealed to them, and why the Lead Sound Designer with zero coding experience ended up with the highest Visual Polish score in the group) is in the next and final post.

It's worth the wait.