FairwazeAI

Evals and a test harness for a golf rules bot, with iterative prompt and grading-rubric tuning

Fairwaze is a personal project I built to stay current with how modern AI products are actually constructed.

The AI ecosystem is evolving extremely quickly. New models, tools, and frameworks appear every few months. Reading about them helps, but the only reliable way to stay sharp is to build real systems.

Fairwaze was my way of doing that.

The goal wasn’t to build a large commercial product. The goal was to design and implement a complete AI product stack, from prompt design and evaluation to application architecture and user interaction.

In other words: treat it like a real product, not just a demo.


The Problem

There is a large gap between experimenting with LLMs and building a reliable AI product.

Many demos look impressive in isolation, but once you try to build an application around them you quickly encounter problems:

  • inconsistent outputs

  • fragile prompts

  • unpredictable responses

  • limited evaluation

  • difficulty improving the system over time

Building a real AI product means solving these issues at the system level, not just at the model level.

Fairwaze was an exercise in understanding how to design around those challenges.


My Approach

Instead of building a quick prototype, I approached Fairwaze the way I would approach any production system. That meant focusing on several core layers.

Application architecture
Separating prompt logic, model interaction, and application behavior so that each layer could evolve independently.

Prompt engineering
Designing prompts that guided the model toward structured and predictable outputs.

Evaluation
Testing how the system behaved across different inputs to understand failure modes and edge cases.

Iteration
Treating the AI component like any other product feature: ship, test, refine. The point was to manage the AI system as a product system, not just a model call.


What I Built

Fairwaze implemented several patterns that appear frequently in modern AI applications.

Structured prompting

Rather than relying on single free-form prompts, the system uses structured prompts that guide the model toward consistent outputs. This reduces variance and makes responses easier to interpret programmatically.
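As a sketch of what that output contract can look like, here is a minimal structured-prompt pattern in Python. The template wording and the field names (`ruling`, `rule_reference`, `explanation`) are illustrative assumptions, not Fairwaze's actual prompts:

```python
import json

# Hypothetical structured prompt for a golf-rules question. The field
# names and wording are illustrative, not the project's real template.
RULING_PROMPT = """You are a golf rules assistant.
Answer the question below and respond ONLY with JSON in this exact shape:
{{"ruling": "<one-sentence ruling>",
  "rule_reference": "<rule number, e.g. 'Rule 18.2'>",
  "explanation": "<short explanation>"}}

Question: {question}"""

def build_prompt(question: str) -> str:
    """Fill the template so every request shares the same output contract."""
    return RULING_PROMPT.format(question=question)

def parse_ruling(raw: str) -> dict:
    """Parse the model's reply; fail early if the contract is violated."""
    data = json.loads(raw)
    missing = {"ruling", "rule_reference", "explanation"} - data.keys()
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data
```

Because every response must land in the same shape, downstream code can read fields programmatically instead of scraping free text, which is what makes the variance reduction usable.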


Model orchestration

The application separates the responsibilities of:

  • user interaction

  • prompt construction

  • model invocation

  • response processing

This architecture makes it easier to experiment with different models or prompt strategies without rewriting the entire application.
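A minimal sketch of that separation, with the model client reduced to a plain callable so a real API, a different vendor, or a test stub can be swapped in without touching the other layers (all names here are illustrative assumptions):

```python
from typing import Callable

# The model layer is just "string in, string out"; everything
# vendor-specific hides behind this type.
ModelClient = Callable[[str], str]

def construct_prompt(question: str) -> str:
    """Prompt-construction layer: owns all wording and templating."""
    return f"Answer this golf rules question concisely:\n{question}"

def process_response(raw: str) -> str:
    """Response-processing layer: normalise whatever the model returns."""
    return raw.strip()

def answer(question: str, model: ModelClient) -> str:
    """Application layer: wires the pieces together and knows nothing
    about prompt wording or vendor-specific APIs."""
    return process_response(model(construct_prompt(question)))
```

With this shape, swapping models or prompt strategies means replacing one function, not rewriting the application.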


Iterative evaluation

Because LLM outputs are probabilistic, evaluating system quality requires repeated testing across different inputs.

Fairwaze includes tooling to experiment with prompts and observe how outputs change across scenarios.

This helped identify:

  • prompt failure modes

  • reasoning breakdowns

  • opportunities to simplify prompts
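One way to sketch that kind of harness: run each scenario through the system and grade the output against a rubric, then aggregate a pass rate. The keyword-based grader below is a deliberately simple stand-in; the project's actual grading rubric is not shown here and could just as well be an LLM judge:

```python
from typing import Callable

def keyword_rubric(output: str, expected_keywords: list[str]) -> bool:
    """Pass if every expected keyword appears in the output
    (a simple stand-in for a richer grading rubric)."""
    return all(k.lower() in output.lower() for k in expected_keywords)

def run_evals(scenarios: list[dict], system: Callable[[str], str]):
    """Run every scenario through the system; return per-scenario
    results plus an aggregate pass rate."""
    results = []
    for s in scenarios:
        out = system(s["question"])
        results.append({"question": s["question"],
                        "passed": keyword_rubric(out, s["expected_keywords"])})
    passed = sum(r["passed"] for r in results)
    return results, passed / len(results)
```

Re-running the same scenario set after each prompt change turns "the outputs feel better" into a number you can compare across iterations.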


Product-level thinking

Most importantly, the project focused on the product experience, not just the AI.

Questions explored included:

  • how much reasoning should the model perform vs the application?

  • when should outputs be constrained or validated?

  • what parts of the system should remain deterministic?

These decisions ultimately determine whether an AI system feels reliable or chaotic.
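As an illustration of keeping the deterministic parts in plain code, here is a hypothetical validation gate: the model's JSON is checked deterministically, and anything malformed is replaced with a fixed fallback instead of reaching the user. The schema and fallback text are assumptions for the sketch, not Fairwaze's actual behavior:

```python
import json

# Fixed, safe response used whenever the model's output fails validation.
FALLBACK = {"ruling": "Unable to determine a ruling.",
            "rule_reference": None,
            "explanation": "The response failed validation."}

def validate_or_fallback(raw: str) -> dict:
    """Deterministic gate between the probabilistic model and the user."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return dict(FALLBACK)
    if not isinstance(data.get("ruling"), str) or not data["ruling"]:
        return dict(FALLBACK)
    return data
```

The gate itself never guesses: a response either satisfies the contract or is replaced, which is one concrete way an AI system can feel reliable rather than chaotic.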


Why This Project Matters

AI development is currently in a phase where tooling is evolving faster than best practices.

Building Fairwaze was a way to stay hands-on with:

  • LLM product architecture

  • prompt engineering patterns

  • evaluation workflows

  • real-world AI system behavior

Staying current in this space requires regular experimentation with new techniques and tools.


Lessons

The biggest lesson from Fairwaze is that building AI products is less about the model and more about the system around the model.

A successful AI product usually succeeds because it:

  • constrains the model effectively

  • handles failure cases gracefully

  • structures prompts carefully

  • iterates quickly based on feedback

The model provides the intelligence.

The product provides the reliability.

Let's talk

Email: gar@garwalsh.com

© Copyright 2026
