FairwazeAI

Evals and a test harness for a golf rules bot, with iterative prompt and grading-rubric tuning

Fairwaze is a personal project I built to stay current with how modern AI products are actually constructed.

The AI ecosystem is evolving extremely quickly. New models, tools, and frameworks appear every few months. Reading about them helps, but the only reliable way to stay sharp is to build real systems.

Fairwaze was my way of doing that.

The goal wasn’t to build a large commercial product. The goal was to design and implement a complete AI product stack, from prompt design and evaluation to application architecture and user interaction.

In other words: treat it like a real product, not just a demo.


The Problem

There is a large gap between experimenting with LLMs and building a reliable AI product.

Many demos look impressive in isolation, but once you try to build an application around them you quickly encounter problems:

  • inconsistent outputs

  • fragile prompts

  • unpredictable responses

  • limited evaluation

  • difficulty improving the system over time

Building a real AI product means solving these issues at the system level, not just at the model level.

Fairwaze was an exercise in understanding how to design around those challenges.


My Approach

Instead of building a quick prototype, I approached Fairwaze the way I would approach any production system. That meant focusing on several core layers.

Application architecture
Separating prompt logic, model interaction, and application behavior so that each layer could evolve independently.

Prompt engineering
Designing prompts that guided the model toward structured and predictable outputs.

Evaluation
Testing how the system behaved across different inputs to understand failure modes and edge cases.

Iteration
Treating the AI component like any other product feature: ship, test, refine. The point was to manage the AI system as a product system, not just a model call.


What I Built

Fairwaze implemented several patterns that appear frequently in modern AI applications.

Structured prompting

Rather than relying on single free-form prompts, the system uses structured prompts that guide the model toward consistent outputs. This reduces variance and makes responses easier to interpret programmatically.
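As a sketch of what that output contract can look like, here is a minimal structured-prompt pattern in Python. The template wording and the field names (`ruling`, `rule_reference`, `explanation`) are illustrative assumptions, not Fairwaze's actual prompts:

```python
import json

# Hypothetical structured prompt for a golf-rules question. The field
# names and wording are illustrative, not the project's real template.
RULING_PROMPT = """You are a golf rules assistant.
Answer the question below and respond ONLY with JSON in this exact shape:
{{"ruling": "<one-sentence ruling>",
  "rule_reference": "<rule number, e.g. 'Rule 18.2'>",
  "explanation": "<short explanation>"}}

Question: {question}"""

def build_prompt(question: str) -> str:
    """Fill the template so every request shares the same output contract."""
    return RULING_PROMPT.format(question=question)

def parse_ruling(raw: str) -> dict:
    """Parse the model's reply; fail early if the contract is violated."""
    data = json.loads(raw)
    missing = {"ruling", "rule_reference", "explanation"} - data.keys()
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data
```

Because every response must land in the same shape, downstream code can read fields programmatically instead of scraping free text, which is what makes the variance reduction usable.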


Model orchestration

The application separates the responsibilities of:

  • user interaction

  • prompt construction

  • model invocation

  • response processing

This architecture makes it easier to experiment with different models or prompt strategies without rewriting the entire application.
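A minimal sketch of that separation, with the model client reduced to a plain callable so a real API, a different vendor, or a test stub can be swapped in without touching the other layers (all names here are illustrative assumptions):

```python
from typing import Callable

# The model layer is just "string in, string out"; everything
# vendor-specific hides behind this type.
ModelClient = Callable[[str], str]

def construct_prompt(question: str) -> str:
    """Prompt-construction layer: owns all wording and templating."""
    return f"Answer this golf rules question concisely:\n{question}"

def process_response(raw: str) -> str:
    """Response-processing layer: normalise whatever the model returns."""
    return raw.strip()

def answer(question: str, model: ModelClient) -> str:
    """Application layer: wires the pieces together and knows nothing
    about prompt wording or vendor-specific APIs."""
    return process_response(model(construct_prompt(question)))
```

With this shape, swapping models or prompt strategies means replacing one function, not rewriting the application.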


Iterative evaluation

Because LLM outputs are probabilistic, evaluating system quality requires repeated testing across different inputs.

Fairwaze includes tooling to experiment with prompts and observe how outputs change across scenarios.

This helped identify:

  • prompt failure modes

  • reasoning breakdowns

  • opportunities to simplify prompts
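One way to sketch that kind of harness: run each scenario through the system and grade the output against a rubric, then aggregate a pass rate. The keyword-based grader below is a deliberately simple stand-in; the project's actual grading rubric is not shown here and could just as well be an LLM judge:

```python
from typing import Callable

def keyword_rubric(output: str, expected_keywords: list[str]) -> bool:
    """Pass if every expected keyword appears in the output
    (a simple stand-in for a richer grading rubric)."""
    return all(k.lower() in output.lower() for k in expected_keywords)

def run_evals(scenarios: list[dict], system: Callable[[str], str]):
    """Run every scenario through the system; return per-scenario
    results plus an aggregate pass rate."""
    results = []
    for s in scenarios:
        out = system(s["question"])
        results.append({"question": s["question"],
                        "passed": keyword_rubric(out, s["expected_keywords"])})
    passed = sum(r["passed"] for r in results)
    return results, passed / len(results)
```

Re-running the same scenario set after each prompt change turns "the outputs feel better" into a number you can compare across iterations.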


Product-level thinking

Most importantly, the project focused on the product experience, not just the AI.

Questions explored included:

  • how much reasoning should the model perform vs the application?

  • when should outputs be constrained or validated?

  • what parts of the system should remain deterministic?

These decisions ultimately determine whether an AI system feels reliable or chaotic.
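As an illustration of keeping the deterministic parts in plain code, here is a hypothetical validation gate: the model's JSON is checked deterministically, and anything malformed is replaced with a fixed fallback instead of reaching the user. The schema and fallback text are assumptions for the sketch, not Fairwaze's actual behavior:

```python
import json

# Fixed, safe response used whenever the model's output fails validation.
FALLBACK = {"ruling": "Unable to determine a ruling.",
            "rule_reference": None,
            "explanation": "The response failed validation."}

def validate_or_fallback(raw: str) -> dict:
    """Deterministic gate between the probabilistic model and the user."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return dict(FALLBACK)
    if not isinstance(data.get("ruling"), str) or not data["ruling"]:
        return dict(FALLBACK)
    return data
```

The gate itself never guesses: a response either satisfies the contract or is replaced, which is one concrete way an AI system can feel reliable rather than chaotic.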


Why This Project Matters

AI development is currently in a phase where tooling is evolving faster than best practices.

Building Fairwaze was a way to stay hands-on with:

  • LLM product architecture

  • prompt engineering patterns

  • evaluation workflows

  • real-world AI system behavior

Staying current in this space requires regular experimentation with new techniques and tools.


Lessons

The biggest lesson from Fairwaze is that building AI products is less about the model and more about the system around the model.

A successful AI product usually succeeds because it:

  • constrains the model effectively

  • handles failure cases gracefully

  • structures prompts carefully

  • iterates quickly based on feedback

The model provides the intelligence.

The product provides the reliability.

Let's talk

Email: gar@garwalsh.com

© Copyright 2026
