FairHire AI

Conversational hiring system replacing resumes with structured, LLM-driven evaluation

My co-founder and I built FairHire AI to explore a different hiring primitive: conversation as input, structured evaluation as the system boundary.

The goal was to replace brittle resume filtering with role-specific signals that could be inspected, edited, and iterated on.

Prompt Chaining Built for Reliable Evaluation

Built multi-step prompt chains that turn conversation into structured criteria and consistent evaluation signals, using normalization and guardrails to keep judgment stable.

Overview

FairHire AI was an attempt to rethink how hiring decisions are made.

We saw a clear breakdown in existing systems:

  • Resumes were becoming unreliable as signals
  • LLM-generated resumes made keyword filtering ineffective
  • Existing tools reduced hiring to brittle matching logic

The goal was to replace resumes entirely with a system where:

  • Hiring managers define roles through conversation
  • Candidates demonstrate fit through guided dialogue
  • Evaluation is based on structured priorities rather than static inputs

I was the technical co-founder and built the system end-to-end.

What Was Built

A two-sided system built around structured conversational data:

  • Role definition via chat: Hiring managers describe the role in natural language

  • Priority extraction and editing: LLM pipelines convert conversation into ranked, structured criteria that can be refined in a visual interface

  • Candidate interaction flow: Applicants engage in guided conversations aligned to those priorities

  • Contextual evaluation: Candidate responses are interpreted relative to role-specific signals rather than generic filters

The system replaces resumes with a structured dataset generated from both sides of the hiring process.
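To make the "structured dataset from both sides" concrete, here is a minimal sketch of what such a record could look like. All type and field names are hypothetical illustrations, not the actual FairHire AI schema.

```typescript
// Hypothetical sketch of the structured dataset that replaces a resume.
// Field names are illustrative, not the actual FairHire AI schema.
interface RolePriority {
  id: string;
  label: string;     // e.g. "Distributed systems experience"
  rank: number;      // 1 = most important, set by the hiring manager
  rationale: string; // why this matters, extracted from the role chat
}

interface CandidateSignal {
  priorityId: string; // links the signal back to a RolePriority
  evidence: string;   // excerpt from the guided candidate conversation
  strength: "weak" | "moderate" | "strong";
}

// Both sides of the process produce one inspectable record per application.
interface ApplicationRecord {
  roleId: string;
  candidateId: string;
  priorities: RolePriority[];
  signals: CandidateSignal[];
}
```

Because every signal references a priority by id, an evaluation can always be traced back to something the hiring manager explicitly asked for.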

Key Decisions

Conversation as input, structure as the system boundary

LLMs were not used to directly make hiring decisions.

Instead:

  • Conversations were transformed into structured data
  • That structure became the basis for evaluation

This made the system inspectable, editable, and adaptable.
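The boundary idea can be sketched as a single parsing step: the model's raw reply is treated as untrusted text, and only a validated, typed structure crosses into the rest of the system. The function below is a hypothetical illustration of that boundary, not the actual implementation.

```typescript
// Illustrative boundary function: the LLM reply is untrusted text, and the
// typed structure below is the only thing the rest of the system ever sees.
interface ExtractedPriority {
  label: string;
  rank: number;
}

// Parse a model reply into structured priorities, rejecting anything that
// does not match the expected shape.
function toPriorities(raw: string): ExtractedPriority[] {
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed)) throw new Error("expected a JSON array");
  return parsed.map((item, i) => {
    if (typeof item !== "object" || item === null) {
      throw new Error(`item ${i} is not an object`);
    }
    const { label, rank } = item as Record<string, unknown>;
    if (typeof label !== "string" || typeof rank !== "number") {
      throw new Error(`item ${i} is missing label/rank`);
    }
    return { label, rank };
  });
}
```

Malformed output fails loudly at the boundary instead of propagating into evaluation, which is what makes the structure inspectable and editable downstream.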

Separate generation from evaluation

In practice, LLMs are weak evaluators.

To compensate:

  • Evaluation criteria were explicitly defined and structured
  • Outputs from one step were normalized before being used in the next

This reduced drift and improved consistency across the system.
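Normalization between steps might look like the following sketch: raw model outputs are coerced into a small canonical vocabulary and clamped ranges before the next chain step consumes them. The vocabulary and helpers here are assumptions for illustration.

```typescript
// Sketch of inter-step normalization: raw model outputs are coerced into a
// fixed vocabulary and range before the next chain step consumes them.
type Strength = "weak" | "moderate" | "strong";

// Map the many phrasings an LLM produces onto a fixed evaluation scale.
function normalizeStrength(raw: string): Strength {
  const s = raw.trim().toLowerCase();
  if (["strong", "high", "excellent", "very strong"].includes(s)) return "strong";
  if (["moderate", "medium", "some", "fair"].includes(s)) return "moderate";
  return "weak"; // default conservatively when the label is unrecognized
}

// Clamp numeric ranks so downstream steps never see out-of-range values.
function normalizeRank(rank: number, max: number): number {
  return Math.min(Math.max(Math.round(rank), 1), max);
}
```

Collapsing free-form phrasings into a closed set is one simple way to keep downstream judgment stable even when upstream wording drifts.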

Enable rapid iteration on LLM behavior

The system needed to evolve as we learned how models behaved.

I built an abstraction layer that allowed:

  • Prompt templates to be edited without code changes
  • Models to be swapped per interaction
  • Prompt chains to be iterated on quickly

This allowed non-technical collaborators to directly influence system behavior.
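A minimal version of such an abstraction layer could look like this: templates live as data (editable without a deploy), and each interaction names its own model. All identifiers here are hypothetical, not the actual FairHire AI implementation.

```typescript
// Hypothetical abstraction layer: prompt templates stored as data, with the
// model swappable per interaction. In production this might load from a
// database; here it is an in-memory map for illustration.
interface PromptConfig {
  template: string; // e.g. "List priorities from: {{conversation}}"
  model: string;    // swappable per interaction
}

const prompts = new Map<string, PromptConfig>([
  ["extract-priorities", {
    template: "List priorities from: {{conversation}}",
    model: "model-a",
  }],
]);

// Fill {{placeholders}} with values; unknown placeholders are left intact.
function render(name: string, vars: Record<string, string>): { text: string; model: string } {
  const cfg = prompts.get(name);
  if (!cfg) throw new Error(`unknown prompt: ${name}`);
  const text = cfg.template.replace(/\{\{(\w+)\}\}/g, (m, key) => vars[key] ?? m);
  return { text, model: cfg.model };
}
```

Because a template is just a row of data, a non-technical collaborator can edit the wording or swap the model for one step without touching code.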

Engineering Approach

The core challenge was managing non-deterministic components inside a coherent system.

  • Prompt chaining pipelines: Multi-step flows transforming conversation -> structured priorities -> evaluation signals

  • Controlled data flow between steps: Each stage produced structured outputs designed to be consumed reliably by the next

  • Deterministic scaffolding around LLMs: Guardrails to constrain variability and maintain consistency

  • End-to-end type-safe architecture: PostgreSQL + GraphQL + TypeScript to support rapid iteration without breaking system integrity

This was less about individual prompts and more about designing the system that connects them.
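One way to picture the deterministic scaffolding is a generic chain step that retries a non-deterministic model call until its output validates, so the next stage always receives well-formed input. This is a sketch under assumptions; `callModel` stands in for a real LLM client.

```typescript
// Sketch of deterministic scaffolding around one chain step: the model call
// is retried until its output validates, so the next stage always receives
// well-formed input. `callModel` is a stand-in for a real LLM client.
async function chainStep<T>(
  callModel: () => Promise<string>,
  validate: (raw: string) => T | null, // returns null when output is malformed
  maxAttempts = 3,
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await callModel();
    const parsed = validate(raw);
    if (parsed !== null) return parsed; // guardrail passed; hand off downstream
  }
  throw new Error(`step failed validation after ${maxAttempts} attempts`);
}
```

Composing steps like this is what turns individually unreliable model calls into a pipeline with predictable interfaces between stages.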

My Role

  • Technical co-founder
  • Designed and built the full system
  • Owned architecture, infrastructure, and data modeling
  • Developed prompt chaining strategies and evaluation flows
  • Built tooling for non-technical prompt iteration

Outcome

The product was not brought to market.

The core system, however, reached a functional state, with the primary technical challenges—structured extraction, prompt chaining, and evaluation flow—addressed.

What This Work Enabled

This project was where I developed real depth in building LLM-driven systems.

It involved pushing beyond simple prompt usage into:

  • Multi-step prompt chaining: Designing flows where each model output becomes structured input for the next step

  • Evaluation under ambiguity: Working around the fact that LLMs are unreliable evaluators by carefully structuring criteria and intermediate representations

  • Controlling model behavior at the system level: Shaping how models “reason” by constraining inputs, outputs, and transitions between steps

  • Iterating on unstable primitives: Rapidly refining prompts and system structure in response to inconsistent model behavior

In practice, this meant working at the edge of what LLMs can reliably do—building systems that remain usable despite non-determinism.

This experience now informs how I design and implement any system that depends on large language models.