FairHire AI
Conversational hiring system replacing resumes with structured, LLM-driven evaluation
My co-founder and I built FairHire AI to explore a different hiring primitive: conversation as input, structured evaluation as the system boundary.
The goal was to replace brittle resume filtering with role-specific signals that could be inspected, edited, and iterated on.
Prompt Chaining Built for Reliable Evaluation
Built multi-step prompt chains that turn conversation into structured criteria and consistent evaluation signals, using normalization and guardrails to keep judgment stable.
Overview
FairHire AI was an attempt to rethink how hiring decisions are made.
We saw a clear breakdown in existing systems:
- Resumes were becoming unreliable as signals
- LLM-generated resumes made keyword filtering ineffective
- Existing tools reduced hiring to brittle matching logic
The goal was to replace resumes entirely with a system where:
- Hiring managers define roles through conversation
- Candidates demonstrate fit through guided dialogue
- Evaluation is based on structured priorities rather than static inputs
I was the technical co-founder and built the system end-to-end.
What Was Built
A two-sided system built around structured conversational data:
Role definition via chat: Hiring managers describe the role in natural language
Priority extraction and editing: LLM pipelines convert conversation into ranked, structured criteria that can be refined in a visual interface
Candidate interaction flow: Applicants engage in guided conversations aligned to those priorities
Contextual evaluation: Candidate responses are interpreted relative to role-specific signals rather than generic filters
The system replaces resumes with a structured dataset generated from both sides of the hiring process.
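To make the "ranked, structured criteria" idea concrete, here is a minimal sketch of what such a dataset and its editing support might look like. The type and field names (`RoleCriterion`, `rank`, `rationale`) are illustrative, not the actual schema:

```typescript
// Illustrative shape for criteria distilled from a role-definition chat.
// Field names are hypothetical, not FairHire's real schema.
interface RoleCriterion {
  id: string;
  label: string;     // e.g. "distributed systems experience"
  rank: number;      // 1 = highest priority
  rationale: string; // why the hiring manager cares; kept for inspection
}

// After edits in a visual interface, ranks can become sparse or duplicated.
// Re-normalize them into a dense 1..n ordering before evaluation uses them.
function normalizeRanks(criteria: RoleCriterion[]): RoleCriterion[] {
  return [...criteria]
    .sort((a, b) => a.rank - b.rank)
    .map((c, i) => ({ ...c, rank: i + 1 }));
}
```

Keeping the criteria as plain, ordered data is what makes them inspectable and editable rather than buried inside a prompt.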
Key Decisions
Conversation as input, structure as the system boundary
LLMs were not used to directly make hiring decisions.
Instead:
- Conversations were transformed into structured data
- That structure became the basis for evaluation
This made the system inspectable, editable, and adaptable.
Separate generation from evaluation
In practice, LLMs are weak evaluators.
To compensate:
- Evaluation criteria were explicitly defined and structured
- Outputs from one step were normalized before being used in the next
This reduced drift and improved consistency across the system.
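The normalization step between stages can be sketched as a parse-and-coerce boundary: raw model text is validated into a known shape, and anything malformed is dropped rather than passed downstream. The types and function names here are hypothetical:

```typescript
// Hypothetical intermediate shape passed between chain steps.
interface PrioritySignal {
  skill: string;
  weight: number; // clamped to [0, 1]
}

// Coerce one step's raw LLM output into the shape the next step expects.
// Malformed JSON or unexpected items never propagate downstream.
function normalizeStepOutput(raw: string): PrioritySignal[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return [];
  }
  if (!Array.isArray(parsed)) return [];
  return parsed.flatMap((item) => {
    if (typeof item !== "object" || item === null) return [];
    const { skill, weight } = item as Record<string, unknown>;
    if (typeof skill !== "string" || skill.trim() === "") return [];
    const w =
      typeof weight === "number" ? Math.min(Math.max(weight, 0), 1) : 0.5;
    return [{ skill: skill.trim().toLowerCase(), weight: w }];
  });
}
```

The point is that each stage only ever consumes data that has already passed through a deterministic filter, which is what keeps variance from compounding across the chain.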
Enable rapid iteration on LLM behavior
The system needed to evolve as we learned how models behaved.
I built an abstraction layer that allowed:
- Prompt templates to be edited without code changes
- Models to be swapped per interaction
- Prompt chains to be iterated on quickly
This allowed non-technical collaborators to directly influence system behavior.
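One way such an abstraction layer can work is to store each prompt as data alongside its model choice, with a small interpolation step at call time. This is a sketch under assumptions: the registry, template syntax, and model names below are invented for illustration:

```typescript
// Prompts live as data (in practice, e.g. database rows editable in a UI),
// so non-engineers can change them without a deploy.
interface PromptConfig {
  template: string; // {{placeholder}} syntax, assumed here
  model: string;    // swappable per interaction
}

const registry: Record<string, PromptConfig> = {
  extractPriorities: {
    template: "Extract ranked hiring priorities from: {{conversation}}",
    model: "model-a", // placeholder model identifier
  },
};

// Resolve a named prompt: interpolate variables, return the model to use.
function renderPrompt(
  name: string,
  vars: Record<string, string>
): { prompt: string; model: string } {
  const config = registry[name];
  if (!config) throw new Error(`Unknown prompt: ${name}`);
  const prompt = config.template.replace(
    /\{\{(\w+)\}\}/g,
    (_, key) => vars[key] ?? ""
  );
  return { prompt, model: config.model };
}
```

Because the template and model are data rather than code, editing a prompt or swapping a model becomes a row update instead of a release.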
Engineering Approach
The core challenge was managing non-deterministic components inside a coherent system.
Prompt chaining pipelines: Multi-step flows transforming conversation -> structured priorities -> evaluation signals
Controlled data flow between steps: Each stage produced structured outputs designed to be consumed reliably by the next
Deterministic scaffolding around LLMs: Guardrails to constrain variability and maintain consistency
End-to-end type-safe architecture: PostgreSQL + GraphQL + TypeScript to support rapid iteration without breaking system integrity
This was less about individual prompts and more about designing the system that connects them.
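The connecting system can be sketched as typed function composition, where each stage's output type is the next stage's input type. The two stages below are deterministic stubs standing in for real model calls; everything about them is illustrative:

```typescript
// A stage in the chain: typed input in, typed output out.
type Step<I, O> = (input: I) => O;

// Compose two stages; the compiler enforces that the boundary types line up.
function chain<A, B, C>(first: Step<A, B>, second: Step<B, C>): Step<A, C> {
  return (input) => second(first(input));
}

// Deterministic stand-in for "conversation -> structured priorities".
const extractPriorities: Step<string, string[]> = (conversation) =>
  conversation.split(";").map((s) => s.trim()).filter(Boolean);

// Deterministic stand-in for "structured priorities -> evaluation signals":
// earlier priorities receive higher scores.
const scoreAgainstPriorities: Step<string[], Record<string, number>> = (
  priorities
) =>
  Object.fromEntries(priorities.map((p, i) => [p, priorities.length - i]));

const pipeline = chain(extractPriorities, scoreAgainstPriorities);
```

With real model calls slotted into each `Step`, the type boundaries are where the normalization and guardrails described above live.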
My Role
- Technical co-founder
- Designed and built the full system
- Owned architecture, infrastructure, and data modeling
- Developed prompt chaining strategies and evaluation flows
- Built tooling for non-technical prompt iteration
Outcome
The product was not brought to market.
The core system, however, reached a functional state, with the primary technical challenges (structured extraction, prompt chaining, and evaluation flow) addressed.
What This Work Enabled
This project was where I developed real depth in building LLM-driven systems.
It involved pushing beyond simple prompt usage into:
Multi-step prompt chaining: Designing flows where each model output becomes structured input for the next step
Evaluation under ambiguity: Working around the fact that LLMs are unreliable evaluators by carefully structuring criteria and intermediate representations
Controlling model behavior at the system level: Shaping how models “reason” by constraining inputs, outputs, and transitions between steps
Iterating on unstable primitives: Rapidly refining prompts and system structure in response to inconsistent model behavior
In practice, this meant working at the edge of what LLMs can reliably do—building systems that remain usable despite non-determinism.
This experience now informs how I design and implement any system that depends on large language models.