Neural Networks from First Principles

This research project is about building reliable intuition for how neural networks actually work, beyond the abstractions of modern ML tooling.

It directly informs how I design and debug production AI systems, especially in safety- and reliability-sensitive domains.

Context

While building systems that rely on machine learning, I found that most tooling abstracts away the underlying mechanics.

To make better decisions in production systems—particularly in areas like medical imaging and LLM-driven workflows—I wanted a direct understanding of how these models behave at a fundamental level.

This led to a series of first-principles implementations, derived directly from underlying mathematical formulations rather than existing libraries or reference implementations.

From-Scratch Implementation

Implemented a fully functioning neural network from scratch in TypeScript, including manual backpropagation and gradient descent—without external libraries or guided implementations.

What Was Built

  • From-scratch neural network: Implemented forward and backward propagation using only core language features
  • Backpropagation by hand: Derived and implemented gradient calculations directly from calculus
  • Training system: Built a complete training loop with loss functions and optimization
  • First-principles derivation: Implemented directly from mathematical definitions, without relying on existing implementations
  • TypeScript environment: Developed entirely outside the typical Python/ML ecosystem

The focus was correctness and understanding, not abstraction or convenience.
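The forward pass, hand-derived backpropagation, and gradient-descent update described above can be sketched in a few dozen lines. This is a minimal illustrative example under assumed shapes (2 inputs, 2 hidden units, 1 output, sigmoid activations, squared-error loss), not the project's actual implementation:

```typescript
type Vec = number[];

const sigmoid = (x: number): number => 1 / (1 + Math.exp(-x));

// Fixed starting weights so the run is reproducible: 2 inputs -> 2 hidden -> 1 output.
const w1 = [[0.5, -0.3], [0.2, 0.8]]; // hidden-layer weights [neuron][input]
const b1 = [0.1, -0.1];
const w2 = [0.4, -0.6];               // output-layer weights
let b2 = 0.0;

function forward(x: Vec): { h: Vec; y: number } {
  const h = w1.map((row, i) => sigmoid(row[0] * x[0] + row[1] * x[1] + b1[i]));
  const y = sigmoid(w2[0] * h[0] + w2[1] * h[1] + b2);
  return { h, y };
}

// One gradient-descent step on a single example; returns the loss before the update.
function trainStep(x: Vec, target: number, lr: number): number {
  const { h, y } = forward(x);
  const loss = 0.5 * (y - target) ** 2;

  // Backward pass: each line is a hand-derived chain-rule step.
  const dz2 = (y - target) * y * (1 - y);   // dL/dz at the output (via sigmoid')
  const dh = [dz2 * w2[0], dz2 * w2[1]];    // gradient flowing back into the hidden layer
  for (let i = 0; i < 2; i++) {
    const dz1 = dh[i] * h[i] * (1 - h[i]);  // through the hidden sigmoid
    w1[i][0] -= lr * dz1 * x[0];
    w1[i][1] -= lr * dz1 * x[1];
    b1[i] -= lr * dz1;
  }
  w2[0] -= lr * dz2 * h[0];
  w2[1] -= lr * dz2 * h[1];
  b2 -= lr * dz2;
  return loss;
}
```

Repeated calls to `trainStep` on the same example drive the loss down, which is the whole training loop in miniature.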

Key Insights

Building these systems from first principles led to a clearer understanding of:

  • Gradient flow: How learning signals propagate through a network and where they degrade
  • Model behavior: Why models converge, fail to converge, or behave unpredictably
  • Architecture tradeoffs: How structural decisions impact stability and performance
  • Failure modes: Where higher-level abstractions can obscure critical issues
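The gradient-flow point above admits a compact illustration (a sketch, not project code): each sigmoid layer multiplies the upstream gradient by a factor of s(1 − s) ≤ 0.25, so in a deep chain the learning signal can shrink geometrically before it reaches the early layers.

```typescript
// Why gradients degrade through deep sigmoid chains: sigmoid'(z) = s(1 - s)
// peaks at 0.25 (when z = 0), so each layer can shrink the upstream gradient
// by at least 4x. Weight factors are omitted for clarity.
const sigmoid = (x: number): number => 1 / (1 + Math.exp(-x));

function gradientMagnitude(depth: number, z = 0): number {
  let g = 1.0; // gradient of the loss at the output
  for (let d = 0; d < depth; d++) {
    const s = sigmoid(z);
    g *= s * (1 - s); // chain-rule factor contributed by each layer
  }
  return g;
}
```

At z = 0 the per-layer factor is exactly 0.25, so a ten-layer chain scales the signal by roughly 10⁻⁶.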

This perspective carries over directly into applied work with large language models and domain-specific AI systems.

Application to Production Work

This work informs how I approach AI systems in practice:

  • Universal Radiology: Reasoning about model behavior and limitations when adapting models to low-resource clinical environments
  • FairHire AI: Designing and debugging complex LLM prompt chains, evaluation flows, and model interactions

Rather than treating models as black boxes, I’m able to reason about their behavior and make more reliable engineering decisions.

Ongoing Work

The investigation continues, with a focus on scaling these systems:

  • WebGPU compute: Writing custom GPU compute kernels to support parallelized training
  • Transformer architecture: Extending the implementation toward attention-based models
  • System constraints: Exploring the practical limits of training and inference outside traditional ML stacks

These constraints introduce a different set of engineering challenges, particularly around performance, memory layout, and numerical stability.
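One concrete instance of the numerical-stability challenges mentioned above (an illustrative sketch, not project code): a naive softmax overflows `Math.exp` for large logits, while the standard max-subtraction rewrite is mathematically identical but stays finite.

```typescript
// Naive softmax: exp() of a large logit overflows to Infinity, and
// Infinity / Infinity yields NaN.
function naiveSoftmax(logits: number[]): number[] {
  const exps = logits.map(Math.exp);
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);
}

// Stable softmax: subtracting the max logit is a mathematical no-op
// (it cancels in the ratio) but keeps every exponent <= 0.
function stableSoftmax(logits: number[]): number[] {
  const m = Math.max(...logits);
  const exps = logits.map(z => Math.exp(z - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);
}
```

This kind of rewrite matters more, not less, when moving to GPU compute, where reduced precision makes overflow and cancellation easier to hit.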

Code

The full implementation is available here: neural-networks-from-scratch

GPU experiments (DAG-based WebGPU implementation work) live here: neural-networks-from-scratch-gpu