Neural Networks from First Principles
This research project builds reliable intuition for how neural networks actually work, beneath the abstractions of modern ML tooling.
It directly informs how I design and debug production AI systems, especially in safety- and reliability-sensitive domains.
Context
While building systems that rely on machine learning, I found that most tooling abstracts away the underlying mechanics.
To make better decisions in production systems, particularly in areas like medical imaging and LLM-driven workflows, I wanted a direct understanding of how these models behave at a fundamental level.
This led to a series of first-principles implementations, derived directly from underlying mathematical formulations rather than existing libraries or reference implementations.
From-Scratch Implementation
Implemented a fully functional neural network from scratch in TypeScript, including manual backpropagation and gradient descent, with no external libraries or guided implementations.
What Was Built
- From-scratch neural network: Implemented forward and backward propagation using only core language features
- Backpropagation by hand: Derived and implemented gradient calculations directly from calculus
- Training system: Built a complete training loop with loss functions and optimization
- First-principles derivation: Implemented directly from mathematical definitions, without relying on existing implementations
- TypeScript environment: Developed entirely outside the typical Python/ML ecosystem
The focus was correctness and understanding, not abstraction or convenience.
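To make the shape of this concrete, here is a minimal sketch in the same spirit (illustrative only, not the project's actual code): a tiny 2-2-1 network with sigmoid activations trained on XOR, with the forward pass, hand-derived chain-rule gradients, and the gradient-descent update all written out using only core language features.

```typescript
// Minimal from-scratch sketch: 2-2-1 network, sigmoid activations,
// MSE loss, manual backpropagation. Hypothetical example code.
const sigmoid = (x: number): number => 1 / (1 + Math.exp(-x));

// Deterministic pseudo-random init so the example is reproducible.
let seed = 42;
const rand = (): number => {
  seed = (seed * 1103515245 + 12345) % 2147483648;
  return seed / 2147483648 - 0.5;
};

// Parameters: W1 (2x2 hidden weights), b1, W2 (1x2 output weights), b2.
const W1 = [[rand(), rand()], [rand(), rand()]];
const b1 = [rand(), rand()];
const W2 = [rand(), rand()];
let b2 = rand();

function forward(x: number[]): { h: number[]; y: number } {
  const h = W1.map((row, i) => sigmoid(row[0] * x[0] + row[1] * x[1] + b1[i]));
  const y = sigmoid(W2[0] * h[0] + W2[1] * h[1] + b2);
  return { h, y };
}

// One gradient-descent step on a single example. Gradients are derived
// by hand from the chain rule, using sigmoid'(z) = s * (1 - s).
function trainStep(x: number[], target: number, lr: number): number {
  const { h, y } = forward(x);
  const dy = (y - target) * y * (1 - y); // dL/dz at the output, L = 0.5(y - t)^2
  for (let j = 0; j < 2; j++) {
    const dh = dy * W2[j] * h[j] * (1 - h[j]); // backprop through hidden unit j
    W2[j] -= lr * dy * h[j];
    W1[j][0] -= lr * dh * x[0];
    W1[j][1] -= lr * dh * x[1];
    b1[j] -= lr * dh;
  }
  b2 -= lr * dy;
  return 0.5 * (y - target) ** 2;
}

const data: [number[], number][] = [
  [[0, 0], 0], [[0, 1], 1], [[1, 0], 1], [[1, 1], 0],
];

// Loss before training, for comparison.
let initialLoss = 0;
for (const [x, t] of data) {
  initialLoss += 0.5 * (forward(x).y - t) ** 2;
}

// Training loop: plain per-example gradient descent.
let loss = 0;
for (let epoch = 0; epoch < 5000; epoch++) {
  loss = 0;
  for (const [x, t] of data) loss += trainStep(x, t, 0.5);
}
```

The key property of this style is that every gradient term is visible and checkable by hand, which is exactly what higher-level frameworks hide.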
Key Insights
Building these systems from first principles led to a clearer understanding of:
- Gradient flow: How learning signals propagate through a network and where they degrade
- Model behavior: Why models converge, fail to converge, or behave unpredictably
- Architecture tradeoffs: How structural decisions impact stability and performance
- Failure modes: Where higher-level abstractions can obscure critical issues
This perspective carries over directly into applied work with large language models and domain-specific AI systems.
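The gradient-flow point can be shown in a few lines. The sketch below (illustrative, not from the project) tracks the magnitude of a backward signal through a stack of sigmoid layers: since sigmoid'(z) ≤ 0.25 everywhere, the signal shrinks at least geometrically with depth, even in the best case of unit weights and z = 0.

```typescript
// Illustrative sketch of vanishing gradients through stacked sigmoids.
const sigmoidPrime = (z: number): number => {
  const s = 1 / (1 + Math.exp(-z));
  return s * (1 - s); // maximized at z = 0, where it equals 0.25
};

// Backward signal through 10 layers, assuming unit weights and
// pre-activations z = 0 (the most favorable case for the gradient).
let grad = 1.0;
const magnitudes: number[] = [];
for (let layer = 0; layer < 10; layer++) {
  grad *= sigmoidPrime(0); // scales by exactly 0.25 per layer
  magnitudes.push(grad);
}
// After 10 layers: 0.25^10 ≈ 9.5e-7, effectively zero learning signal.
```

Seeing this degradation directly, rather than as a symptom inside a framework, is what makes diagnoses like "the early layers aren't learning" concrete.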
Application to Production Work
This work informs how I approach AI systems in practice:
- Universal Radiology: Reasoning about model behavior and limitations when adapting models to low-resource clinical environments
- FairHire AI: Designing and debugging complex LLM prompt chains, evaluation flows, and model interactions
Rather than treating models as black boxes, I’m able to reason about their behavior and make more reliable engineering decisions.
Ongoing Work
This investigation is continuing with a focus on scaling these systems:
- WebGPU compute: Writing custom GPU compute kernels to support parallelized training
- Transformer architecture: Extending the implementation toward attention-based models
- System constraints: Exploring the practical limits of training and inference outside traditional ML stacks
These constraints introduce a different set of engineering challenges, particularly around performance, memory layout, and numerical stability.
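One representative stability issue, sketched below under simplified assumptions (this is standard numerics, not project code): a naive softmax overflows for large logits, while the usual max-subtraction rewrite computes the same function safely.

```typescript
// Naive softmax: exp() overflows to Infinity for large logits,
// and Infinity / Infinity yields NaN.
function naiveSoftmax(logits: number[]): number[] {
  const exps = logits.map(Math.exp);
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Stable softmax: subtracting the max shifts the largest exponent
// to 0 without changing the result, since the factor exp(-m) cancels.
function stableSoftmax(logits: number[]): number[] {
  const m = Math.max(...logits);
  const exps = logits.map((x) => Math.exp(x - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// naiveSoftmax([1000, 1000])  → [NaN, NaN]
// stableSoftmax([1000, 1000]) → [0.5, 0.5]
```

On the GPU side the same concern shows up in lower-precision arithmetic, where the window between underflow and overflow is far narrower than in float64.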
Code
The full implementation is available here: neural-networks-from-scratch
GPU experiments (DAG-based WebGPU implementation work) live here: neural-networks-from-scratch-gpu