A High-Performance WebGPU Deep Learning Inference Engine
Zero Dependencies · Hand-Written WGSL · GPU-Accelerated · Type-Safe

Quick Start · Features · Performance · Documentation · Contributing
The smallest, most transparent deep learning inference engine for the web.
| | Tiny-DL-Inference | TensorFlow.js | ONNX Runtime Web |
|---|---|---|---|
| Bundle Size | 58KB | ~2MB | ~1.5MB |
| Dependencies | Zero | Heavy | Moderate |
| Code Transparency | 100% WGSL source | Black box | Black box |
| GPU Control | Direct shader access | Abstracted | Abstracted |
| Kernel Fusion | ✅ Manual fusion | Limited | Limited |
Built for developers who want full control, minimal overhead, and maximum understanding of GPU-based neural network inference.
- Zero Dependencies – No TensorFlow.js or ONNX Runtime. Pure WebGPU with a minimal footprint
- Kernel Fusion – Fused Conv2d+Bias+ReLU achieves a 3× memory bandwidth reduction
- Zero-Copy Operations – Tensor views with no GPU overhead (< 1μs reshape)
- Hand-Written WGSL – Every operator implemented from scratch in readable WGSL code
- Type Safe – Full TypeScript with strict mode, zero `any` types
- Comprehensive Testing – Property-based testing with fast-check (100+ iterations each)
- Production Ready – Custom error classes, proper GPU resource lifecycle
- Educational – Perfect for studying GPU computing and WebGPU programming
- Browser: Chrome 113+ / Edge 113+ / Safari 18+ (with WebGPU enabled)
- Hardware: GPU with WebGPU support (discrete GPU recommended for best performance)
- Node.js: 18.0+ (for development)
```bash
npm install tiny-dl-inference
```

```typescript
import { GPUContext, Tensor, ReLUOperator } from 'tiny-dl-inference';

// 1. Initialize GPU context
const context = new GPUContext();
await context.init();

// 2. Create input tensor
const input = Tensor.fromArray(context,
  new Float32Array([1.0, -2.0, 3.0, -4.0]),
  [1, 4, 1, 1] // [batch, channels, height, width]
);

// 3. Run ReLU activation
const relu = new ReLUOperator(context);
const output = await relu.forward([input]);

// 4. Get results
const result = await output.download();
console.log(result); // Float32Array([1, 0, 3, 0])

// 5. Cleanup resources
input.destroy();
output.destroy();
context.destroy();
```

```typescript
import { InferenceEngine, ModelLoader } from 'tiny-dl-inference';

// Initialize engine
const context = new GPUContext();
await context.init();
const engine = new InferenceEngine(context);

// Load model from JSON
await engine.loadModel('https://example.com/mnist-model.json');

// Prepare input (MNIST: 1x1x28x28)
const input = Tensor.fromArray(context, imageData, [1, 1, 28, 28]);

// Run inference
const output = await engine.infer(input);
const predictions = await output.download();

// Get predicted class
const predictedClass = predictions.indexOf(Math.max(...predictions));
console.log('Predicted digit:', predictedClass);

// Cleanup
input.destroy();
output.destroy();
engine.dispose();
context.destroy();
```

→ Read the Full Documentation for detailed guides and examples.
Without Fusion (6 memory operations):

Read → Conv → Write → Read → Bias → Write → Read → ReLU → Write

With Fusion (2 memory operations):

Read → Conv+Bias+ReLU → Write
| Benchmark | Separate Operators | Fused Operator | Improvement |
|---|---|---|---|
| Conv2d 64-channel | 2.34ms | 0.89ms | 2.6× faster |
| Memory Operations | 6 ops | 2 ops | 3× reduction |
| Kernel Launches | 3 | 1 | 66% fewer |
| Intermediate Tensors | 3 allocated | 0 | 100% saved |
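The memory-traffic argument above can be illustrated with a small CPU sketch (not the library's WGSL kernels; the function names here are illustrative). Each unfused pass reads and writes the full tensor once, while the fused pass applies bias and ReLU in-register, so only one read and one write of the feature map ever touch memory:

```typescript
// Unfused: three full passes over the data, two intermediate tensors.
// (A real Conv2d is used in the library; an identity pass stands in here
// for brevity, since only the memory-traffic pattern is being shown.)
function separatePasses(x: Float32Array, bias: number): Float32Array {
  const conv = Float32Array.from(x);               // pass 1: read + write
  const biased = conv.map(v => v + bias);          // pass 2: read + write
  return biased.map(v => Math.max(0, v));          // pass 3: read + write
}

// Fused: bias and ReLU applied per element in one pass -> one read, one write.
function fusedPass(x: Float32Array, bias: number): Float32Array {
  return x.map(v => Math.max(0, v + bias));
}
```

Both produce identical results; the fused version simply skips the intermediate round-trips, which is where the 3× bandwidth reduction in the table comes from.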
```typescript
// Zero GPU overhead - creates a view, not a copy
const flat = tensor.reshape([1, 2352]); // < 1 microsecond
```

| Model | Latency | Device |
|---|---|---|
| MNIST CNN | < 100ms | Chrome 120, RTX 3060 |
| CIFAR-10 | < 150ms | Chrome 120, RTX 3060 |
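Why can a reshape cost under a microsecond? Because a view only swaps shape metadata; the backing buffer (a GPU buffer in practice) is never touched. A minimal sketch of the idea, with an illustrative `TensorView` class that is not the library's actual `Tensor` API:

```typescript
// Illustrative only: a view shares the same backing store and changes
// only the shape metadata. No element is copied or moved.
class TensorView {
  constructor(
    public data: Float32Array,  // shared backing store (GPU buffer in practice)
    public shape: number[],
  ) {}

  reshape(shape: number[]): TensorView {
    const oldSize = this.shape.reduce((a, b) => a * b, 1);
    const newSize = shape.reduce((a, b) => a * b, 1);
    if (oldSize !== newSize) {
      throw new Error('reshape must preserve element count');
    }
    return new TensorView(this.data, shape); // no copy: same Float32Array
  }
}
```

For example, reshaping a `[1, 3, 28, 28]` view to `[1, 2352]` returns a new view over the identical buffer, which is why the operation is O(1) regardless of tensor size.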
| Operator | Description | Fusion Available |
|---|---|---|
| `Conv2d` | 2D Convolution with stride/padding | ✅ Fused with Bias+ReLU |
| `Conv2dBiasReLU` | Conv + Bias + ReLU in single kernel | ✅ 3× memory reduction |
| Operator | Description |
|---|---|
| `MaxPool` | 2D Max Pooling with configurable kernel size |
| Operator | Description | Formula |
|---|---|---|
| `ReLU` | Rectified Linear Unit | f(x) = max(0, x) |
| `Softmax` | Normalized exponential (numerically stable) | f(x_i) = e^(x_i) / Σ e^(x_j) |
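"Numerically stable" here refers to the standard max-subtraction trick: since softmax is invariant to adding a constant to all logits, subtracting the maximum before exponentiating prevents `e^x` from overflowing for large inputs. A CPU sketch of the formula in the table (not the WGSL kernel itself):

```typescript
// Numerically stable softmax: f(x_i) = e^(x_i - max) / Σ e^(x_j - max).
// Subtracting the max leaves the result unchanged but keeps every
// exponent <= 0, so Math.exp never overflows to Infinity.
function softmax(logits: Float32Array): Float32Array {
  const max = logits.reduce((a, b) => Math.max(a, b), -Infinity);
  const exps = logits.map(v => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(v => v / sum);
}
```

Without the subtraction, `softmax([1000, 1000])` would compute `Infinity / Infinity = NaN`; with it, the result is the correct `[0.5, 0.5]`.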
| Operator | Description |
|---|---|
| `Dense` | Fully connected layer with optional bias |
| `Flatten` | Zero-copy tensor reshaping |
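For reference, a Dense layer computes y = Wx + b. The sketch below is a plain CPU implementation in the spirit of the project's CPU reference implementations used for validation; the function signature is illustrative, not the library's `DenseOperator` API:

```typescript
// CPU reference for a fully connected layer: y = Wx + b.
// Weights are row-major with shape [out, in].
function dense(
  x: Float32Array,  // input vector, length = in
  w: Float32Array,  // weights, length = out * in, row-major
  b: Float32Array,  // bias, length = out
): Float32Array {
  const outDim = b.length;
  const inDim = x.length;
  const y = new Float32Array(outDim);
  for (let o = 0; o < outDim; o++) {
    let acc = b[o]; // start from the bias term
    for (let i = 0; i < inDim; i++) {
      acc += w[o * inDim + i] * x[i];
    }
    y[o] = acc;
  }
  return y;
}
```

With the identity weight matrix `[1, 0, 0, 1]` and bias `[10, 20]`, input `[1, 2]` yields `[11, 22]`, which is an easy sanity check when validating a GPU kernel against the reference.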
```typescript
import { GPUContext, Tensor, InferenceEngine } from 'tiny-dl-inference';

async function classifyMNIST(imageData: Float32Array): Promise<number> {
  const context = new GPUContext();
  try {
    await context.init();

    const engine = new InferenceEngine(context);
    await engine.loadModel('mnist-model.json');

    // Input: 1x1x28x28 (grayscale MNIST)
    const input = Tensor.fromArray(context, imageData, [1, 1, 28, 28]);

    // Run inference
    const output = await engine.infer(input);
    const predictions = await output.download();

    // Get result
    const predictedDigit = predictions.indexOf(Math.max(...predictions));

    // Cleanup
    input.destroy();
    output.destroy();
    engine.dispose();

    return predictedDigit;
  } finally {
    // Ensure GPU resources are released even if an error occurs
    context.destroy();
  }
}

// Usage
const imageData = new Float32Array(784); // 28x28 pixel data
classifyMNIST(imageData)
  .then(digit => console.log('Recognized digit:', digit))
  .catch(err => console.error('Inference failed:', err));
```

→ See more Examples including custom models, web integration, and performance benchmarking.
| Browser | Minimum Version | Status |
|---|---|---|
| Chrome | 113+ | ✅ Fully Supported |
| Edge | 113+ | ✅ Fully Supported |
| Safari | 18+ (macOS Sonoma+) | |
| Firefox | Behind flag | 🚧 Enable `dom.webgpu.enabled` |
```javascript
if (navigator.gpu) {
  console.log('✅ WebGPU is supported!');
} else {
  console.error('❌ WebGPU not supported in this browser');
}
```

```
┌──────────────────────────────────────────────────────────────┐
│                      Application Layer                       │
│                (InferenceEngine, ModelLoader)                │
└───────────────────────┬──────────────────────────────────────┘
                        │
┌───────────────────────▼──────────────────────────────────────┐
│                       Operator Layer                         │
│        (Conv2d, ReLU, MaxPool, Dense, Softmax, etc.)         │
└───────────────────────┬──────────────────────────────────────┘
                        │
┌───────────────────────▼──────────────────────────────────────┐
│                         Core Layer                           │
│           (GPUContext, Tensor, Memory Management)            │
└───────────────────────┬──────────────────────────────────────┘
                        │
┌───────────────────────▼──────────────────────────────────────┐
│                       WebGPU Runtime                         │
│                 (WGSL Shaders, GPU Compute)                  │
└──────────────────────────────────────────────────────────────┘
```
```
tiny-dl-inference/
├── openspec/              # OpenSpec spec-driven development (Single Source of Truth)
│   └── specs/             # Specification documents
│       ├── product/       # Product requirements spec (PRD)
│       ├── architecture/  # Architecture design spec
│       ├── api/           # API spec
│       └── testing/       # BDD test spec
├── docs/                  # User documentation (bilingual)
│   ├── en/                # English (26 files)
│   └── zh/                # Chinese (27 files)
├── src/                   # Source code
│   ├── core/              # GPUContext, Tensor, error classes
│   ├── operators/         # Neural network operators
│   ├── engine/            # InferenceEngine, ModelLoader
│   └── utils/             # Benchmark, CPU reference implementations
├── tests/                 # Test suite (Vitest)
└── examples/              # Demo code (MNIST, benchmark)
```
```bash
# Clone repository
git clone https://github.com/LessUp/tiny-dl-inference.git
cd tiny-dl-inference

# Install dependencies
npm install

# Run type checking
npm run typecheck

# Run tests (134 passing)
npm test

# Build project
npm run build
```

```bash
# Run all tests
npm test

# Run with coverage report
npm run test:coverage

# Run specific test file
npx vitest run tests/operators/Conv2dOperator.test.ts

# Property-based tests (100+ iterations each)
npx vitest run -t "property"
```

Test Coverage:

- ✅ 134 tests passing
- ✅ 13 property-based tests with fast-check
- ✅ CPU reference implementations for correctness validation
- ✅ Target: >90% code coverage (V8)
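The property-based tests assert invariants over many random inputs rather than single examples. A dependency-free sketch of the idea for ReLU (the actual suite uses fast-check generators; plain `Math.random` stands in here, and `reluRef` is an illustrative name for a CPU reference):

```typescript
// CPU reference under test: elementwise max(0, x).
function reluRef(x: Float32Array): Float32Array {
  return x.map(v => Math.max(0, v));
}

// Property-style check over 100 random inputs, mirroring the suite's
// "100+ iterations each" approach without the fast-check dependency.
for (let iter = 0; iter < 100; iter++) {
  const x = Float32Array.from({ length: 16 }, () => Math.random() * 20 - 10);
  const y = reluRef(x);
  for (let i = 0; i < x.length; i++) {
    // Property 1: output is never negative
    if (y[i] < 0) throw new Error('ReLU produced a negative value');
    // Property 2: output matches the definition pointwise
    if (y[i] !== Math.max(0, x[i])) throw new Error('ReLU mismatch');
  }
  // Property 3: idempotence, relu(relu(x)) === relu(x)
  const yy = reluRef(y);
  for (let i = 0; i < y.length; i++) {
    if (yy[i] !== y[i]) throw new Error('ReLU is not idempotent');
  }
}
```

The same pattern (generate random tensors, compare the GPU kernel against a CPU reference) is how the suite validates operator correctness.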
- Quick Start Guide – Get up and running in 5 minutes
- Installation – Detailed setup instructions
- Architecture – System design overview
- GPU Context – WebGPU resource management
- Tensors – Multi-dimensional data structures
- Operators – Neural network layers
- Memory Layout – NCHW vs NHWC
- Optimization Guide – Performance tuning
- Kernel Fusion – Custom fused operators
- Custom Operators – Build your own WGSL operators
- Benchmarking – Performance measurement
- GPUContext – Device management
- Tensor – Data structures
- Operators – All operators
- InferenceEngine – High-level API
- MNIST Classification – Handwritten digit recognition
- Custom Model – Build models from scratch
- Web Integration – Browser-based app
- Performance Tuning – Benchmarking guide
- Interactive Playground – Experiment with operators
- Quick Start – Get started in 5 minutes
- Architecture – System architecture notes
- Operators – Neural network operators
- Optimization Guide – Performance tuning
- API Reference – Complete API documentation
→ Browse Full Documentation: English | Chinese
We welcome contributions! This project follows Spec-Driven Development (SDD) – all changes must be defined in `/specs/` first.
- Fork the repository
- Create a feature branch: `git checkout -b feature/your-feature`
- Review specs in `/specs/` before coding
- Implement your changes
- Test thoroughly (134+ tests)
- Submit a Pull Request
- Contributing Guide – Full development workflow
- AGENTS.md – AI agent development guidelines
- Specs Directory – Single Source of Truth
- TypeScript strict mode (`strict: true`)
- 2-space indentation, single quotes
- Property-based testing with `fast-check`
- Follow existing patterns in `/src/operators/`
This project uses Spec-Driven Development – specifications are the Single Source of Truth:
| Spec | Location | Purpose |
|---|---|---|
| Requirements | `openspec/specs/product/spec.md` | What to build |
| Architecture | `openspec/specs/architecture/spec.md` | How to build it |
| API Contracts | `openspec/specs/api/spec.md` | Interface definitions |
| Test Criteria | `openspec/specs/testing/spec.md` | Acceptance criteria |
See CHANGELOG.md for all releases.
Security:
- Fixed 5 moderate npm vulnerabilities
- Updated vitest to v4.1.4
Performance:
- Kernel fusion: 3× memory reduction
- Zero-copy reshape: < 1μs overhead
- GPU memory leak fixes
→ Full Changelog
MIT License – Free for personal and commercial use.
- 📖 Documentation: https://lessup.github.io/tiny-dl-inference/
- 💻 GitHub Repository: https://github.com/LessUp/tiny-dl-inference
- 🐛 Issue Tracker: https://github.com/LessUp/tiny-dl-inference/issues
- 📦 npm Package: https://www.npmjs.com/package/tiny-dl-inference

Built with ❤️ for the AI community