
Tiny-DL-Inference


A High-Performance WebGPU Deep Learning Inference Engine

Zero Dependencies · Hand-Written WGSL · GPU-Accelerated · Type-Safe

Quick Start · Features · Performance · Documentation · Contributing

English | 简体中文


Why Tiny-DL-Inference?

The smallest, most transparent deep learning inference engine for the web.

| | Tiny-DL-Inference | TensorFlow.js | ONNX Runtime Web |
|---|---|---|---|
| Bundle Size | 58KB | ~2MB | ~1.5MB |
| Dependencies | Zero | Heavy | Moderate |
| Code Transparency | 100% WGSL source | Black box | Black box |
| GPU Control | Direct shader access | Abstracted | Abstracted |
| Kernel Fusion | ✅ Manual fusion | Limited | Limited |

Built for developers who want full control, minimal overhead, and maximum understanding of GPU-based neural network inference.


Features

🚀 Performance

  • Zero Dependencies — No TensorFlow.js or ONNX Runtime. Pure WebGPU with minimal footprint
  • Kernel Fusion — Fused Conv2d+Bias+ReLU achieves 3× memory bandwidth reduction
  • Zero-Copy Operations — Tensor views with no GPU overhead (< 1 μs reshape)
  • Hand-Written WGSL — Every operator implemented from scratch in readable WGSL code

🛠 Developer Experience

  • Type Safe — Full TypeScript with strict mode, zero `any` types
  • Comprehensive Testing — Property-based testing with fast-check (100+ iterations each)
  • Production Ready — Custom error classes, proper GPU resource lifecycle
  • Educational — Perfect for studying GPU computing and WebGPU programming

Quick Start

Requirements

  • Browser: Chrome 113+ / Edge 113+ / Safari 18+ (with WebGPU enabled)
  • Hardware: GPU with WebGPU support (discrete GPU recommended for best performance)
  • Node.js: 18.0+ (for development)

Installation

npm install tiny-dl-inference

🚀 Try it Online

Open in StackBlitz

First Inference

import { GPUContext, Tensor, ReLUOperator } from 'tiny-dl-inference';

// 1. Initialize GPU context
const context = new GPUContext();
await context.init();

// 2. Create input tensor
const input = Tensor.fromArray(context, 
  new Float32Array([1.0, -2.0, 3.0, -4.0]),
  [1, 4, 1, 1]  // [batch, channels, height, width]
);

// 3. Run ReLU activation
const relu = new ReLUOperator(context);
const output = await relu.forward([input]);

// 4. Get results
const result = await output.download();
console.log(result); // Float32Array([1, 0, 3, 0])

// 5. Cleanup resources
input.destroy();
output.destroy();
context.destroy();

Using InferenceEngine (High-Level API)

import { GPUContext, Tensor, InferenceEngine } from 'tiny-dl-inference';

// Initialize engine
const context = new GPUContext();
await context.init();

const engine = new InferenceEngine(context);

// Load model from JSON
await engine.loadModel('https://example.com/mnist-model.json');

// Prepare input (MNIST: 1x1x28x28)
const input = Tensor.fromArray(context, imageData, [1, 1, 28, 28]);

// Run inference
const output = await engine.infer(input);
const predictions = await output.download();

// Get predicted class
const predictedClass = predictions.indexOf(Math.max(...predictions));
console.log('Predicted digit:', predictedClass);

// Cleanup
input.destroy();
output.destroy();
engine.dispose();
context.destroy();

→ Read the Full Documentation for detailed guides and examples.


Performance

Kernel Fusion: 3× Memory Bandwidth Reduction

Without Fusion (6 memory operations):
  Read → Conv → Write → Read → Bias → Write → Read → ReLU → Write

With Fusion (2 memory operations):
  Read → Conv+Bias+ReLU → Write

| Benchmark | Separate Operators | Fused Operator | Improvement |
|---|---|---|---|
| Conv2d 64-channel | 2.34ms | 0.89ms | 2.6× faster |
| Memory Operations | 6 ops | 2 ops | 3× reduction |
| Kernel Launches | 3 | 1 | 66% fewer |
| Intermediate Tensors | 3 allocated | 0 | 100% saved |
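The memory-operation counts above can be sketched on the CPU with a toy per-element "conv" (the function names are illustrative, not the library API): the unfused pipeline materializes an intermediate array after every stage, while the fused loop reads the input once and writes the output once.

```typescript
// Unfused: three passes over memory, two intermediate arrays (6 touches).
function unfused(x: Float32Array, w: number, b: number): Float32Array {
  const conv = x.map(v => v * w);         // read x, write conv
  const biased = conv.map(v => v + b);    // read conv, write biased
  return biased.map(v => Math.max(0, v)); // read biased, write output
}

// Fused: one pass, no intermediates (2 touches), identical results.
function fused(x: Float32Array, w: number, b: number): Float32Array {
  return x.map(v => Math.max(0, v * w + b));
}
```

The same arithmetic happens either way; only the number of round-trips through memory changes, which is why fusion pays off on bandwidth-bound GPU kernels.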

Zero-Copy Reshape

// Zero GPU overhead - creates a view, not a copy
const flat = tensor.reshape([1, 2352]);  // < 1 microsecond
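One way such a zero-copy reshape can be implemented is to keep the same backing storage and swap only the shape metadata. The sketch below is a hypothetical illustration of that idea, not the actual Tensor internals.

```typescript
// Hypothetical view-based reshape: same storage, new shape metadata only.
class TensorView {
  constructor(readonly data: Float32Array, readonly shape: number[]) {}

  reshape(shape: number[]): TensorView {
    const count = shape.reduce((a, b) => a * b, 1);
    if (count !== this.data.length) throw new Error('element count mismatch');
    return new TensorView(this.data, shape); // no copy: shares this.data
  }
}

const t = new TensorView(new Float32Array(3 * 28 * 28), [1, 3, 28, 28]);
const flat = t.reshape([1, 2352]);
console.log(flat.data === t.data); // true: the view shares the buffer
```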

First Inference Latency

| Model | Latency | Device |
|---|---|---|
| MNIST CNN | < 100ms | Chrome 120, RTX 3060 |
| CIFAR-10 | < 150ms | Chrome 120, RTX 3060 |

Supported Operators

Convolution

| Operator | Description | Fusion Available |
|---|---|---|
| Conv2d | 2D Convolution with stride/padding | ✅ Fused with Bias+ReLU |
| Conv2dBiasReLU | Conv + Bias + ReLU in single kernel | ✅ 3× memory reduction |

Pooling

| Operator | Description |
|---|---|
| MaxPool | 2D Max Pooling with configurable kernel size |
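As a correctness reference, max pooling over one row-major channel can be written on the CPU as below. Stride equal to the kernel size is assumed here for brevity; the actual operator's stride/padding options may differ.

```typescript
// CPU reference for 2D max pooling on a single channel, row-major layout,
// stride == kernel size. Illustrative sketch, not the WGSL implementation.
function maxPool2d(x: Float32Array, h: number, w: number, k: number): Float32Array {
  const oh = Math.floor(h / k);
  const ow = Math.floor(w / k);
  const y = new Float32Array(oh * ow);
  for (let oy = 0; oy < oh; oy++) {
    for (let ox = 0; ox < ow; ox++) {
      let m = -Infinity;
      for (let ky = 0; ky < k; ky++) {
        for (let kx = 0; kx < k; kx++) {
          m = Math.max(m, x[(oy * k + ky) * w + (ox * k + kx)]);
        }
      }
      y[oy * ow + ox] = m; // max over the k x k window
    }
  }
  return y;
}
```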

Activation Functions

| Operator | Description | Formula |
|---|---|---|
| ReLU | Rectified Linear Unit | f(x) = max(0, x) |
| Softmax | Normalized exponential (numerically stable) | f(x_i) = e^(x_i) / Σ e^(x_j) |
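The "numerically stable" note on Softmax refers to subtracting the maximum input before exponentiating, so e^(x_i) cannot overflow. A CPU reference in that style (an illustrative sketch, not the shipped WGSL kernel):

```typescript
// Numerically stable softmax: shift by max(x) before exp, then normalize.
function softmax(x: Float32Array): Float32Array {
  const max = x.reduce((a, b) => Math.max(a, b), -Infinity);
  const out = new Float32Array(x.length);
  let sum = 0;
  for (let i = 0; i < x.length; i++) {
    out[i] = Math.exp(x[i] - max); // exponent is <= 0, so no overflow
    sum += out[i];
  }
  for (let i = 0; i < x.length; i++) out[i] /= sum;
  return out;
}

console.log(softmax(new Float32Array([0, 0]))); // both entries 0.5
```

The shift leaves the result mathematically unchanged because the common factor e^(-max) cancels in the ratio.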

Fully Connected

| Operator | Description |
|---|---|
| Dense | Fully connected layer with optional bias |
| Flatten | Zero-copy tensor reshaping |
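For reference, the Dense layer computes y[o] = Σ_i W[o][i]·x[i] + b[o]. A minimal CPU version with row-major weights (an illustrative sketch, not the GPU implementation):

```typescript
// CPU reference for Dense: y[o] = sum_i w[o * inDim + i] * x[i] + b[o],
// with weights stored row-major. The bias is optional, as in the operator table.
function dense(
  x: Float32Array,
  w: Float32Array,
  outDim: number,
  bias?: Float32Array
): Float32Array {
  const inDim = x.length;
  const y = new Float32Array(outDim);
  for (let o = 0; o < outDim; o++) {
    let acc = bias ? bias[o] : 0;
    for (let i = 0; i < inDim; i++) acc += w[o * inDim + i] * x[i];
    y[o] = acc;
  }
  return y;
}
```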

Complete Example: MNIST Classification

import { GPUContext, Tensor, InferenceEngine } from 'tiny-dl-inference';

async function classifyMNIST(imageData: Float32Array): Promise<number> {
  const context = new GPUContext();
  
  try {
    await context.init();
    const engine = new InferenceEngine(context);
    await engine.loadModel('mnist-model.json');
    
    // Input: 1x1x28x28 (grayscale MNIST)
    const input = Tensor.fromArray(context, imageData, [1, 1, 28, 28]);
    
    // Run inference
    const output = await engine.infer(input);
    const predictions = await output.download();
    
    // Get result
    const predictedDigit = predictions.indexOf(Math.max(...predictions));
    
    // Cleanup
    input.destroy();
    output.destroy();
    engine.dispose();
    
    return predictedDigit;
  } finally {
    // Ensure GPU resources are released even if an error occurs
    context.destroy();
  }
}

// Usage
const imageData = new Float32Array(784); // 28x28 pixel data
classifyMNIST(imageData)
  .then(digit => console.log('Recognized digit:', digit))
  .catch(err => console.error('Inference failed:', err));
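If the 28×28 input comes from a canvas, one plausible preparation step averages RGB to grayscale and scales to [0, 1]. The exact normalization a model expects (plain [0, 1], mean/std, inverted colors) depends on how it was trained, so treat this sketch as an assumption to verify:

```typescript
// Convert 28x28 RGBA canvas pixels to a 784-element grayscale input.
// The [0, 1] scaling is an assumption; match your model's training pipeline.
function toMNISTInput(pixels: Uint8ClampedArray): Float32Array {
  const out = new Float32Array(784);
  for (let i = 0; i < 784; i++) {
    const r = pixels[i * 4];
    const g = pixels[i * 4 + 1];
    const b = pixels[i * 4 + 2]; // alpha at i * 4 + 3 is ignored
    out[i] = (r + g + b) / (3 * 255); // grayscale in [0, 1]
  }
  return out;
}
```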

→ See more Examples including custom models, web integration, and performance benchmarking.


Browser Compatibility

| Browser | Minimum Version | Status |
|---|---|---|
| Chrome | 113+ | ✅ Fully Supported |
| Edge | 113+ | ✅ Fully Supported |
| Safari | 18+ (macOS Sonoma+) | ⚠️ Experimental |
| Firefox | Behind flag | 🔧 Enable dom.webgpu.enabled |

Check WebGPU Support

if (navigator.gpu) {
  console.log('✅ WebGPU is supported!');
} else {
  console.error('❌ WebGPU not supported in this browser');
}
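A slightly more thorough check also requests an adapter, since `navigator.gpu` can exist while `requestAdapter()` resolves to null (blocklisted drivers, no usable GPU). This sketch uses only standard WebGPU API surface:

```typescript
type GPUStatus = 'supported' | 'no-adapter' | 'unsupported';

// Resolves 'unsupported' when navigator.gpu is absent (older browsers, Node),
// 'no-adapter' when the API exists but no adapter is available.
async function webgpuStatus(): Promise<GPUStatus> {
  const gpu = (globalThis as any).navigator?.gpu;
  if (!gpu) return 'unsupported';
  const adapter = await gpu.requestAdapter();
  return adapter ? 'supported' : 'no-adapter';
}

webgpuStatus().then(status => console.log('WebGPU:', status));
```

Running the check before `context.init()` lets you fall back gracefully instead of failing mid-initialization.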

Project Structure

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                      Application Layer                      │
│                (InferenceEngine, ModelLoader)               │
└─────────────────────────┬───────────────────────────────────┘
                          │
┌─────────────────────────▼───────────────────────────────────┐
│                       Operator Layer                        │
│       (Conv2d, ReLU, MaxPool, Dense, Softmax, etc.)         │
└─────────────────────────┬───────────────────────────────────┘
                          │
┌─────────────────────────▼───────────────────────────────────┐
│                         Core Layer                          │
│           (GPUContext, Tensor, Memory Management)           │
└─────────────────────────┬───────────────────────────────────┘
                          │
┌─────────────────────────▼───────────────────────────────────┐
│                       WebGPU Runtime                        │
│                (WGSL Shaders, GPU Compute)                  │
└─────────────────────────────────────────────────────────────┘

Directory Layout

tiny-dl-inference/
├── openspec/           # OpenSpec spec-driven development (single source of truth)
│   └── specs/          # Specification documents
│       ├── product/    # Product requirements (PRD)
│       ├── architecture/ # Architecture design specs
│       ├── api/        # API specs
│       └── testing/    # BDD testing specs
├── docs/               # User documentation (bilingual)
│   ├── en/             # English (26 files)
│   └── zh/             # Chinese (27 files)
├── src/                # Source code
│   ├── core/           # GPUContext, Tensor, error classes
│   ├── operators/      # Neural network operators
│   ├── engine/         # InferenceEngine, ModelLoader
│   └── utils/          # Benchmark, CPU reference implementations
├── tests/              # Test suite (Vitest)
└── examples/           # Demo code (MNIST, benchmark)

Development

Setup

# Clone repository
git clone https://github.com/LessUp/tiny-dl-inference.git
cd tiny-dl-inference

# Install dependencies
npm install

# Run type checking
npm run typecheck

# Run tests (134 passing)
npm test

# Build project
npm run build

Testing

# Run all tests
npm test

# Run with coverage report
npm run test:coverage

# Run specific test file
npx vitest run tests/operators/Conv2dOperator.test.ts

# Property-based tests (100+ iterations each)
npx vitest run -t "property"

Test Coverage:

  • ✅ 134 tests passing
  • ✅ 13 property-based tests with fast-check
  • ✅ CPU reference implementations for correctness validation
  • ✅ Target: >90% code coverage (V8)
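The property-based style can be illustrated without fast-check by hand-rolling random inputs and asserting invariants. The real suite uses fast-check generators and shrinking; `reluCPU` here is a stand-in reference, not the GPU operator:

```typescript
// Stand-in CPU reference; the library tests GPU operators against
// implementations like this one.
function reluCPU(x: Float32Array): Float32Array {
  return x.map(v => Math.max(0, v));
}

// Hand-rolled property check over 100 random tensors.
for (let iter = 0; iter < 100; iter++) {
  const x = new Float32Array(Array.from({ length: 64 }, () => Math.random() * 20 - 10));
  const y = reluCPU(x);
  // Invariant 1: outputs are never negative.
  if (y.some(v => v < 0)) throw new Error('non-negativity violated');
  // Invariant 2: ReLU is idempotent, relu(relu(x)) === relu(x).
  const y2 = reluCPU(y);
  if (!y.every((v, i) => v === y2[i])) throw new Error('idempotence violated');
}
```

Stating invariants instead of fixed expected values is what lets each test cover hundreds of generated cases.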

Documentation

📚 Getting Started

🔧 Core Concepts

🚀 Advanced

📖 API Reference

💡 Examples

🧪 Playground


中文文档

→ Browse Full Documentation: English | 中文


Contributing

We welcome contributions! This project follows Spec-Driven Development (SDD) — all changes must be defined in openspec/specs/ first.

Quick Start

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/your-feature
  3. Review specs in openspec/specs/ before coding
  4. Implement your changes
  5. Test thoroughly (134+ tests)
  6. Submit a Pull Request

Resources

Code Style

  • TypeScript strict mode (strict: true)
  • 2-space indentation, single quotes
  • Property-based testing with fast-check
  • Follow existing patterns in /src/operators/

Specifications

This project uses Spec-Driven Development — specifications are the Single Source of Truth:

| Spec | Location | Purpose |
|---|---|---|
| Requirements | openspec/specs/product/spec.md | What to build |
| Architecture | openspec/specs/architecture/spec.md | How to build it |
| API Contracts | openspec/specs/api/spec.md | Interface definitions |
| Test Criteria | openspec/specs/testing/spec.md | Acceptance criteria |

Changelog

See CHANGELOG.md for all releases.

Latest: v2.0.1 (2026-04-16)

Security:

  • Fixed 5 moderate npm vulnerabilities
  • Updated vitest to v4.1.4

Performance:

  • Kernel fusion: 3× memory reduction
  • Zero-copy reshape: < 1 μs overhead
  • GPU memory leak fixes

→ Full Changelog


License

MIT License — Free for personal and commercial use.


Links


Built with ❤️ for the AI community