Work & Case Studies

Projects

A selection of agentic AI systems, self-hosted infrastructure, and ML engineering work. All production-deployed, all without vendor lock-in.

Agentic AI | Live

SteveBot — AI Digital Twin

A privacy-first AI assistant that answers questions about me, books meetings, and demonstrates production-grade agentic architecture.

Built entirely self-hosted: multi-model orchestration with GLM-4.7 and GLM-5, RAG with PostgreSQL + pgvector, SSE streaming, visitor fingerprinting, feedback loops, and full observability. No OpenAI, no cloud lock-in.

Next.js 15, TypeScript, PostgreSQL/pgvector, SQLite, RAG, SSE Streaming, Docker

Outcome: Demonstrates end-to-end agentic system design in production with real users
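The SSE streaming mentioned for SteveBot can be illustrated with a minimal sketch of how response tokens might be framed as Server-Sent Events. The function names and the `token` and `done` event names are hypothetical, chosen for illustration; only the `event:` / `data:` wire format comes from the SSE standard.

```typescript
// Hypothetical helper: formats one Server-Sent Event frame.
// Per the SSE wire format, each line of the payload gets its own "data:" field,
// and a blank line terminates the event.
function formatSseEvent(data: string, event?: string): string {
  const lines: string[] = [];
  if (event !== undefined) {
    lines.push(`event: ${event}`);
  }
  for (const line of data.split("\n")) {
    lines.push(`data: ${line}`);
  }
  return lines.join("\n") + "\n\n";
}

// Hypothetical helper: turns a list of model tokens into a full SSE body,
// ending with an assumed "done" event so the client knows when to stop reading.
function framesForTokens(tokens: string[]): string {
  const frames = tokens.map((t) => formatSseEvent(t, "token"));
  frames.push(formatSseEvent("", "done"));
  return frames.join("");
}
```

A server would write these frames to a response with `Content-Type: text/event-stream`, and a browser client could consume them with `EventSource`.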

ML Infrastructure | Shipped

Multi-Model LLM Orchestrator

A lightweight orchestration layer that routes queries to the best-fit model based on task type, latency budget, and context size.

Routes structured tasks to GLM-4.7 (fast, precise) and open-ended reasoning to GLM-5 (slower, deeper). Includes parallel execution paths, result merging, and automatic fallback. Deployed in Docker with full request tracing.

TypeScript, Node.js, Z.ai API, Structured outputs, Docker, Prometheus

Outcome: 40% reduction in average response latency vs. single-model routing
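The routing and fallback behaviour described for the orchestrator can be sketched roughly as follows. This is an illustration under stated assumptions, not the deployed code: the type names, the threshold constants, and the injected `call` function are hypothetical, and only the two model names and the three routing inputs (task type, latency budget, context size) come from the description.

```typescript
type TaskType = "structured" | "open-ended";
type ModelId = "glm-4.7" | "glm-5";

interface RouteRequest {
  taskType: TaskType;
  latencyBudgetMs: number;
  contextTokens: number;
}

// Assumed cut-offs, for illustration only.
const TIGHT_LATENCY_BUDGET_MS = 2000;
const FAST_MODEL_MAX_CONTEXT_TOKENS = 32000;

// Picks a model from task type, latency budget, and context size.
function pickModel(req: RouteRequest): ModelId {
  // Assumption: very large contexts go to the deeper model.
  if (req.contextTokens > FAST_MODEL_MAX_CONTEXT_TOKENS) return "glm-5";
  // Structured tasks go to the fast, precise model.
  if (req.taskType === "structured") return "glm-4.7";
  // Open-ended tasks with a tight latency budget also use the fast model.
  if (req.latencyBudgetMs < TIGHT_LATENCY_BUDGET_MS) return "glm-4.7";
  return "glm-5";
}

// Hypothetical transport: the caller supplies the actual API client.
type ModelCall = (model: ModelId, prompt: string) => Promise<string>;

// Tries the preferred model first and falls back to the other one on failure.
async function routeWithFallback(
  req: RouteRequest,
  prompt: string,
  call: ModelCall
): Promise<{ model: ModelId; text: string }> {
  const primary = pickModel(req);
  const secondary: ModelId = primary === "glm-4.7" ? "glm-5" : "glm-4.7";
  try {
    return { model: primary, text: await call(primary, prompt) };
  } catch {
    return { model: secondary, text: await call(secondary, prompt) };
  }
}
```

Keeping `pickModel` a pure function makes the routing policy easy to unit-test separately from network calls.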

AI/ML Engineering | Live

RAG Pipeline with pgvector

A retrieval-augmented generation system that grounds LLM responses in a curated knowledge base using semantic similarity search.

Implements chunking, embedding generation, cosine similarity search, and context injection. The knowledge base is admin-manageable via a web UI. Embeddings computed locally for full data privacy. Supports hybrid keyword + semantic search.

PostgreSQL, pgvector, OpenAI-compatible embeddings, TypeScript, Next.js API Routes

Outcome: Halved hallucination rate in domain-specific Q&A vs. baseline LLM
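The chunking, cosine similarity search, and context selection steps of the RAG pipeline can be sketched in memory as below. This is a simplified illustration: the function names, the word-based chunking with overlap, and the `Chunk` shape are assumptions, and in the real system the similarity search would run inside PostgreSQL via pgvector rather than in application code.

```typescript
interface Chunk {
  id: number;
  text: string;
  embedding: number[];
}

// Splits text into word-based chunks of at most maxWords, sharing `overlap` words
// between neighbours (an assumed strategy; the real chunker may differ).
function chunkText(text: string, maxWords: number, overlap: number): string[] {
  if (maxWords <= 0 || overlap < 0 || overlap >= maxWords) {
    throw new Error("require maxWords > 0 and 0 <= overlap < maxWords");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  const step = maxWords - overlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + maxWords).join(" "));
    if (start + maxWords >= words.length) break; // last chunk reached the end
  }
  return chunks;
}

// Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the k chunks most similar to the query embedding, best first.
function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}
```

In pgvector the same ranking is typically expressed with the cosine distance operator `<=>`, where distance equals one minus the similarity computed here.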

Infrastructure | Live

Self-Hosted LLM Stack

A full production stack for running local language models: inference server, reverse proxy, auth, monitoring, and automated backups.

Runs Z.ai-compatible models on-premise behind an Nginx reverse proxy with JWT auth. Includes Umami analytics, Sentry error tracking, automated SQLite backups to local storage, and a health check dashboard. Zero cloud dependencies.

Docker Compose, Nginx, Sentry, Umami, SQLite, GitHub Actions CI

Outcome: Full production workload with 99.9% uptime, zero vendor dependency

DevOps | Shipped

Automated Database Backup System

Scheduled backup automation for SQLite and PostgreSQL databases with retention policies, compression, and health monitoring.

Runs on a cron schedule inside Docker, compresses and timestamps backups, enforces a configurable retention window (default: 7 days), and reports status to the health check endpoint. Supports both local filesystem and remote S3-compatible storage.

Node.js, Docker, Shell scripting, SQLite, PostgreSQL, Health API

Outcome: Automated backups every 6 hours, limiting worst-case data loss to the most recent backup interval
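The timestamping and retention-window logic of the backup system can be sketched as two small pure functions. The names `backupName` and `selectExpired`, the `BackupFile` shape, and the file-name pattern are hypothetical; the 7-day default matches the retention window stated in the description.

```typescript
interface BackupFile {
  name: string;
  createdAt: Date;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Builds a timestamped file name; ":" and "." are replaced so the name is
// safe on common filesystems. The ".gz" suffix assumes gzip compression.
function backupName(database: string, at: Date): string {
  const stamp = at.toISOString().replace(/[:.]/g, "-");
  return `${database}-${stamp}.gz`;
}

// Returns the backups that are strictly older than the retention window
// and can therefore be deleted. Default window: 7 days.
function selectExpired(
  files: BackupFile[],
  now: Date,
  retentionDays: number = 7
): BackupFile[] {
  const cutoff = now.getTime() - retentionDays * MS_PER_DAY;
  return files.filter((file) => file.createdAt.getTime() < cutoff);
}
```

A cron-triggered job could call `selectExpired` after each successful backup and delete only the returned files, so a failed run never removes the last good copy.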

Frontend Engineering | Live

PWA with Offline Support

Progressive Web App with service worker caching, push notifications, and full offline capability for the AI chat interface.

Implements a Workbox-compatible service worker that caches static assets and provides offline fallbacks. Push notification subscription management via Web Push API with VAPID keys. Installable on iOS and Android home screens.

Next.js, Service Workers, Web Push API, VAPID, IndexedDB, TypeScript

Outcome: Works fully offline for cached content; push notifications achieve 60%+ open rate

Want to build something like this?

I help teams design and ship production-grade AI systems. Let's discuss your project.

Talk to SteveBot