Self-Hosted AI Runtime — FastAPI + ComfyUI + Ollama n your cloud


About this Gig
I'll stand up a production-ready self-hosted AI runtime on your cloud. Replace $200-$2,000/mo of SaaS API spend with a single commodity cloud box you fully control. Typical scope: - FastAPI gateway routing webhook + REST traffic to inference services - ComfyUI + Flux Q4 / Stable Diffusion for image generation - Ollama (Llama / Qwen / nomic-embed) for chat + embeddings - Kokoro TTS for voice synthesis - Mermaid CLI for diagram render - Optional: PaddleOCR for document parsing, Podcastfy for multi-voice podcast Stack: FastAPI, Docker Compose, Ollama, ComfyUI/Flux, Kokoro TTS, Mermaid, PaddleOCR. Deployed on your OCI ARM / Hetzner / AWS / any commodity Linux box. Tailscale-secured if private; nginx public-facing if needed. Deliverables: - Working stack on your infrastructure - docker-compose.yml + setup runbook - API contract doc per service (curl examples) - Monitoring + GPU usage dashboards - Hand-off video walkthrough I run this exact stack across my own businesses: FastAPI + ComfyUI/Flux + Kokoro TTS + Mermaid + Podcastfy + PaddleOCR on a single OCI ARM box. Real wall-clock 14-50 min for image gen. Replaced ~$200/mo of SaaS APIs.
Requirements
From you to start: - Cloud account: OCI / AWS / Hetzner / Linode / similar (or I'll spin one up on a temp dev environment) - Use cases: which inference services do you need? (image gen / TTS / embeddings / OCR / podcast). Anything beyond the standard stack scopes as add-ons. - Hardware budget: GPU vs CPU, VRAM size if known - Auth posture: who can call the API? (Tailscale-only / API key / OIDC) If unclear what stack you need, 30-min discovery call first to scope properly.
Related Tags
Get To Know Tai Vu
