I built a $0 fault-tolerant AI pipeline (Groq×5 → DeepSeek → Vertex → template)
TL;DR

I run a small dev shop. Every product I ship needs an LLM call somewhere: content generation, security analysis, classification, summaries. The economics only work if these calls average near-zero. The trick: never pay for what a free tier can do. Stack 5+ providers in a deterministic fallback chain so that when one rate-limits, account-bans, or hikes prices, the next one takes over within the same request, invisible to the user.

This post is the actual production code from audit_routes.py powering askoracle.site/audit, a 12-question crypto security scan that costs me $0 per scan in practice, while still producing real AI reports anchored to 2024-25 incident cases.

---

The fallback chain

    User request
         │
         ▼
    ┌──────────────────────────────────────────────────────────┐
    │  Groq llama-3.3-70b (5 keys, sequential, free tier)      │  Tier 1
    └──────────────┬───────────────────────────────────────────┘
                   │ all 5 rate-limited / Cloudflare-blocked?
    ...
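
The fall-through behavior described above can be sketched in a few lines. This is a minimal illustration, not the code from audit_routes.py: `Provider`, `ProviderError`, `run_chain`, and the stub functions are hypothetical names, and the stubs stand in for real API clients so the example runs without network access. The tier order (five Groq keys, then DeepSeek, then a static template) follows the title of this post.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


class ProviderError(Exception):
    """A provider could not answer (rate limit, block, timeout, ...)."""


@dataclass(frozen=True)
class Provider:
    name: str                     # label used for logging or metrics
    call: Callable[[str], str]    # prompt -> completion text; raises ProviderError


def run_chain(
    prompt: str,
    providers: Sequence[Provider],
    template: Callable[[str], str],
) -> Tuple[str, str]:
    """Try providers in order; return (source_name, text) from the first success."""
    for provider in providers:
        try:
            text = provider.call(prompt)
        except ProviderError:
            continue              # fall through to the next tier
        if text.strip():          # treat an empty completion as a failure too
            return provider.name, text
    # Every provider failed: the static template is the last resort.
    return "template", template(prompt)


# Stub providers standing in for real API clients (assumptions for this sketch).
def rate_limited(prompt: str) -> str:
    raise ProviderError("429 rate limited")


def fake_deepseek(prompt: str) -> str:
    return f"AI report for: {prompt}"


def static_template(prompt: str) -> str:
    return f"Template report for: {prompt}"


# Five Groq keys tried sequentially, then the next tier.
chain = [Provider(f"groq-key-{i}", rate_limited) for i in range(1, 6)]
chain.append(Provider("deepseek", fake_deepseek))

source, report = run_chain("wallet audit", chain, static_template)
print(source, "->", report)  # deepseek -> AI report for: wallet audit
```

Because the chain is just an ordered list, the order of tiers is fully deterministic and adding or removing a provider is a one-line change. In a real client, HTTP 429 responses, timeouts, and blocked requests would each need to be mapped to `ProviderError` (or whatever exception the chain catches); any other exception propagates in this sketch.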