I shipped a 12-question crypto security audit in 2 hours
I had ~2 hours on a Saturday. A committee of two AI models (Gemini Pro + Claude) had just argued me into building a "gamified security audit" tied to our existing in-app currency (💎). 500💎 → 12-question scan → AI report. $49 upsell to a human pentest.
Here's the technical breakdown.
The product in one sentence
User answers 12 questions about their crypto hygiene (seed storage, 2FA, DeFi approvals, etc.). They get a 0–100 score, a per-category breakdown, and a top-3 vulnerabilities report with named tools and 2024–25 incident cases. It costs 500💎 they earned through activity.
Live: askoracle.site/audit — try it.
Question schema
Every question carries a weight (the weights across all 12 questions sum to exactly 100), and each option carries a risk value that is independent of the language labels. Example:
```python
{
    "id": "seed_storage",
    "category": "wallet",
    "weight": 14,   # contributes max 14 points to score loss
    "title": {      # per-language headers
        "ru": "Где хранится твоя главная seed-фраза?",
        "en": "Where do you store your main seed phrase?",
        "es": "¿Dónde guardas tu frase semilla principal?",
    },
    "options": [
        {"v": "metal",  # language-independent enum
         "ru": "Металлический backup (Cryptosteel)",
         "en": "Metal backup (Cryptosteel / Billfodl)",
         "es": "Backup metálico (Cryptosteel / Billfodl)",
         "risk": 0},
        {"v": "cloud_notes",
         "ru": "Заметки в iCloud/Google Drive/Notion",
         "en": "Notes in iCloud / Google Drive / Notion",
         "es": "Notas en iCloud / Google Drive / Notion",
         "risk": 10},
        # ...
    ],
}
```
The same risk arithmetic runs for all 3 languages. Only the rendering and the AI prompt swap on ?lang=ru|en|es.
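Rendering then reduces to a dictionary lookup. A minimal sketch of the label picker, assuming an English fallback for missing translations (the fallback is my guess, not something the post confirms):

```python
def _label(strings: dict, lang: str) -> str:
    # strings is a per-language dict like q["title"] above;
    # falling back to English is an assumption for this sketch
    return strings.get(lang) or strings["en"]

# _label(q["title"], "es") -> "¿Dónde guardas tu frase semilla principal?"
```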
Scoring
```python
def _calc_score(answers: dict, lang: str) -> tuple[int, dict, list]:
    cat_lost = {"wallet": 0.0, "twofa": 0.0, "defi": 0.0, "operational": 0.0}
    vulns = []
    for q in QUESTIONS:
        w = q["weight"]
        max_r = max(o["risk"] for o in q["options"])
        opt = next((o for o in q["options"] if o["v"] == answers.get(q["id"])), None)
        risk = opt["risk"] if opt else max_r  # missing answer = max risk
        cat_lost[q["category"]] += (risk / max_r) * w if max_r else 0
        if max_r > 0 and risk >= max_r * 0.6:
            vulns.append({
                "qid": q["id"],
                "severity": "high" if risk >= max_r * 0.85 else "medium",
                # ...
            })
    breakdown = {cat: round(lost) for cat, lost in cat_lost.items()}  # points lost per category
    score = max(0, round(100 - sum(cat_lost.values())))
    return score, breakdown, vulns[:5]
```
If a user doesn't answer (impossible via the UI, but defensive), they take max risk. The score is floored at 0 — no negative scores.
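To make the arithmetic concrete, a hypothetical run against a QUESTIONS list trimmed to just the seed_storage question from the schema section (not the real 12-question set):

```python
# cloud_notes has risk 10 of max 10, so the full 14-point weight is lost:
score, breakdown, vulns = _calc_score({"seed_storage": "cloud_notes"}, "en")
# score == 86, breakdown["wallet"] == 14, and seed_storage lands in
# vulns with severity "high" (10 >= 10 * 0.85)

# metal has risk 0, so nothing is lost:
score, _, vulns = _calc_score({"seed_storage": "metal"}, "en")
# score == 100, vulns == []
```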
LLM fallback chain
The most useful piece. I have 5 free Groq API keys, a paid DeepSeek key, and the Vertex AI Pro CLI as a last resort. Each handoff happens on rate-limit or empty response:
```python
import re
import subprocess
import tempfile

def _generate_report(answers, score, breakdown, vulns, lang):
    # system_msg / user_msg are assembled upstream from answers, score,
    # breakdown, vulns and lang (elided here)
    text, provider = _chat_complete(system_msg, user_msg, max_tokens=2200)
    if text and len(text) > 400:
        return text, provider  # Groq1..5 or DeepSeek worked

    # Subprocess fallback to ask_pro CLI (Vertex AI)
    try:
        with tempfile.NamedTemporaryFile("w", suffix=".md", delete=False) as f:
            f.write(system_msg + "\n\n---\n\n" + user_msg)
        r = subprocess.run(
            ["/usr/local/bin/ask_pro", "-f", f.name, "--max", "3000"],
            capture_output=True, text=True, timeout=90,
        )
        out = re.sub(r"\n\[tokens:.*?\]\s*$", "", r.stdout.strip())
        if out and len(out) > 400:
            return out, "vertex-pro-fallback"
    except Exception:
        pass

    # Last resort: deterministic template
    return _fallback_report(score, breakdown, vulns, lang), "fallback-template"
```
The deterministic template is 30 lines of f-strings that produce a usable (if bland) report from raw answers. It cannot fail. Users always get something.
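The post doesn't show `_chat_complete` itself. Here is a minimal sketch of the key-walking part, assuming both Groq and DeepSeek are called through their OpenAI-compatible endpoints (the provider order, model names, and the `GROQ_KEYS` / `DEEPSEEK_KEY` variables are illustrative, not the actual implementation):

```python
from openai import OpenAI  # Groq and DeepSeek both expose OpenAI-compatible APIs

# (base_url, api_key, model) triples, tried in order; values are illustrative
PROVIDERS = [
    *[("https://api.groq.com/openai/v1", k, "llama-3.3-70b-versatile")
      for k in GROQ_KEYS],                                        # 5 free keys
    ("https://api.deepseek.com", DEEPSEEK_KEY, "deepseek-chat"),  # paid
]

def _chat_complete(system_msg, user_msg, max_tokens=2200):
    for i, (base_url, key, model) in enumerate(PROVIDERS):
        try:
            client = OpenAI(base_url=base_url, api_key=key)
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "system", "content": system_msg},
                          {"role": "user", "content": user_msg}],
                max_tokens=max_tokens,
            )
            text = (resp.choices[0].message.content or "").strip()
            if text:
                return text, f"provider{i + 1}"
        except Exception:
            continue  # rate limit / auth / network: hand off to the next key
    return None, None  # caller falls through to the Vertex CLI, then the template
```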
In E2E testing: Groq key1 was rate-limited that morning; the chain landed on groq3 in 3 seconds. Cost: $0.
The AI prompt — one rule that mattered
"""
You are a senior crypto-security analyst at GuardLabs. ...
IMPORTANT:
- Never mention "AI", "ChatGPT", "LLM", "model" — you are a GuardLabs analyst
- Use specific tool names: revoke.cash, Pocket Universe, YubiKey 5C,
SimpleLogin, Bitwarden, Cryptosteel Capsule
- Anchor to real incident cases when possible (Lazarus, Anubis,
Inferno Drainer, address poisoning waves)
"""
That third bullet is what makes the report not feel generic. The model knows Inferno Drainer (May 2023, $87M total), it knows address-poisoning (Q2 2024 surge), it knows SIM-swapping by Lazarus (FTX, AscendEx). Without the prompt grounding it just says "your funds could be stolen."
Sample paragraph from a test run (user said: seed in iCloud, browser-only wallet, SMS for 2FA):
> SMS codes for 2FA — severity: high
> Using SMS codes for 2FA is significantly vulnerable due to SIM-swapping attacks. In 2020 a hacker used SIM-swapping to steal over $100,000 in crypto from a victim's exchange account. SMS codes can be intercepted or bypassed, granting attackers access. Consider U2F or TOTP via SimpleLogin or Bitwarden.
>
> Quick fix: Replace SMS with a YubiKey 5C or an authenticator app, and enable 2FA on your email using a password manager like Bitwarden.
XSS defense in the report renderer
The report comes from an LLM. LLMs are influenced by user input. User input is enum values (validated server-side against an allowlist) — but I still html-escape the LLM output before re-wrapping our markdown patterns:
```python
def _report_html(audit_row, report_md, lang):
    import html as _html
    md = _html.escape(report_md or "")  # escape FIRST
    md = re.sub(r"^### (.+)$", r"<h3>\1</h3>", md, flags=re.M)
    md = re.sub(r"^## (.+)$", r"<h2>\1</h2>", md, flags=re.M)
    md = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", md)
    # ...
```
Yes, this means a literal `<script>` in the LLM's output becomes `&lt;script&gt;` — visible as text, not executed. The pattern is "escape everything, then deliberately unescape only the markdown I expect." Cheap and correct.
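The ordering is the whole trick, and it's easy to sanity-check in isolation (standalone repro, not the app's code):

```python
import html, re

md = html.escape("## Summary\n<script>alert(1)</script> and **bold**")
md = re.sub(r"^## (.+)$", r"<h2>\1</h2>", md, flags=re.M)
md = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", md)
print(md)
# <h2>Summary</h2>
# &lt;script&gt;alert(1)&lt;/script&gt; and <strong>bold</strong>
```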
Rate-limit on paid endpoints
/api/audit/start (creates a draft, costs nothing): 5/min, 20/hour.
/api/audit/submit (burns 500💎, calls LLM): 3/min, 10/hour.
Per-IP. Defends against malicious DoS of paid resources. Uses Flask-Limiter, already wired into the host app.
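In Flask-Limiter terms that's one decorator per route; roughly (handler names and app wiring are illustrative, not the app's exact code):

```python
from flask import Flask
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

app = Flask(__name__)
limiter = Limiter(get_remote_address, app=app)  # per-IP key function

@app.post("/api/audit/start")
@limiter.limit("5 per minute; 20 per hour")  # cheap: just creates a draft
def audit_start():
    ...

@app.post("/api/audit/submit")
@limiter.limit("3 per minute; 10 per hour")  # burns 500💎 and calls the LLM
def audit_submit():
    ...
```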
Cost to ship
| Component | Cost |
|---|---|
| Postgres schema migration | $0 |
| Routes (~750 lines initial, ~1170 after i18n) | $0 |
| Veo / image generation | not needed |
| E2E test runs (RU + EN + ES) | $0 (Groq free tier) |
| Vertex Pro fallback (not used in tests) | would be ~$0.03 |
| Total | $0 |
Time: ~2 hours active development. The pre-existing in-app currency, atomic spend SQL, and LLM fallback chain in the parent app saved most of the work.
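The "atomic spend SQL" is the classic guard-in-the-UPDATE pattern; a sketch of what it could look like with SQLAlchemy (table and column names are my guesses):

```python
from sqlalchemy import text

def spend_gems(session, user_id: int, amount: int = 500) -> bool:
    # Single statement: the WHERE guard makes check-and-debit atomic,
    # so concurrent submits can't double-spend the same balance.
    row = session.execute(
        text("UPDATE users SET gems = gems - :amt "
             "WHERE id = :uid AND gems >= :amt RETURNING gems"),
        {"amt": amount, "uid": user_id},
    ).fetchone()
    session.commit()
    return row is not None  # None -> insufficient balance, nothing debited
```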
What I'd do differently next time
1. Cache the LLM response on (answers-hash, lang) — the same user repeating shouldn't re-pay. Currently they can't repeat (one scan per user is enforced), but if I add a "refresh" feature this matters.
2. Sign the audit_id with HMAC — currently /audit/report/<uuid> is enumerable: anyone with the UUID sees the report. For free-tier scans this is fine; for paid scans I'd want a signed URL (a stdlib sketch follows this list).
3. Move the AI prompt to a version-controlled file with a build-time pin. Right now changing the prompt is a code edit. Should be a config artifact with a hash.
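For item 2, a minimal sketch using only the stdlib (the token format and secret handling are my choices, not the app's):

```python
import hmac, hashlib

SECRET = b"server-side-secret"  # illustrative; load from config in practice

def sign_audit_id(audit_id: str) -> str:
    sig = hmac.new(SECRET, audit_id.encode(), hashlib.sha256).hexdigest()[:16]
    return f"{audit_id}.{sig}"  # served as /audit/report/<audit_id>.<sig>

def verify_audit_token(token: str) -> str | None:
    audit_id, _, sig = token.rpartition(".")
    want = hmac.new(SECRET, audit_id.encode(), hashlib.sha256).hexdigest()[:16]
    return audit_id if hmac.compare_digest(sig, want) else None
```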
Try it
Free 12-question scan: askoracle.site/audit (after earning 500💎 through in-app activity).
If you want the full manual audit (wallet forensics, phishing simulation, custom playbook), that's $49 / one-time.
The interesting bit: this whole pipeline cost less in dev hours than most teams spend writing the spec doc for an equivalent feature. The LLM fallback chain is reusable infrastructure — the 12 questions are the actual IP.
Code conventions: Postgres + SQLAlchemy + Flask, multi-agent fallback, html-escape-then-unescape rendering. All boring. The product is the work, not the framework.
— @sspoisk / GuardLabs