Hallucination Guard Methodology

Purpose

hallucination_guard flags common misconceptions about subsidies (補助金), tax (税制), loans (融資), certifications (認定), administrative actions (行政処分), and laws and regulations (法令) in LLM-generated answers before they reach the user. We never call an LLM ourselves (see feedback_autonomath_no_api_use); this guard is the cheapest way to keep downstream Claude / Cursor / GPT outputs honest when they cite our data.

Data structure

Source of truth: data/hallucination_guard.yaml (launch v1 = 60 entries).

```yaml
entries:
  - phrase: "..."         # verbatim misconception
    severity: high        # high | medium | low
    correction: "..."     # one-line correction
    law_basis: "..."      # optional: statute name (法律名) + article (条)
    audience: 税理士       # 税理士 (tax accountant) | 行政書士 (administrative scrivener) | SMB | VC | Dev
    vertical: 税制         # 補助金 | 税制 | 融資 | 認定 | 行政処分 | 法令
```
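A minimal sketch of the row check this schema implies, assuming the YAML has already been parsed into Python dicts (so no YAML library is needed here). It assumes every field except law_basis is required; the names is_valid_entry and load_entries are hypothetical and not the module's actual API.

```python
from typing import Any

# Enum values copied from the schema comments above.
SEVERITIES = {"high", "medium", "low"}
AUDIENCES = {"税理士", "行政書士", "SMB", "VC", "Dev"}
VERTICALS = {"補助金", "税制", "融資", "認定", "行政処分", "法令"}
# Assumption: every field except law_basis is required.
REQUIRED = ("phrase", "severity", "correction", "audience", "vertical")


def is_valid_entry(row: Any) -> bool:
    """Return True if one parsed row satisfies the documented schema."""
    if not isinstance(row, dict):
        return False
    # Required fields must be present as non-empty strings.
    for key in REQUIRED:
        value = row.get(key)
        if not isinstance(value, str) or not value.strip():
            return False
    # law_basis is optional; when given (and not null) it must be a string.
    law_basis = row.get("law_basis")
    if law_basis is not None and not isinstance(law_basis, str):
        return False
    # Enum fields must use one of the listed values.
    return (
        row["severity"] in SEVERITIES
        and row["audience"] in AUDIENCES
        and row["vertical"] in VERTICALS
    )


def load_entries(parsed: dict) -> list[dict]:
    """Keep valid rows and silently drop malformed ones."""
    return [row for row in (parsed.get("entries") or []) if is_valid_entry(row)]
```

Dropping rows silently keeps a bad edit from taking the guard down, at the cost of hiding mistakes, which is why a separate schema test is worth running.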

Grid: 5 audiences × 6 verticals × 2 phrases = 60. Every (audience, vertical) cell holds exactly two phrases, giving broad coverage without overfitting any single cell before launch.
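The grid invariant can be checked mechanically. This sketch (hypothetical helper grid_violations, with enum tuples copied from the schema comments) counts entries per (audience, vertical) cell and reports every cell that does not hold exactly two phrases; an empty result means the 60-entry layout holds.

```python
from collections import Counter
from itertools import product

AUDIENCES = ("税理士", "行政書士", "SMB", "VC", "Dev")
VERTICALS = ("補助金", "税制", "融資", "認定", "行政処分", "法令")
PHRASES_PER_CELL = 2


def grid_violations(entries: list[dict]) -> dict:
    """Return {(audience, vertical): count} for every cell whose count != PHRASES_PER_CELL."""
    counts = Counter((e["audience"], e["vertical"]) for e in entries)
    return {
        cell: counts[cell]  # Counter returns 0 for empty cells
        for cell in product(AUDIENCES, VERTICALS)
        if counts[cell] != PHRASES_PER_CELL
    }
```

Once Loop A starts promoting candidates the cells will stop being uniform, so a check like this only makes sense for the launch set.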

Runtime

src/jpintel_mcp/self_improve/loop_a_hallucination_guard.py exposes:

- match(text) -> list[dict]: substring scan; pure, with no DB or network access.
- summarize() -> dict: counts by severity, audience, and vertical.
- run(dry_run): weekly orchestrator entry point. At launch it never writes to the DB; real candidate writes are gated until T+30d.
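A rough sketch of what the two pure entry points could look like. Unlike the documented match(text) and summarize(), these versions take the entry list as an explicit argument so they stay self-contained; the real module presumably reads the cached YAML instead. Matching is a verbatim, case-sensitive substring test, as described above.

```python
from collections import Counter


def match(text: str, entries: list[dict]) -> list[dict]:
    """Return every entry whose phrase occurs verbatim in text (plain substring scan)."""
    return [e for e in entries if e["phrase"] in text]


def summarize(entries: list[dict]) -> dict:
    """Count entries by severity, audience, and vertical."""
    return {
        field: dict(Counter(e[field] for e in entries))
        for field in ("severity", "audience", "vertical")
    }
```

A linear substring scan is fine at 60 entries; at the 1,500+ target it is still cheap per answer, and a multi-pattern matcher such as Aho-Corasick would be the natural upgrade if it ever is not.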

Self-improve expansion (60 → 1,500+)

Loop A runs weekly post-launch:

1. Pull the last 7 days of customer_feedback rows tagged wrong_answer or made_up_program, plus low-confidence rows from query_log_v2.
2. Embed them with a local e5-small model (no LLM API call).
3. Cluster with DBSCAN (eps 0.18, min_samples 3); each cluster's medoid becomes a candidate phrase.
4. Append each candidate to hallucination_guard_candidates with status='pending_review'.
5. An operator reviews and promotes candidates manually. Target: 1,500+ entries within 6 months.
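The clustering and medoid steps can be sketched without any ML dependency. This is a minimal DBSCAN plus medoid selection over precomputed embedding vectors, written in plain Python for illustration only. It assumes cosine distance (the source gives eps 0.18 but not the metric) and reads "min 3" as DBSCAN's min_samples, counting the point itself. In production, scikit-learn's sklearn.cluster.DBSCAN(eps=0.18, min_samples=3, metric="cosine") would be the usual choice. The helper names are hypothetical, and the embedding step is assumed to have already happened.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes non-zero vectors (true for real embeddings)."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def dbscan(vectors, eps=0.18, min_samples=3):
    """Minimal DBSCAN; returns one label per vector, -1 meaning noise."""
    n = len(vectors)
    # Each neighborhood includes the point itself, matching scikit-learn's convention.
    neighbors = [[j for j in range(n) if cosine_distance(vectors[i], vectors[j]) <= eps] for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point becomes a border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is a core point: keep growing the cluster
    return labels


def medoids(vectors, labels):
    """Map cluster id -> index of the member with the smallest total distance to its cluster."""
    clusters = {}
    for idx, label in enumerate(labels):
        if label != -1:
            clusters.setdefault(label, []).append(idx)
    return {
        label: min(members, key=lambda i: sum(cosine_distance(vectors[i], vectors[j]) for j in members))
        for label, members in clusters.items()
    }


def candidate_rows(texts, vectors, eps=0.18, min_samples=3):
    """Build pending_review rows from each cluster's medoid text (hypothetical row shape)."""
    labels = dbscan(vectors, eps, min_samples)
    return [
        {"phrase": texts[i], "status": "pending_review"}
        for i in sorted(medoids(vectors, labels).values())
    ]
```

Noise points (label -1) never become candidates, so a one-off complaint does not reach the review queue; only phrasings that recur closely enough to form a cluster do. Using the medoid rather than the centroid guarantees the candidate is a real, human-readable phrase from the feedback.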

Operator manual-add

1. Append the entry to data/hallucination_guard.yaml. Required fields and enum values must match the schema; the loader silently drops malformed rows, so a mistake fails quietly rather than loudly.
2. Run pytest tests/test_hallucination_guard.py; the schema test catches missing fields and bad enum values.
3. Commit, then restart the API workers: the loaded entries are cached with lru_cache, so running workers keep serving the old list until they restart.
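Why the restart matters: functools.lru_cache memoizes the first result for each argument and never rereads the file. The sketch below uses a hypothetical one-phrase-per-line file in place of the YAML so it runs with the standard library alone; load_phrases and demo are illustrative names, not the real loader.

```python
import functools
import os
import tempfile


@functools.lru_cache(maxsize=1)
def load_phrases(path: str) -> tuple[str, ...]:
    """Hypothetical cached loader: one phrase per line, standing in for the YAML parse."""
    with open(path, encoding="utf-8") as f:
        return tuple(line.strip() for line in f if line.strip())


def demo() -> tuple[tuple[str, ...], tuple[str, ...], tuple[str, ...]]:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "guard.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("first\n")
        before = load_phrases(path)
        # Edit the file on disk, as an operator would.
        with open(path, "a", encoding="utf-8") as f:
            f.write("second\n")
        stale = load_phrases(path)  # cache hit: the edit is invisible
        load_phrases.cache_clear()  # in-process equivalent of a worker restart
        fresh = load_phrases(path)  # file is reread and now includes "second"
    return before, stale, fresh


print(demo())  # (('first',), ('first',), ('first', 'second'))
```

cache_clear() only resets the cache inside the process that calls it, so each API worker process still needs its own restart to see the new entries.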