# Hallucination Guard Methodology

## Purpose
`hallucination_guard` flags high-frequency misconceptions about 補助金 / 税制 / 融資 / 認定 / 行政処分 / 法令 (subsidies / tax / loans / certifications / administrative sanctions / statutes) in LLM-generated answers before they reach the user. We never call an LLM ourselves (see `feedback_autonomath_no_api_use`); this guard is the cheapest way to keep downstream Claude / Cursor / GPT outputs honest when they cite our data.
## Data structure
Source of truth: `data/hallucination_guard.yaml` (launch v1 = 60 entries).

```yaml
entries:
  - phrase: "..."      # verbatim misconception
    severity: high     # high | medium | low
    correction: "..."  # one-line correction
    law_basis: "..."   # optional 法律名 + 条 (law name + article)
    audience: 税理士    # 税理士 | 行政書士 | SMB | VC | Dev
    vertical: 税制      # 補助金 | 税制 | 融資 | 認定 | 行政処分 | 法令
```
Grid: 5 audiences × 6 verticals × 2 phrases = 60. Every cell holds exactly two phrases: broad coverage, with no single cell overfit before launch.
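The 5 × 6 × 2 invariant is easy to verify mechanically. A minimal sketch, assuming entries have already been loaded into plain dicts mirroring the YAML schema (the function name `check_grid` is hypothetical, not part of the module):

```python
from collections import Counter

# Enum values as listed in the schema comments above.
AUDIENCES = ("税理士", "行政書士", "SMB", "VC", "Dev")
VERTICALS = ("補助金", "税制", "融資", "認定", "行政処分", "法令")

def check_grid(entries: list[dict]) -> list[str]:
    """Return problem descriptions; empty means every cell holds exactly 2 phrases."""
    cells = Counter((e["audience"], e["vertical"]) for e in entries)
    return [
        f"{aud}/{vert}: {cells[(aud, vert)]} phrases (want 2)"
        for aud in AUDIENCES
        for vert in VERTICALS
        if cells[(aud, vert)] != 2
    ]
```

A full 60-entry file yields an empty list; dropping any single entry reports exactly one under-filled cell.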
## Runtime
`src/jpintel_mcp/self_improve/loop_a_hallucination_guard.py` exposes:

- `match(text) -> list[dict]`: substring scan; pure, no DB / network.
- `summarize() -> dict`: counts by severity / audience / vertical.
- `run(dry_run)`: weekly orchestrator entry point. Never writes the DB at launch; real candidate writes are gated to T+30d.
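The pure substring-scan contract can be sketched in a few lines. This is a simplified illustration, not the module's code: the real `match(text)` presumably loads entries from the YAML internally, whereas this sketch takes them as an explicit argument to stay side-effect free:

```python
def match(text: str, entries: list[dict]) -> list[dict]:
    """Return every guard entry whose phrase occurs verbatim in text.

    Pure function: no DB, no network, no LLM call.
    """
    return [e for e in entries if e["phrase"] in text]

def summarize(hits: list[dict]) -> dict:
    """Count matched entries by severity (the real one also buckets
    by audience and vertical)."""
    counts: dict[str, int] = {}
    for e in hits:
        counts[e["severity"]] = counts.get(e["severity"], 0) + 1
    return counts
```

Because matching is verbatim substring containment, phrase entries should be written exactly as the misconception appears in the wild, not paraphrased.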
## Self-improve expansion (60 → 1,500+)
Loop A runs weekly post-launch:

- Pull 7-day `customer_feedback` (`wrong_answer` / `made_up_program`) plus low-confidence rows from `query_log_v2`.
- Embed with local e5-small (no LLM API).
- DBSCAN (eps 0.18, min 3). Medoid → candidate phrase.
- Append to `hallucination_guard_candidates` with `status='pending_review'`.
- Operator promotes manually. Target: 1,500+ rows within 6 months.
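The "medoid → candidate phrase" step picks the cluster member closest to all others, so the candidate is always a real observed phrase rather than a synthetic centroid. A minimal sketch with Euclidean distance (the real loop runs this over e5-small embeddings of one DBSCAN cluster; `medoid_index` is an illustrative name):

```python
import math

def medoid_index(vectors: list[list[float]]) -> int:
    """Index of the member with the smallest total distance to the rest."""
    def dist(a: list[float], b: list[float]) -> float:
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    totals = [sum(dist(v, w) for w in vectors) for v in vectors]
    return totals.index(min(totals))
```

Mapping the returned index back to the cluster's source rows recovers the verbatim feedback text to append as the candidate phrase.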
## Operator manual-add
- Append to `data/hallucination_guard.yaml`. Required fields and enum values must match the schema; the loader silently drops malformed rows.
- Run `pytest tests/test_hallucination_guard.py`; the schema test catches missing fields and bad enums.
- Commit. Note that `lru_cache` means API workers need a restart to pick up changes.