Cache architecture (4-Layer L0–L4)¶
AutonoMath's API is built on a Pre-computed Reasoning Layer: customer queries do not trigger ad-hoc joins, FTS scans, or multi-tool plans at request time. Every shape an answer can take has already been computed overnight, or is being served straight from a hot cache row. This page explains the four layers, why each one exists, and where the boundary between them lives.
Why a 4-layer split¶
A single ¥3 request must amortise:
- One FTS / index lookup (cheap, ~µs).
- Zero or more cross-table joins (medium, ~ms).
- Zero or more cross-database lookups (medium, ~ms —
jpintel.db⇄autonomath.dbare not ATTACH-joinable, two connections required). - Response serialisation (medium, ~ms — JSON encoding dominates p95 for high-cardinality enums).
Caching only the bottom layer (page cache) breaks down because page cache is per-connection and cold on every Fly machine restart. Caching only the top (full response blob) wastes storage on rare params. So we split the work into four cooperating layers, each with its own freshness contract:
| Layer | Name | Storage | Freshness | Rebuild cost |
|---|---|---|---|---|
| L0 | Storage | Raw SQLite + FTS5 | Every ingest write | Hours (full reingest) |
| L1 | Atomic | Single-row lookups via primary key / index | Live | Microseconds |
| L2 | Composite | Joins across tables | Live | Milliseconds |
| L3 | Reasoner | pc_* materialized views |
Nightly | Minutes |
| L4 | Cache | l4_query_cache blobs |
Per-row TTL | Microseconds (hit) / L3 cost (miss) |
L0 — Storage¶
The on-disk SQLite files plus their FTS5 trigram indexes.
data/jpintel.db(188 MB):programs,case_studies,loan_programs,enforcement_cases,laws,court_decisions,bids,tax_rulesets,invoice_registrants, plus the join tablesprogram_law_refsandenforcement_decision_refs.autonomath.db(7.36 GB, read-only as of 2026-04-25): the entity-fact EAV schema (am_entities,am_entity_facts,am_relation,am_alias,am_authority,am_region, ...). Owned by the data-collection CLI; the API repo opens it read-only.
L0 is the only layer that holds ground truth. Every other layer is a projection or cache of L0 rows — corruption here is the only thing that cannot be recovered without a re-ingest.
L1 — Atomic queries¶
Single-row primary-key or unique-index lookups:
SELECT * FROM programs WHERE program_id = ?SELECT * FROM laws WHERE law_id = ?
These are too cheap to cache (b-tree depth 3–4, < 1 µs warm). The API serves them straight from L0.
L2 — Composite joins¶
Multi-row reads that join 2–3 tables:
- "Programs that cite law X" →
programs ⨝ program_law_refs - "Loans within Y prefecture" →
loan_programs ⨝ region - "Adoptions for FY 2025" →
adoption_records ⨝ programs
These are still served live — SQLite's planner handles them in single- digit milliseconds with the right indexes — but they are the input to L3 materialized views below.
L3 — Reasoner (the pc_* layer)¶
Multi-tool, multi-DB, or aggregate queries are pre-computed nightly into
the pc_* tables. As of migration 045 (2026-04-25) there are 32 such
tables on disk (14 from migration 044 + 18 from migration 045). The v8
plan target counts a broader "pre-computed table" definition that includes
the 19 baseline materialized representations (FTS5 trigram indexes,
programs denormalised rollups, ...) — counted that way the on-disk
total is 19 + 32 = 51 representations covering the L3 surface, comfortably
ahead of the T+90d 47 milestone.
The 14 tables added in migration 044:
| # | Table | Dimension | Purpose |
|---|---|---|---|
| 1 | pc_top_subsidies_by_industry |
industry × top 20 | Faster industry-filter responses |
| 2 | pc_top_subsidies_by_prefecture |
47 prefectures × top 20 | Faster region-filter responses |
| 3 | pc_law_to_program_index |
law_id → program_ids[] |
Citation graph lookup |
| 4 | pc_program_to_amendments |
program_id → amendment_ids[] |
Change-history lookup |
| 5 | pc_acceptance_stats_by_program |
program_id × FY |
Pre-aggregated 採択率 |
| 6 | pc_combo_pairs |
program_a × program_b |
Compat / conflict pairs |
| 7 | pc_seasonal_calendar |
month × programs | Deadline calendar tool |
| 8 | pc_industry_jsic_aliases |
alias → JSIC | Natural-language matcher |
| 9 | pc_authority_to_programs |
authority_id → program_ids[] |
"All METI programs" enum |
| 10 | pc_law_amendments_recent |
last 365 days | Change feed |
| 11 | pc_enforcement_by_industry |
industry × enforcement | Risk lookup |
| 12 | pc_loan_by_collateral_type |
collateral × top loans | 担保軸での絞り込み |
| 13 | pc_certification_by_subject |
subject × top certs | "ISO27001 が要る制度" |
| 14 | pc_starter_packs_per_audience |
5 audience × pack | Audience-aware landing |
The 18 tables added in migration 045:
| # | Table | Dimension | Purpose |
|---|---|---|---|
| 15 | pc_amendment_recent_by_law |
law × 直近 365 日 amendment | Law-first change feed |
| 16 | pc_program_geographic_density |
prefecture × tier × count | Region density chart |
| 17 | pc_authority_action_frequency |
authority × month | "Active authority" indicator |
| 18 | pc_law_to_amendment_chain |
law → ordered amendments | Versioned-law lookup |
| 19 | pc_industry_jsic_to_program |
JSIC × top 50 programs | All-program industry lookup |
| 20 | pc_amount_max_distribution |
amount bucket × count | Funding-size histogram |
| 21 | pc_program_to_loan_combo |
program × compatible loans | Combo planner |
| 22 | pc_program_to_certification_combo |
program × required certs | Eligibility planner |
| 23 | pc_program_to_tax_combo |
program × applicable tax 特例 | Tax stacking |
| 24 | pc_acceptance_rate_by_authority |
authority × FY | 採択率 rollup |
| 25 | pc_application_close_calendar |
month × close-date programs | Close-only calendar |
| 26 | pc_amount_to_recipient_size |
amount × SMB size | Scale-fit guidance |
| 27 | pc_law_text_to_program_count |
law → program count | Citation rollup |
| 28 | pc_court_decision_law_chain |
court × law × decision | Case-law graph |
| 29 | pc_enforcement_industry_distribution |
industry × severity × 5yr count | Risk histogram |
| 30 | pc_loan_collateral_to_program |
collateral × programs | Collateral-axis bridge |
| 31 | pc_invoice_registrant_by_pref |
pref × 適格事業者 count | Invoice density |
| 32 | pc_amendment_severity_distribution |
severity × month | Change-feed trend |
Refreshed by scripts/cron/precompute_refresh.py once per night, using a
DELETE-then-INSERT pattern inside a single transaction per table. The pre-
launch state is empty: tables exist, indexes exist, the cron stub
runs cleanly, but the projection bodies are deferred to follow-up tickets.
API code falls through to L0/L1/L2 on pc_* miss, so the launch path is
identical to today.
Numerical target¶
| Date | Total L3 representations | of which pc_* tables |
|---|---|---|
| Launch (2026-05-06) | 19 | 0 |
| T+30d (post mig 044) | 33 | 14 |
| T+90d (post mig 045, current) | 51 | 32 |
| T+180d | 79 | ≥ 50 |
| Y3 | 100+ | ≥ 80 |
Each new pc_* ticket should include the refresher body it expects in
scripts/cron/precompute_refresh.py.
L4 — Hot blob cache (l4_query_cache)¶
L4 sits above L3 and caches the serialized API response blob — not the rows that produced it. Migration 043 creates a single table:
CREATE TABLE l4_query_cache (
cache_key TEXT PRIMARY KEY, -- sha256(tool + canonical_json(params))
tool_name TEXT NOT NULL,
params_json TEXT NOT NULL, -- canonical JSON (sort_keys, no whitespace)
result_json TEXT NOT NULL, -- the response blob the API returns
hit_count INTEGER NOT NULL DEFAULT 0,
last_hit_at TEXT,
ttl_seconds INTEGER NOT NULL DEFAULT 86400,
created_at TEXT NOT NULL DEFAULT (datetime('now'))
);
Why blob, not rows¶
The Zipf-shaped traffic tail (top ~100 distinct param sets per tool) is ~80% of all calls in steady state. Re-running the L0/L1/L2/L3 query for each repeat is wasted CPU; the response is byte-identical. Caching the serialized blob skips the JOIN, the JSON encoding, and the response model validation step in one go.
Key shape¶
The single source of truth is cache.l4.canonical_cache_key:
def canonical_cache_key(tool_name: str, params: dict) -> str:
payload = f"{tool_name}\n{canonical_params(params)}".encode()
return hashlib.sha256(payload).hexdigest()
canonical_params is json.dumps(params, sort_keys=True, ensure_ascii=False,
separators=(",", ":")). Never hand-roll a sha256 elsewhere — drift
silently misses the cache.
Read path¶
from jpintel_mcp.cache import canonical_cache_key, get_or_compute
key = canonical_cache_key("search_tax_incentives", params)
result = get_or_compute(
cache_key=key,
tool="search_tax_incentives",
params=params,
compute=lambda: _real_search(params),
ttl=86400,
)
get_or_compute returns the cached value on hit, or calls compute()
and stores the result on miss. Stale rows (TTL expired) are treated as
miss; they are not deleted on read (avoiding write-on-read amplification).
Write path¶
L4 is populated organically as customer traffic flows through the API. Pre-launch the table is empty by design — this is not a bug, it is the launch-day expected state. Hit rate climbs over the first ~7 days as the Zipf tail forms.
Eviction posture¶
Two cron paths keep the table bounded:
scripts/cron/l4_cache_warm.py(daily): readsusage_eventsover the last 7 days, ranks by(endpoint, params_digest)Zipf, optionally re-warms the top-100 keys, sweeps TTL'd rows, and trims the table to a soft cap (default 1000 rows) bylast_hit_atascending.scripts/cron/precompute_refresh.pycallscache.l4.sweep_expired()after thepc_*refresh so the L4 cache is consistent with the freshly-rebuilt L3 layer.
Hit-rate target¶
| Date | Hit rate | Margin impact |
|---|---|---|
| Launch | 0% (empty) | baseline 92% |
| T+30d | 60% | 93% |
| Y1 (Zipf saturation) | 80% | 95% |
The 95% margin number is what unlocks the ¥1M/month projection in
project_autonomath_expansion_plan without breaking the
non-negotiable ¥3/req metered pricing.
Layer cooperation (read flow)¶
┌────────────────────────────┐
│ incoming /v1 request │
└──────────────┬─────────────┘
│
▼
┌──────────────────┐
│ L4 cache? │ ← hash(tool, params)
└─────┬────────────┘
│ hit
│ └──────────────► serialised JSON ▶ response
│ miss
▼
┌──────────────────┐
│ L3 pc_* row? │ ← materialized view
└─────┬────────────┘
│ hit
│ └──────────────► row(s) → L4 INSERT ▶ response
│ miss
▼
┌──────────────────┐
│ L2 join │ ← live JOIN
└─────┬────────────┘
│
▼
┌──────────────────┐
│ L1 atomic │ ← single-row b-tree
└─────┬────────────┘
│
▼
┌──────────────────┐
│ L0 storage │ ← bytes on disk
└──────────────────┘
A miss at any layer always falls through to the layer below. There is no "cache wall": correctness is identical with or without the cache.
Layer cooperation (write flow)¶
L0 is mutated by ingest. L1/L2 are live views with no separate write
path. L3 is rebuilt nightly by precompute_refresh.py (DELETE-then-
INSERT per table inside a transaction). L4 entries are invalidated
either:
- Implicitly via TTL (default 24h, 1h for amendment-coupled tools).
- Explicitly by
cache.l4.invalidate_tool(tool_name)after a schema or source change that affects a specific tool's response shape. - By the
precompute_refresh.pysweep step at the end of each nightly refresh.
Operational levers¶
- Cache miss spike → check
usage_eventsfor a new top-N param pattern, then add apc_*row (or pre-warm vial4_cache_warm.py). - L4 staleness complaint → drop
ttl_secondsfor the affected tool (1h for amendment-coupled paths is the established floor). - Storage budget alert → drop
--soft-capinl4_cache_warm.py(default 1000 → 500). Each row is ≤ 32 KB compressed, so even 1000 rows is ~30 MB.
Files¶
scripts/migrations/043_l4_cache.sql—l4_query_cachetable.scripts/migrations/044_precompute_tables.sql— 14pc_*tables (C3 wave).scripts/migrations/045_precompute_more.sql— 18pc_*tables (D8 wave).src/jpintel_mcp/cache/__init__.py,src/jpintel_mcp/cache/l4.py— helper API (canonical_cache_key,get_or_compute,invalidate,invalidate_tool,sweep_expired).scripts/cron/l4_cache_warm.py— daily Zipf warm-up + LRU trim.scripts/cron/precompute_refresh.py— nightlypc_*rebuild (32 tables) + L4 expired sweep.tests/test_precompute_schemas.py— 102-case schema + dry-run invariants (table existence, column shape, indexes, cron iteration).