i18n Style Guide (P6-E, English V4)¶
Hand-curation rules for the English half of the AutonoMath message
catalog (src/jpintel_mcp/i18n/__init__.py). Every English string in
the codebase MUST follow this guide. Tests in tests/test_i18n.py
enforce the mechanical rules; the prose rules below are reviewed by a
human at PR time.
TL;DR: Stripe / GitHub register. Keep Japanese names for legal entities and laws, append the official English in parens at first reference. No aggressive translation — the goal is "readable for a Japan-based foreign-born founder who already deals with bilingual contracts," not "fluent for a US-only audience."
1. Tone¶
- Professional but friendly. Match the Stripe / GitHub developer-doc register: declarative, direct, free of marketing adjectives.
- No emoji. Memory
feedback_no_emoji_in_filesapplies. - No exclamation marks in error / status messages. Save them for onboarding tips and only when truly warranted.
- Active voice. "Verify the source URL" — not "The source URL should be verified by the caller."
- Second person where natural. "Please retry after a short interval" is fine; "The user should retry" is not.
2. Capitalisation¶
- Sentence case for all messages.
- Yes: "No matching programs found."
- No: "No Matching Programs Found."
- Tool names stay snake_case (e.g.
search_tax_incentives) — they are referenced as identifiers, not English nouns. - Acronyms are uppercased: NTA, METI, MHLW, FSA, JFC, SMRJ, FTS5, MCP, REST, API. Never "Nta" or "metI".
3. Punctuation¶
- ASCII punctuation only. No 、 。 「 」 ・ in English strings.
The test
test_english_no_zenkaku_punctuationenforces this. - Period at end of sentences. Even short ones: "Multiple candidates."
- Slashes for "and/or" lists are fine: "ja / en / kana / romaji".
- Em-dashes (—) ok but use sparingly — they read as casual.
4. Bilingual conventions (the load-bearing part)¶
4.1 Law names¶
Format: Japanese name, English transliteration in parens at first reference.
- First reference: "中小企業等経営強化法 (SME Management Enhancement Act)"
- Subsequent references in the same response: "中小企業等経営強化法" alone is fine.
- If the response is short enough that the user never sees the first reference (e.g. a 1-line empty-state message), repeat the parens form.
Source of truth for English law names: e-Gov 法令検索 English titles where they exist; otherwise the official ministry transliteration. Never invent — if no official English exists, keep the Japanese name only and add a footnote reference to the e-Gov URL.
4.2 Public agency names¶
Format: Japanese name, official English in parens at first reference, then acronym only afterwards.
| Japanese | First reference | Subsequent |
|---|---|---|
| 経済産業省 | METI (Ministry of Economy, Trade and Industry) | METI |
| 厚生労働省 | MHLW (Ministry of Health, Labour and Welfare) | MHLW |
| 金融庁 | FSA (Financial Services Agency) | FSA |
| 国税庁 | NTA (National Tax Agency) | NTA |
| 中小企業庁 | SME Agency | SME Agency |
| 日本政策金融公庫 | JFC (Japan Finance Corporation) | JFC |
| 中小機構 | SMRJ (Organization for SME and Regional Innovation, Japan) | SMRJ |
| 環境省 | MOE (Ministry of the Environment) | MOE |
| 国土交通省 | MLIT (Ministry of Land, Infrastructure, Transport and Tourism) | MLIT |
| 農林水産省 | MAFF (Ministry of Agriculture, Forestry and Fisheries) | MAFF |
4.3 Program names¶
- Do not translate program names. They are legal entity names.
- Use the Japanese name verbatim. Append a parenthetical descriptor only when the response budget allows: "ものづくり補助金 (manufacturing subsidy)".
- Never invent an English brand: "Manufacturing Subsidy 2026" is wrong; the actual canonical name is the Japanese one.
4.4 Date / time¶
- ISO-8601 for machine-readable fields:
2026-04-25. - Spelled-out for human-facing prose: "April 25, 2026" or "25 April 2026" (consistent within a single response).
- Timezone is JST or UTC explicitly: "JST month-start (00:00)". Never imply timezone from context.
4.5 Currency¶
- Yen sign + ASCII digits: "¥3 per request", "¥3.30 tax-included".
- Comma thousands separator: "13,578 programs".
- Never use the kanji 円 in English strings.
4.6 URL conventions¶
- Keep host names verbatim: "e-Gov", "j-net21", "mirasapo-plus".
- Lowercase URL paths: "/v1/programs/search", not "/V1/PROGRAMS/SEARCH".
- Inline URLs without anchor text are fine in error messages: "See https://autonomath.ai/docs/api-reference/."
5. Length budget¶
- Envelope explanations: 80-180 chars. Long enough to be useful, short enough not to dominate the response.
- Suggestion strings: 60-120 chars. They appear in lists.
- Error user_messages: 60-120 chars. Must include a recovery hint.
- Onboarding tips: 100-200 chars. They are read once.
- Empty-state messages: 80-180 chars, MUST point to a primary source.
6. Forbidden patterns¶
- No "Sorry" / "We apologize" — apologetic phrasing is service-y; Stripe / GitHub register avoids it.
- No "please contact support" — there is no support channel.
- No "free trial" / "upgrade to Pro" — there is no tier model.
Memory
project_autonomath_business_modelapplies. - No machine-translated program content in the catalog. If a string references a specific program, keep the program name in Japanese.
- No "Japan's first" / "the only" / "best-in-class" marketing claims.
- No "official partner" / "endorsed by" language re. ministries — AutonoMath is not endorsed by any ministry.
7. Patterns to copy¶
The scaffolded entries in src/jpintel_mcp/i18n/__init__.py are the
canonical examples. When in doubt, mimic them:
ja: "税制特例が十分件数見つかりました。適用期限と対象を確認して引用してください。"
en: "Sufficient tax incentives found. Verify the sunset date (適用期限) and eligible scope before citing."
Notice: - Drops the polite "してください" → imperative "Verify ... before citing." - Keeps 適用期限 in parens for bilingual readers (Japanese accountants reading the English version recognise the term). - "before citing" carries the hallucination-guard intent without using the word "hallucination" (developer jargon, not user copy).
ja: "該当認定は少数のみでした。認定名の正式名称で再検索すると精度が上がります。"
en: "Few certifications matched. Re-querying with the official Japanese name will improve precision."
Notice: - "Few" not "Only a few of them" (Stripe-style economy). - "the official Japanese name" — explicit that the input must be in JP, to head off the failure mode where an English-speaking caller types "management innovation plan" and gets zero matches because the canonical alias is 経営革新計画.
8. Review checklist for new English strings¶
Before adding an English string to MESSAGES:
- Sentence case, no Title Case
- No emoji, no zenkaku punctuation
- Acronyms uppercased (NTA, METI, ...)
- Law names: JP + English in parens at first reference
- Agency names: JP + official English + acronym pattern
- Program names: JP only
- Currency: ¥ + ASCII digits + comma thousands separator
- No "sorry", no "support", no "upgrade", no "best-in-class"
- Length within the budget for that surface
- Test
test_english_no_zenkaku_punctuationpasses
9. References¶
- Stripe API error reference for tone anchoring.
- GitHub REST API reference for noun-phrase economy.
- e-Gov 法令検索 for official English law titles.
- Memory:
feedback_autonomath_fraud_risk— translation accuracy is a fraud-risk vector, not a polish item. - Memory:
feedback_no_fake_data— never invent English entity names that do not exist in primary sources.
10. llms-full.en.txt generation policy (D6)¶
The English LLM crawler dump at site/llms-full.en.txt is generated by
scripts/regen_llms_full_en.py and follows the bilingual conventions in
this guide. Specifically:
- Structural prose is English. Header, About, Pricing, Audiences, Coverage, and Footer are written in English (~150 lines total). They are templated; do not edit the generated file by hand — edit the builder functions in the script.
- Data rows stay Japanese.
primary_name(programs),law_title(laws),case_titleandindustry_name(case_studies) are emitted verbatim. Translating them mechanically would violate §4.3 (Program names) and §6 (Forbidden patterns: "No machine-translated program content"). - Compact pipe-delimited rows. Same format family as
llms-full.txtbut extended with separate## All Lawsand## All Case Studiessections. Each row sanitises embedded|/ CR / LF / TAB so the file remains parseable by a one-line splitter. - Sibling files.
site/llms.en.txt(short index, ~40 lines) points athttps://autonomath.ai/llms-full.en.txtas the canonical full dump.site/llms.txtandsite/llms-full.txtremain the Japanese siblings and are NOT touched by the en regenerator.- robots.txt. The default policy
User-agent: *Allow: /plus the per-botAllow: /for GPTBot / ClaudeBot / PerplexityBot / Google-Extended / OAI-SearchBot / CCBot / Applebot / Amazonbot already permits crawling of/llms-full.en.txt. No bot-specific change needed. - No machine translation pipeline. This script never calls Anthropic
/ OpenAI / any LLM (memory
feedback_autonomath_no_api_use). The English structural sections are hand-written templates that read live counts from SQLite and slot them in.