Seeing more announcements of in-game dialogue running locally via 7B distilled models, targeting sub-60 ms turns on RTX 40-series and high-end mobile NPUs. From an AI behavior standpoint this changes loop design — shorter contexts, stricter tool contracts, and deterministic state handoffs — plus it finally pencils out on cost per session. Anyone here shipping this yet, and how are you enforcing style and guardrails without tanking latency?