The Semantic Execution Engine

In my past career, I built several command-and-control (C2) systems. These all work by having databases of rules. Streams of events enter the system, and the rules are activated by the events and cause side effects. The rules are effectively policies.

Later, I designed software security programs that feed system events into a rules engine which applies side effects. In these systems, a single event being processed by a rule can create a long-lived session — or you could call it a saga, or a transaction — that accumulates state and reacts to future events.

Later still, I designed UEBA (user and entity behavior analytics) and UBA (user behavior analytics) software that accepts events describing entities and their actions and then builds dynamic graphs based on the events. The changes to the dynamic graphs are then modeled as events, and those events trigger rules in a rule engine.

The C2 systems, the rule-based systems, and the UEBA systems are all examples of what some people call "execution engines," and that is the term I use for all of them in this article. They are common for automating business processes in many industries, and similar rule-based systems exist outside of business as well. An execution engine is a persistent service that accepts a stream of events and applies dynamic, stateful policies over them, thereby triggering side effects.
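To make that definition concrete, here is a minimal sketch of an execution engine in Python. It is not code from any of the systems I mentioned: the `Rule` and `Engine` names, the failed-login policy, and the `lock:alice` side effect are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Rule:
    name: str
    matches: Callable[[dict, dict], bool]  # (event, state) -> does this rule fire?
    apply: Callable[[dict, dict], list]    # (event, state) -> side effects; may update state

@dataclass
class Engine:
    rules: list
    state: dict = field(default_factory=dict)    # long-lived state shared across events
    effects: list = field(default_factory=list)  # side effects emitted so far

    def handle(self, event: dict) -> None:
        # Offer every incoming event to every rule.
        for rule in self.rules:
            if rule.matches(event, self.state):
                self.effects.extend(rule.apply(event, self.state))

def count_failure(event: dict, state: dict) -> list:
    # Hypothetical stateful policy: lock the account on the third failed login.
    user = event["user"]
    state[user] = state.get(user, 0) + 1
    return [f"lock:{user}"] if state[user] == 3 else []

engine = Engine(rules=[
    Rule("failed-logins", lambda e, s: e["type"] == "login_failed", count_failure),
])
for _ in range(3):
    engine.handle({"type": "login_failed", "user": "alice"})
engine.handle({"type": "login_ok", "user": "alice"})
print(engine.effects)
```

The important part is that the third event only triggers a side effect because the first two left state behind. That accumulated state is the "session" or "saga" idea from earlier.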

In my personal designs, I always pair these execution engines with extreme levels of durability, atomicity, run-once guarantees, flush-before-acknowledge behavior, auditing, visibility, observability, safety by design, and reliability by design. And also I'm just a really dope cool dude, haha.

Why "Bronco"

"Bronco" gets its name from the bronco, a wild, untamed horse, because an LLM is wild and untamed too. Even five or six documents poisoning the training data of a one-trillion-parameter, state-of-the-art model can influence its behavior in a critical mission. Our programs are using LLMs to "think," and LLMs are incredibly fast, so we need perfect visibility into their actions to have any confidence in what they are doing. We must keep tight restraints on the LLM agents.

In Bronco, every single byte sent to the LLM and generated by the LLM is stored in a local private database. Every tool call attempted or run is recorded in a database with all its inputs and outputs. Process isolation is carefully designed to allow for intricate RBAC. We split workflow orchestration, GPU inferencing, the GPU priority queue, the tool priority queue, the tool executor, and the process supervisor into separate processes, so that each process can access only the data it requires and so that future distributed architectures can derive from our code base. Short circuits, failovers, dead letter queues, backpressure, CPU pinning, and anything else we can think of that makes this a reliable, safe, and powerful tool: those are the design components we will use.
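As a rough illustration of the "record every byte" idea, here is a sketch of an append-only audit table using Python's built-in `sqlite3` module. The table name, the column names, and the three `kind` values are assumptions for the example, not Bronco's actual schema.

```python
import json
import sqlite3
import time

def open_audit_db(path: str = ":memory:") -> sqlite3.Connection:
    # Hypothetical schema: one row per LLM input, LLM output, or tool call.
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS audit_log (
               id      INTEGER PRIMARY KEY AUTOINCREMENT,
               ts      REAL NOT NULL,
               kind    TEXT NOT NULL
                       CHECK (kind IN ('llm_input', 'llm_output', 'tool_call')),
               payload BLOB NOT NULL
           )"""
    )
    return db

def record(db: sqlite3.Connection, kind: str, payload: bytes) -> int:
    # "with db" commits on success and rolls back if the insert raises.
    with db:
        cur = db.execute(
            "INSERT INTO audit_log (ts, kind, payload) VALUES (?, ?, ?)",
            (time.time(), kind, payload),
        )
    return cur.lastrowid

db = open_audit_db()
record(db, "llm_input", b"List the files in /tmp")
tool_call = {"tool": "ls", "args": ["/tmp"], "output": "a.txt\n"}
record(db, "tool_call", json.dumps(tool_call).encode())
record(db, "llm_output", b"There is one file: a.txt")

rows = db.execute("SELECT kind, payload FROM audit_log ORDER BY id").fetchall()
for kind, payload in rows:
    print(kind, payload)
```

Storing the payload as raw bytes means nothing gets normalized or re-encoded on the way in, so what you read back is exactly what went over the wire.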

This is going to be the TR-808 of LLMs, if that makes any sense.

Why Apple Silicon

I have experience shipping software, websites, appliances (where you buy a computer with preinstalled licensed software), SaaS, serverless, and all that. So initially I was planning to build a highly parallel, concurrent system that would allow a single H100 to serve hundreds of users all working on diverse tasks. My experience using Devin AI left me thinking: Devin AI is the best AI product. They have something really valuable, and I'm envious.

But when I went down the path of building my Devin-AI competitor, I realized a few things:

The physics of data centers and the Internet and users cause massive bottlenecks in certain cities. That leads to severe competition, and therefore managing the economics of any data-center-based product becomes the overwhelming focus. Rates are flexible and bid-based. You can be outbid and lose your spot at any time. Guaranteed slots are available, and extremely expensive. The pricing is often structured so that data in/out is the majority of the cost. You can expect to pay $25 just to download your model to the computer before running it. And even if you rent an entire 8x H200 machine, you'll find that its hardware is only strong enough to support inferencing. Any agentic work that requires computer use — such as web browsing, file editing, coding, spreadsheet analysis (e.g. for finance, medical, engineering), 3D modeling, audio engineering, video editing — would have to be shipped off to another computer. And to power an agentic workflow, you must ship the result back. So we pay the extreme data cost for transferring the results back to the GPU to continue processing. All the while, the data is bottlenecked at population centers full of citizens protesting the data centers in the first place.
Massive models that can do it all are slow and expensive to run (due to cooling and energy), and so relying on them would result in an unreliable product. I don't even like my Zippo lighter because the damn lighter fluid evaporates. It's not reliable. Massive models that cannot run on your computer will not seem reliable because they will be running in another location (usually in another city) and might not be available. And they must be shared with other users due to economics, so your speed and availability won't be reliable either. And if you sign a business contract to share that capacity and become an inferencing provider, you'll find yourself using int4 quants under the table, and then also using sketchy dynamic quantization under the table. None of that is a problem in and of itself, but the constant changes make it useless as a tool. It's like if I had a claw hammer and I was using it, and then I put it down and picked it up and it had three claws now, and then the next time I picked it up it had one claw. I mean, Cursor has definitely had days where they gave me two updates in a single day, and days with one update a day two days in a row. It feels constant, and the UI changes each time. And they remove models you rely on. I built like fifty custom processes around GPT o3, but then it wasn't available on Cursor anymore and I couldn't get my prompts to work with GPT. I lost all trust in hosted models instantly and forever.
You remember when I mentioned that agents need extra compute to do stuff like edit files, code, or work on media projects? Well, that's true, and it's also true that humans need to validate that stuff. An AI agent can work as fast as you can pay it. If you hire a company to build you software, at the end you get a presentation, you get demos, you get manuals, you get continued support, etc. The AI agents also need to provide white glove service to users, and users need to be able to see and interact with the software. Furthermore, when it comes to stuff like video games, Android apps, interactive media of any kind (including interactive websites), there are limits to what an agent — even with video understanding and advanced telemetry — can validate. Ultimately you need a human in the loop. And the human's experience in that loop is critically important.

By building for Apple Silicon, I provide these massive advantages to my users:

They can install my program on an airgapped Apple device and run it and get all the value.
They can have total confidence in the privacy of their data.
The models can run in excess of 2,500 TPS (tokens per second) and at high parallelism, and the code is designed to keep the GPU hot. This thing runs like a high-end video game more than a sluggish chat API. Our engine software design process is not constrained by remote providers or OpenAI-API compatibility. Many of our interfaces don't include a chat at all, or even show any LLM-generated text. Our integration, in the Apple style, is vertical and complete. The UI/UX design is fully integrated with the LLM engine design by the same team.
I built safety-for-children-by-design into the company charter and the product. And Apple makes this doable.
Absolute total sandboxing is trivial: just buy an Apple computer and don't connect it to the Internet. Beat that.
I don't have to support data centers. And you don't either. The model uses 2.2 GB of memory in general. It's about as energy intensive as watching a pre-downloaded Netflix video.

Why Not Cursor

I started using Cursor because my teammates were writing too much code for me to read. I could edit it fine, and faster than Cursor. Neovim's LSP-based workflows and coc.nvim are both faster at things like refactoring files, splitting up files, fixing imports, and even writing code in many cases (thanks to very fast LSP-based autocomplete). And this isn't unique to Neovim or coc.nvim. I'm sure there are some Refactor users, Emacs users, and general power users of any tool-that-fits-the-job who can pull a John Henry and beat an LLM at coding head to head.

But with even just one Cursor user on my team, I would have to spend all my time reading their code — or I would have to say, "No, stop writing more code than I can read!" — or I would have to simply let them commit code I hadn't read or analyzed. Now, the correct answer here depends entirely on the business/organization and your mission. In my case, we were just an early-stage software startup making proofs-of-concept. So anything-goes was the exact right mentality. I took the leap and jumped hard into vibe coding.

Two years later, I was able to write an entire AI chat agentic workflow with durable orchestration and a React UI using Cursor. In four hours. And $300 USD.

First off, let me stop myself here and say this: if you have money and would like me to use Cursor to build you some dope stuff, do call me up. Cursor is great and I can seriously pump out some code. It's a blast, too.

Now back to what I was saying.

$300 for four hours of use is kind of expensive, but it got me everything I needed. So it was definitely worth it. But I wasn't surprised when, a few months later, I saw people complaining online that their AI budget for the year was gone in the first week of the year. This kind of issue is going to be systemic. AI IS useful. And once someone finds out it's good and figures out how to make it work, they'll instantly use up their budget with any of the providers. So there will be demand from people who want to let their kid play with AI, and get good at it, and compete with it, and learn skills with it, and read a book on how to use it, and use it in a standardized way that can be studied and perfected. And they won't be using a cloud-provider-based tool. Because that tool won't be reliable. Because of physics. They will be using Bronco.

Because Bronco actually works. You can give it a task, in English, that takes years to run, and if you keep your Mac powered up, it's just gonna run. Even if you have to replace your Mac and migrate over to a new one, that task is just gonna keep on running. If you want to understand how it works, you can peer down into the very last byte of a single token. You can understand the pipeline flow and the handling of all the data completely. The manual is designed to be printed, bound, and read. Every single character of text that's used in a tool description, or system instruction, or template, or embedding prompt, or safety guardrail, is visible and controllable by the user. Every generation is saved and recorded and audited. And the auditing and logging behavior are controllable by policy and can be disabled and cleared. It can be installed in an air gap and used without Internet access forever. The model it ships with knows it's stale and offline and doesn't pretend to have up-to-date knowledge of the world, and likewise, the system is designed to force direct use of structured reference data. The system ships with SQL databases, graph databases, vector databases, an execution engine, an agentic Markdown-based knowledge tool (akin to Evernote, Joplin, Obsidian, etc.) all built around a tiny core set of Rust and Python code (maintained professionally) and small enough to run on a 16 GB Apple Silicon laptop.

Why Not Cloud Agents

LLMs got famous because of the chat interfaces where a fake person talks to you and pretends to know everything: ChatGPT. These tools trick people into thinking they are like talking encyclopedias, but with terribly imperfect knowledge. And that's mostly useful as a chatbot or for cheating on exams. And people do a lot of both.

But when I talk about AI, and I tell people I'm not doing chatbots, they sometimes know about agents. Agents have a different purpose, even if they sometimes share the same UI. Computers have incredible software, but it's hard to use. And building interfaces for it is very hard, and it's a culturally and artistically sensitive activity to build a good UI. But a chatbot powered by an LLM can run any computer program for a user trivially. This allows me to use an agentic tool to control my computer.

Now, I'm a computer expert, but I can't type every command, and I don't know the specifics of every pattern or every program available. I used to try to keep a huge body of knowledge in my memory, but with age and experience and a move to leadership, and with new languages and tools arriving all the time, I can't keep up. I find that even trying to keep up diverts my focus in a way that is often strategically inferior (I am a big fan of Sun Tzu). So I can use my agentic coding tool to extend my computer use ability substantially. It's great.

How it works: the LLM replies to me, and when it does so, it replies with a special message saying "computer, call this computer program for me with these parameters and return the result" and the agentic coding tool does it. This allows the LLM to decide what to do in response to its inputs.
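Here is a small sketch of that dispatch step in Python. The JSON message shape (`{"tool": ..., "args": ...}`), the `TOOLS` registry, and the `word_count` tool are assumptions for the example. Real agentic tools each define their own wire format.

```python
import json

# Hypothetical tool registry: tool name -> plain Python function.
TOOLS = {
    "word_count": lambda text: len(text.split()),
}

def handle_reply(reply: str):
    """Decide whether an LLM reply is plain text or a request to run a tool."""
    try:
        msg = json.loads(reply)
    except json.JSONDecodeError:
        return ("text", reply)  # ordinary prose, show it to the user
    if not isinstance(msg, dict) or "tool" not in msg:
        return ("text", reply)
    tool = TOOLS.get(msg["tool"])
    if tool is None:
        return ("error", f"unknown tool: {msg['tool']}")
    # Run the tool; the result would be fed back to the LLM on the next turn.
    return ("tool_result", tool(**msg.get("args", {})))

print(handle_reply('{"tool": "word_count", "args": {"text": "keep the GPU hot"}}'))
print(handle_reply("Hello! How can I help?"))
```

The LLM never touches the computer directly. It only emits a request, and the surrounding program decides whether and how to run it, which is also the natural place to log and restrict every call.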

Agentic cloud programs exist, and I wanted to build one using data centers. But even if data centers were great and very popular, the LLMs running in them need to run tools. And we established that those tools can't really run on the same data center computers that run the inferencing. If you really try, you might succeed, but you'll be competing for very few slots against competitors who have very deep pockets. So what tools can your agents use? Well, they can use cloud-based tools. So you can have an orchestrator call out to the cloud. But if you don't like the cloud, you could link the LLM back to your workstation. It's not impossible, but it's expensive: you have to send ALL the computer's "thoughts" back and forth over the Internet, as it thinks. If it wants to see, you gotta stream it images. If it wants to hear, you gotta stream it audio. And if you want privacy, you can't have that.

So all those patterns are OK in some situations, and I'll possibly support them one day with Bronco, but Bronco's main goal is local and private, sustainable, powerful, reliable, serviceable AI. So the tools are built in, and they actually work. And they work fast. And as soon as they are done working, their result is instantly shipped into your GPU — not to the Internet to wait in line at a cloud provider or inferencing provider. The tools are agentic-first and scoped to a "workbench." A workbench is just a scope for tool state:

Each workbench gets a SQLite database.
Each workbench gets a "library." It's backed by SQLite, but it's got vector DB (with automatic embedding) and full-text search, and metadata search, and a simple graph DB table, and it's got a primarily natural-language-in-and-natural-language-out interface. This allows the agent to remember and recall a variety of types of information with a simple interface.
Each workbench gets a container with a small image running. The container can forward ports to the host using a special tool. This allows the agent to run commands and even spawn a server and allow the host computer to access that server.
Each workbench gets a "notebook." This is a feature set inspired by Joplin, Evernote, Obsidian, and Zettelkasten. Basically you have notebooks that are in a tree, and each one is a Markdown file. And there are tags and todo lists. And it's for taking notes in Markdown — used to organize stuff like projects, todo lists, or any general reference info.
Each workbench gets a CSV tool. The user sees a spreadsheet editor (some FOSS one), and the agent can operate on the same CSV files.

I Haven't Even Told You What Bronco Does Yet, Haha

I haven't even told you what Bronco does yet, but before we get to that — the interface, the use cases — let me tell you about the execution engine in Bronco. That's what we started this post with. If you recall, execution engines take events, then do stuff with them, emitting side effects, and they might model sagas, transactions, or any other asynchronous business process.

Execution engines work in a few ways, typically. Some have an internal state and some rules and a ticker: CLIPS is kinda like this. Some use the rules to build a query, and query the events, and then process matching events with the rules. Some build a queue of all events, and process each one against each rule. Sometimes "processing against a rule" is a SQL match. Or a regex match. Or a boolean truth table of some kind. In systems like ECS, or behavior trees, or statecharts, the matching is conditional on the state of the system (different things match at different times).

In our design, we are just gonna SKIP ALL THAT, lol. We are going to use cosine distance, and exact KNN, and HNSW KNN, and embeddings, to determine matches. And we are going to use durable agentic loops — which support subagents, and multi-year-long waits for tool call responses, and year-long "alarm" calls, and which persist EVERYTHING to a transactional table for full resumability and total visibility and auditability into every action the system takes — to handle matching rules. So a rule is basically defined with two things: 1) a payload that gets embedded and compared to an event, and 2) a prompt template + agent that accepts the event and runs to completion. With this simple design, we can do it all.
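A minimal sketch of those two pieces, assuming nothing about Bronco's real types: a rule as a (payload, prompt template) pair, and cosine distance as the matching measure. The `SemanticRule` name and its fields are illustrative, and the vectors here are hand-written stand-ins for real embeddings.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class SemanticRule:
    payload: str          # text that gets embedded and compared to events
    prompt_template: str  # filled in with the event and handed to an agent

def cosine_distance(a, b) -> float:
    """1 - cosine similarity: 0 means same direction, 2 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

rule = SemanticRule(
    payload="repeated failed logins for one account",
    prompt_template="Investigate this event and decide what to do: {event}",
)
print(rule.prompt_template.format(event="alice failed login x3"))
print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # identical direction
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal
```

Cosine distance only looks at the direction of the vectors and ignores their length, which is why it is a common choice for comparing text embeddings.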

Here are my initial design notes for the execution engine. I had this idea last night at 3 AM and wrote them in shorthand. This is the LLM reinterpretation, cleaned up for reading:

The semantic rules engine has a set of system-level variables that shape how rules are matched and executed. There's a system template (s), a rule-embedding template (srv), an event-embedding template (see), a nearest-neighbor count (rk, for how many rules to pull per event), a rule-filter template (srf), a rule-agent definition (sra), and structured tool settings (sats). Each individual rule carries its own rule value (r), an activation template (a), and an exception value.

Registering a rule works like this: the system template (s) is filled in with the rule's value (r) and sent to the LLM. The completion becomes the rule's system-rule string (sr). Then the rule-embedding template (srv) is filled in with that sr and the result is embedded — stored in a vector table alongside all other rules (the "SRV table").
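The registration step can be sketched like this. The `register_rule` function, the `{r}` and `{sr}` placeholder names, and the list-of-dicts table are assumptions, and `fake_llm` and `fake_embed` are stand-ins so the example runs without a model.

```python
def register_rule(r: str, *, s: str, srv: str, llm, embed, srv_table: list) -> str:
    """Fill s with r, complete it to get sr, then embed srv filled with sr."""
    sr = llm(s.format(r=r))             # completion becomes the system-rule string
    vector = embed(srv.format(sr=sr))   # rule-embedding template filled with sr
    srv_table.append({"r": r, "sr": sr, "vector": vector})
    return sr

# Stand-ins for a real model and a real embedder (assumptions for the demo).
def fake_llm(prompt: str) -> str:
    return prompt.upper()

def fake_embed(text: str) -> list:
    return [float(len(text)), float(text.count(" "))]

table = []
sr = register_rule(
    "alert on failed logins",
    s="Rewrite this rule as a policy: {r}",
    srv="rule: {sr}",
    llm=fake_llm,
    embed=fake_embed,
    srv_table=table,
)
print(sr)
print(table[0]["vector"])
```

Note that the LLM runs once per rule at registration time, not once per event, so the expensive rewrite is paid up front.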

When an event arrives, it gets formatted into a string (e). The event-embedding template (see) is filled in with e and embedded. Exact KNN search runs against the SRV table and pulls the rk nearest rules — these become the event-rule candidates (err).
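Here is a sketch of the exact KNN step: score every row in the table against the event vector and keep the rk closest. The `toy_embed` bag-of-words function and its six-word vocabulary are placeholders for a real embedding model, and the rules are embedded directly here, skipping the srv template, to keep the example short.

```python
import math

VOCAB = ["login", "failed", "payment", "refund", "file", "deleted"]

def toy_embed(text: str) -> list:
    # Placeholder embedding: count how often each vocabulary word appears.
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

def cosine_distance(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0  # treat an all-zero vector as unrelated to everything
    return 1.0 - dot / (na * nb)

def exact_knn(query_vec, srv_table, rk: int) -> list:
    # Exact search: a full linear scan, no index and no approximation.
    ranked = sorted(srv_table, key=lambda row: cosine_distance(query_vec, row["vector"]))
    return ranked[:rk]

srv_table = [
    {"r": text, "vector": toy_embed(text)}
    for text in ("login failed", "payment refund", "file deleted")
]

see = "event: {e}"                    # event-embedding template
e = "user alice login failed twice"   # the formatted event string
err = exact_knn(toy_embed(see.format(e=e)), srv_table, rk=2)
print([row["r"] for row in err])
```

Exact KNN costs one distance computation per registered rule for every event. That is fine for small rule sets, and it is the reason an approximate index like HNSW becomes attractive as the table grows.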

Filtering: each candidate rule is run through the rule-filter template (srf) one at a time, producing a filtering payload (srfp). The LLM must return true or false — this is enforced by custom logic in the logits processor. Rules that return false are dropped.
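One way to picture the true/false enforcement is as a mask over the model's next-token scores. The sketch below is an assumption about the general technique, not Bronco's logits processor: logits are modeled as a plain dict of token to score, and `fake_scorer` stands in for a real forward pass.

```python
import math

ALLOWED = ("true", "false")

def constrain_to_boolean(logits: dict) -> dict:
    # Mask every token except "true" and "false" by pushing its score to -inf.
    return {tok: (score if tok in ALLOWED else -math.inf)
            for tok, score in logits.items()}

def decide(logits: dict) -> bool:
    masked = constrain_to_boolean(logits)
    best = max(masked, key=masked.get)
    if masked[best] == -math.inf:
        raise ValueError("vocabulary contains neither 'true' nor 'false'")
    return best == "true"

def filter_rules(candidates, srf: str, e: str, score_tokens) -> list:
    kept = []
    for rule in candidates:  # one candidate at a time
        srfp = srf.format(rule=rule, event=e)  # the filtering payload
        if decide(score_tokens(srfp)):
            kept.append(rule)
    return kept

def fake_scorer(prompt: str) -> dict:
    # Stand-in model: prefers "maybe", leans "true" only if words overlap.
    rule_line, event_line = prompt.split("\n")[:2]
    overlap = set(rule_line.split()[1:]) & set(event_line.split()[1:])
    return {"maybe": 9.0, "true": 1.0 if overlap else 0.0, "false": 0.5}

srf = "Rule: {rule}\nEvent: {event}\nDoes the rule apply?"
kept = filter_rules(["login failed", "payment refund"], srf,
                    "alice login failed", fake_scorer)
print(kept)
```

Even though the stand-in model "wants" to say "maybe", the mask guarantees the answer is one of the two allowed tokens, so the filter never has to parse free-form text.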

Activation: for each surviving rule, the activation template (a) is filled in with the event's e value and sent to the LLM. The completion is called the rule-skill (rs). That rs is passed as instructions to a brand new agent session, defined by the system's agent definition (sra). The agent gets tools for spawning subagents and reading summaries of other activated rules, plus whatever tools sats specifies. The session runs up to a system-configured max turns (which individual rules can lower). Agents use side effects to maintain durable state across sessions.
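The activation step and the turn cap can be sketched as follows. The dict-shaped rule, the `max_turns` key, the `"done"` sentinel, and the `fake_llm` and `fake_agent_step` functions are all assumptions for illustration. A real session would persist every turn and call real tools.

```python
def effective_max_turns(system_max: int, rule_max=None) -> int:
    # A rule may lower the system-wide cap but never raise it.
    return system_max if rule_max is None else min(system_max, rule_max)

def activate(rule: dict, e: str, *, llm, agent_step, system_max_turns: int):
    rs = llm(rule["a"].format(e=e))  # completion becomes the rule-skill
    turns = effective_max_turns(system_max_turns, rule.get("max_turns"))
    transcript = []
    for _ in range(turns):
        action = agent_step(rs, transcript)  # one turn of the new agent session
        transcript.append(action)
        if action == "done":
            break
    return rs, transcript

# Stand-ins so the sketch runs without a model (assumptions for the demo).
def fake_llm(prompt: str) -> str:
    return "SKILL: " + prompt

def fake_agent_step(rs: str, transcript: list) -> str:
    return "done" if len(transcript) >= 2 else f"step-{len(transcript)}"

rule = {"a": "Handle this event: {e}", "max_turns": 5}
rs, transcript = activate(rule, "alice login failed", llm=fake_llm,
                          agent_step=fake_agent_step, system_max_turns=10)
print(rs)
print(transcript)
```

With a cap of two turns the same agent would be cut off before it reaches "done", which is exactly the kind of restraint the max-turns setting exists to provide.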

Raw 3 AM notes

semantic rules engine: engine has s (system) value which is variable,
an srv (system-rule-vector) value which is variable,
a see (system-event-embedding) value which is variable,
and a rk (rule-k) value which is variable,
and srf (system-rule-filter),
and a sra (system-rule-agent)
and a sats (system agent tool settings) which is structured data.

each rule has r (rule) value and a (activation) value and exception value.

when rule is activated, the s value is used as a template and accepts
the rules r value. the completion is the rule's sr (system-rule) value.

the srv value is used as a template with the sr and the result is
embedded in a rule srv vector vtable.

when an event occurs, it is formatted as a string and called the
events e value. the see value is used as a template with e and the
result is embedded.

then exact knn search is done and the rk nearest rules are selected
and called err (event-rules).

srf is used as a template with each err one-at-a-time and the result
is called srfp (system-rule-filtering-payload) and it must return
true/false when completed and its used to filter in/out each rule
(custom inferencing here in the logits processor.)

then each selected rule, one at a time, has its a value used as a
template with the event's e value and the completion is called
rs (rule-skill) and it is passed as instructions to a new session
with an agent defined by sra.

the agent gets tools that allow subagent spawning and reading a
summary of the other activated rules. plus any sats selected tools.

the session runs a system-configured max turns (can be set lower
by rule setting.)

agents can use side effects to maintain durable state.

Why the Semantic Execution Engine

When you match an event (like a mouse click, a network packet, a club membership application, or a social media post) to a rule, you can do it with simple expressions. Like "does this have the word 'click' in it." But that only works a little. You can do it with full-text search (like we did at Elastic), and that actually works pretty well.

But with semantic search as the matching engine, we get a simple, unified matching implementation that can take a natural language policy, put it in the embedding template and/or query-time LLM rewrite template, and enforce it easily. And the models can also be swapped or locked. So it can meet any user need, given the correct model and policy, and the matching cost depends on how many rules there are (exact KNN is a linear scan over them), not on how complicated each policy is.

Lastly, I can use the same embedding models and vector store/search code for the "library" and the semantic execution engine. Recall the "library" has on-device graph, vector, full-text, metadata, hybrid search, and ingest/modeling support.

So What Is Bronco?

The interface to Bronco is actually a local website. The homepage is defined by a single Markdown document (passed to an agent), and the result is an interactive website. When you click a link, or submit a form, an agent generates new content on the fly for you. It uses every tool we have discussed so far. A whole orchestration of tasks takes place to render the page — querying your workbench state to see what you've been doing, showing you recent work in your Markdown files, library, or CSV files (future versions of Bronco will add more domains).

As you work on the site, the UI is generated live. It's tested live, and bugs are fixed live (before you even see them). New content is generated, based on tools. It's an offline internet that generates everything from a roughly 2 GB model. All the cloud services are replaced by tiny native private services that work even better. And it runs on your laptop.

It's Bronco.

Appendix: Market Research (June 2026)

Hacker News & Developer Discourse on Local AI

The dominant narrative on HN has shifted decisively toward local/on-device AI. Key tension: privacy and data sovereignty vs. cloud convenience.

Defining moments:

"I am completely convinced the future is local LLMs, private, on your device" — One of the highest-engagement local AI threads on HN. Core sentiment: "I really don't want my AI assistant in the cloud, controlled by a corporation, reading and seeing everything I do."
Vitalik's April 2026 post ("My self-sovereign / local / private / secure LLM setup") — Framed privacy, security, and self-sovereignty as non-negotiable. All inference local-first. Sandboxed everything.

What hooks developers (HN crowd):

Subscription fatigue — $200/month OpenAI plans called out repeatedly
Rate limits — cloud providers throttling frequent users
Data sensitivity — proprietary code, personal information demands on-device
Dystopian framing: "Making the effort to build the capacity to run intelligence locally is in part a hedge against the dystopian 'you will own nothing and be happy'."

What gets flagged / downvoted:

Posts framing local AI purely as cost optimization (HN cares about sovereignty, not just savings)
"Running AI on your laptop is like playing Starcraft Remastered on the Xbox" — dissenting voice but surprisingly upvoted
Tooling that requires "an unimaginable amount of knowledge" to set up

Apple Silicon MLX tensions:

Ollama switching to MLX (March 2026) was a watershed: ~57% faster prefill on M5 Max
LM Studio vs. Ollama: LM Studio for non-developers; Ollama for automation/scripting
MLX has ~10% lower memory overhead than GGUF via llama.cpp
Consensus: local models at 20-32B sweet spot now rival cloud for many tasks

Broader narrative: Discourse evolved from "Can local AI compete?" → "How do we make local AI tooling boring enough for everyone?" The anti-cloud sentiment is strongest among: (1) developers with sensitive code/data, (2) privacy advocates, (3) people burned by subscription pricing and rate limits.

Indie Maker / Solopreneur Sentiment on AI Pricing & Subscription Fatigue

AI subscription fatigue is real and worsening. Average power user manages 3–5 AI subscriptions ($70–$100/month): ChatGPT Plus ($20) + Claude Pro ($20) + Gemini AI Pro ($20) + Perplexity Pro ($17) = ~$77/month.

The "Tokenpocalypse" (June 2026) — the single biggest catalyst for local AI demand:

June 1, 2026: GitHub Copilot replaced flat-rate subscriptions with per-token billing
Users reported bills jumping 10x–25x — one user went from $38/month to projected $847/month
PCMag: "GitHub Copilot Costs Skyrocket As Users Are Pushed to Per-Token Billing"
The Register: "Angry devs vow to flee GitHub Copilot as metered billing takes hold"
Reddit r/github (315 upvotes): "Copilot's new credit pricing is a 10-22x cost increase disguised as 'flexibility.'"
Community term "Tokenpocalypse" emerged organically

Price points:

Sweet spot: $20/month for a single professional tool
One-time purchase: $39–$75 (TypingMind $39, 1min.AI lifetime $75) — validated but niche
Key finding: Lifetime deals work best when positioned as "escape subscription hell"

Local AI demand — structural shift underway:

Model API spend doubled to $8.4 billion in 2025
Cloud AI share of compute projected to fall from 85% (2024) → 55% (end of 2026)
Local/edge inference growing from 10% → 35% over same period
Stack Overflow 2025: 87% of developers concerned about AI accuracy, 81% worry about privacy/security

$150 price positioning: Equals 7.5 months of the $20/month sweet spot, or about two months of the ~$77/month four-subscription stack above. Framed as "less than a year of a single AI subscription, paid once," which is a rational trade.
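The break-even arithmetic behind that positioning, using only the prices quoted in this appendix, works out as follows. The variable names are just for the example.

```python
one_time_price = 150         # proposed one-time price, USD
single_tool = 20             # the $20/month "sweet spot"
stack = 20 + 20 + 20 + 17    # ChatGPT Plus + Claude Pro + Gemini AI Pro + Perplexity Pro

months_vs_single = one_time_price / single_tool  # months of one $20 subscription
months_vs_stack = one_time_price / stack         # months of the four-subscription stack

print(stack)                      # 77
print(months_vs_single)           # 7.5
print(round(months_vs_stack, 2))  # 1.95
```

So the one-time price pays for itself in 7.5 months against a single subscription, and in about two months against the full stack.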

Parent Discourse on Kids and AI Safety

Top parent concerns about kids and AI:

Unmoderated content — suicide coaching cases (Character.AI, 2024), chatbots grooming minors
Data privacy — "Is my kid's chat history being used to train the model?"
Cognitive offloading — "Will my kid stop learning how to think?"
Hallucinations — untrustworthy factual claims presented as true
Lack of parental controls — no time limits, no content boundaries, no visibility into usage

What parents want:

Offline/sandboxed AI that doesn't send data to the cloud
Educational uses (coding, research, creative writing) — not a cheating tool
Time limits and content guardrails
Parental visibility dashboard

Bronco positioning vs parent concerns:

Zero internet = no data leaves the device — privacy guaranteed by physics
No telemetry, no accounts — nothing to sell
Bronco's model knows it's stale and doesn't pretend — direct use of structured reference data
Alarm/sleep tool enables time limits (planned)
Sandboxed tools via containers (gVisor)

Privacy Community Discourse on Local AI

Core motivations (r/privacy, r/selfhosted, Privacy Guides):

Trustlessness: "The only way to achieve actual trustlessness is to run the model locally, air-gapped if possible, where you control the weights, execution environment, and network exposure." (426 upvotes on r/privacy)
"Promise" vs. "Physics": Cloud AI promising privacy vs. local AI guaranteeing it. "This shifts privacy from policy to physics."
Data sovereignty: Users handling medical, legal, financial data value certainty that "there is no server to send data to."

Apple Silicon as a privacy platform:

Unified memory architecture praised for efficient local inference
Apple's privacy brand generally trusted more than Google/Microsoft/OpenAI
But skepticism exists: "Just because processing happens locally doesn't mean valuable metadata isn't sent to the mothership."

Air-gapped AI demand:

Essential for: classified government data, healthcare PHI, financial algorithms, legal privilege, proprietary R&D
Users need version-locked, predictable, auditable models

Major deal-breakers for privacy users:

Telemetry / phone-home behavior (even in "local" tools)
Requiring accounts or logins
Open network endpoints
Cloud dependencies disguised as "local"
Subscription models — "I'm sick of 'renting' my life" is dominant sentiment
Vulnerability-ridden containers

One-time purchase demand — validated:

Private LLM (Mac App Store, single purchase) got 107 upvotes + 145 comments on r/macapps
Indie Hackers: "A single purchase aligns with ownership mindset more than a recurring commitment."
The $150 price point aligns with what privacy-focused users have shown willingness to pay for privacy + polish

Appendix: Target Personas

Persona 1: The Engineer-Gone-Dad

Name: Marcus Chen
Age: 39
Family: Married, two kids (7 and 10)
Occupation: Staff Engineer / ex-Tech Lead at a mid-stage startup. Built distributed systems for 15 years. Just left his job.
Income: $220K (now: savings + consulting)
Device: M4 Max MacBook Pro, 64 GB
Where he gets information: Hacker News (daily), r/LocalLLaMA, Simon Willison's blog, the llama.cpp GitHub discussions, Michael Tsai's blog, LWN.net
What he cares about: He's burnt out on corporate SaaS, rate limits, and per-token billing. He's built execution engines before (C2, rule engines, event-driven systems) and wants a tool that reflects his philosophy: durable, auditable, air-gappable, no magic. He has kids now and doesn't want them growing up thinking ChatGPT is the only way to interact with AI.
What he's skeptical of: "Local AI isn't ready for real work." He runs Qwen locally and knows the gap between local models and Claude/GPT-4. He'll dismiss you fast if you overclaim benchmarks. He's also skeptical of anything that "phones home" or requires an account.
Hook in the manifesto: The execution engine architecture (4NF, append-only, state-from-row-existence). The "Zippo lighter" reliability argument. The C2/UEBA background establishing credibility.
Turn-off in the manifesto: The "dope cool dude" line. Any whiff of marketing-speak or hand-waving about performance.

Persona 2: The Indie Maker on Fire

Name: Priya Narayan
Age: 31
Family: Single, lives alone in a studio apartment
Occupation: Solo founder. Launched 3 SaaS products, 1 profitable ($5K MRR). Writes code 8 hours a day.
Income: $60K (variable)
Device: M3 MacBook Air, 24 GB — her only computer. Bought refurbished.
Where she gets information: Indie Hackers (daily), r/SaaS, r/github (especially since the Copilot Tokenpocalypse), X/Twitter (indie dev circles), Hacker News (weekly), some Substack newsletters (Lenny's, The Pragmatic Engineer)
What she cares about: Every dollar matters. She just watched her Copilot bill jump from $38/month to a projected $847/month under the new token pricing (June 2026). She's actively looking for alternatives. She already canceled 2 of her 4 AI subscriptions this month. She'd love to stop renting her tools.
What she's skeptical of: "One-time purchase means abandonware." She's bought lifetime deals that went nowhere. She needs proof that the product ships and works NOW, not a promise. Also skeptical of any tool that can't integrate with her workflow (git, CLI, her existing stack).
Hook in the manifesto: The $150 one-time price vs. subscription fatigue. The "it just runs, even across Mac upgrades" durability pitch. The CSV/spreadsheet and SQLite workbench tools (she manages her SaaS metrics in CSV).
Turn-off in the manifesto: The deep technical architecture sections (4NF, logits processors, KNN). She'll skim past them. She needs the "what it does for me" to come earlier and be stated more clearly.

Persona 3: The Privacy Absolutist

Name: Lena Voss
Age: 44
Family: Married, no kids, two dogs
Occupation: Security researcher / consultant. Does red-team work as a contractor.
Income: $180K
Device: M2 Ultra Mac Studio, 128 GB. Also runs a homelab (Proxmox, OPNsense, a NAS she built herself). No smart speakers, no Alexa, no Google Home. Uses Signal exclusively. Runs GrapheneOS on her phone.
Where she gets information: r/privacy, r/selfhosted, Privacy Guides forum, Schneier on Security, Krebs on Security, DEF CON talks, CCC talks, Michael Tsai's blog, infosec.exchange on Mastodon
What she cares about: "Privacy is physics, not policy." She'll never run an AI tool that phones home. She air-gaps machines for sensitive work. She wants complete visibility into what the model is doing — every token, every tool call, every side effect. The only acceptable data pipeline is "the data stays on my machine and I can prove it."
What she's skeptical of: "Local" tools that still phone home (telemetry, update checks, "anonymous" crash reports). Ollama's telemetry controversy made her switch backends twice. She assumes every new tool is lying about being offline until she wiresharks it.
Hook in the manifesto: The promise that "every single byte sent to the LLM and generated by the LLM is stored in a local private database." The air-gap section. Zero internet, zero telemetry, zero tracking. Process isolation for RBAC.
Turn-off in the manifesto: The Cursor/vibe coding story — she reads it as "surveillance capitalism by another name." The "dope cool dude" tone. Anything that feels like a startup pitch rather than a philosophy.

Persona 4: The Apple-Family Parent

Name: Sarah Okonkwo
Age: 37
Family: Married, three kids (5, 9, 12). Entire household is Apple — Macs, iPads, iPhones, Apple TVs.
Occupation: Product designer at a fintech company (remote).
Income: $165K (household: $280K)
Device: M4 iPad Pro + M3 MacBook Air, 16 GB. Considering a family iMac.
Where she gets information: Wirecutter, Common Sense Media, r/parenting, r/daddit (lurks), some Apple-focused YouTube (MKBHD, iJustine), The Verge, school parent groups on WhatsApp, the local library's tech literacy program
What she cares about: Her 12-year-old is asking for ChatGPT. She's terrified of him talking to an unmoderated chatbot that might hallucinate dangerous advice, surface adult content, or sell his data. She'd love a tool he can use to learn coding, do research projects, or explore creative writing — safely, without the internet. She also wants something she can use herself without worrying about her work NDA.
What she's skeptical of: "AI for kids" products that are just wrappers around ChatGPT with a cartoon character on top. She knows most "safe AI" tools are theater. She's also price-sensitive, but a one-time $150 family purchase feels reasonable compared to 4 × $20/month ChatGPT subscriptions ($80/month, so the purchase pays for itself in under two months).
Hook in the manifesto: Safety-for-children-by-design. Air-gap capability ("just don't connect it to the internet"). Apple ecosystem credibility. The Notebook/Library features as an educational tool. The printed, bound manual.
Turn-off in the manifesto: The C2/execution-engine backstory (irrelevant to her). The 3AM raw notes. Too much technical jargon without a "what this means for your family" translation.

Persona 5: The AI Skeptic / Alignment Nerd

Name: Jonathan Adjei
Age: 29
Family: Single, grad-school roommates
Occupation: PhD candidate in CS (ML safety / interpretability). Also writes a Substack about AI governance with 3K subscribers.
Income: $42K (stipend)
Device: M1 MacBook Pro, 16 GB (university-issued). Can't afford an upgrade.
Where he gets information: LessWrong, AI Alignment Forum, arXiv (cs.CL, cs.AI), Zvi Mowshowitz's blog, Dwarkesh Patel's podcast, r/MachineLearning, follows a tight circle of AI researchers on X/Twitter and Bluesky
What he cares about: The concentration of AI capability in 3–5 corporations terrifies him. He believes local, open, auditable AI is a structural check on that concentration. He's less interested in "can local AI code" and more in "can we prove what this model knows and doesn't know." He wants to see the prompts, the templates, the safety guardrails — all of them.
What he's skeptical of: Local-first products that are just wrapping OpenAI APIs under the hood. "Safety" claims that amount to "we put a system prompt saying be nice." Any product that doesn't let him inspect or modify the model's behavior. Also skeptical of small models — he worries about capability overhang and hidden failure modes.
Hook in the manifesto: "Every single character of text that's used in a tool description, or system instruction, or template, or embedding prompt, or safety guardrail, is visible and controllable by the user." The "knows it's stale and doesn't pretend" design philosophy. The semantic execution engine as an auditable, transparent rules system.
Turn-off in the manifesto: "Haha" and "lol" tone in technical sections. The "dope cool dude" line. Any hint that the author hasn't thought seriously about AI safety.

Persona-Content Fit Matrix

Section of Manifesto	Marcus (Eng-Dad)	Priya (Maker)	Lena (Privacy)	Sarah (Parent)	Jonathan (Skeptic)
Execution engine backstory	★★★	★	★★	✗	★★
Why Apple Silicon	★★★	★★	★★★	★★	★★
Why Not Cursor	★★	★★★	★	★	★
Why Not Cloud Agents	★★	★★	★★★	★★	★★★
Semantic execution engine design	★★★	★	★★★	✗	★★★
Workbench & tools (expanded)	★★★	★★★	★★	★★★	★
Safety for children (new)	★★★	★	★★	★★★	★★
Business model (new)	★★	★★★	★★★	★★★	★★

The insight: no single section earns top marks from all five personas. But the manifesto only needs each persona to find their section and skim the rest. The current draft over-indexes on Marcus (the persona most like the author). The expansions I proposed would catch Priya (workbench/tools/business model), Lena (architecture/safety), Sarah (safety/what Bronco does), and Jonathan (transparency/model philosophy).