Not All Data Is Created Equal

Or: How I Kept My AI from Eating My Lunch

Mar 16, 2026

The DoD/DoW (Department of Defense, now operating as Department of War) has been classifying data longer than most of us have been alive. Unclassified. Confidential. Secret. Top Secret. The principle is simple: access level matches sensitivity, and you don’t get to touch what you don’t have clearance for. That clearance isn’t a reward. It’s a risk management decision.

AI builders are about to learn this lesson the hard way.

Most people set up an AI workflow the same way. They hand the AI everything (notes, documents, reference material) and then wonder why it occasionally writes back into something it wasn’t supposed to touch. The blast radius on those decisions is invisible until it isn’t.

There’s a better way to think about this, and it comes from a CISSP (Certified Information Systems Security Professional) exam question most people treat as a throwaway: what’s the best way to secure customer data? The answer isn’t encryption. It isn’t zero-trust. The answer is: don’t collect it in the first place. Risk you don’t introduce doesn’t need to be mitigated, monitored, or reported on.

The same logic applies to AI memory. The data the AI can’t reach can’t be corrupted, and files it can’t write to stay safe by definition. What never touches an agent surface needs no recovery plan. It just sits there, intact, being correct.

The same principle shows up in basic SDLC (Software Development Life Cycle) discipline. You never connect development resources to live production data. Copies for testing, maybe. Live data? Never. The AI is a development resource. Treating your canon database (master reference data) like production data it can freely write to isn’t a workflow choice. It’s a category error.

Here’s what skipping that question looks like in practice.

In 2025, a developer gave Claude Code full access to AWS (Amazon Web Services) infrastructure to manage a Terraform migration. The agent ran terraform destroy and wiped 2.5 years of production data in minutes, automated snapshots included. The command ran exactly as issued. Whether the agent hallucinated or followed orders perfectly, the data was gone either way. Nobody had asked whether it needed write access to production at all. In my opinion, it didn’t.

Before the classification model, there’s a more basic layer: the databases aren’t internet-exposed. No public endpoint. No external API. Nothing the outside world can reach. That alone takes the network-facing attack surface to zero. SQLite, running locally. Not a managed Postgres instance on someone else’s infrastructure. When the database lives on my machine, I’m not delegating my security posture to a vendor’s breach response timeline and a “we take your data seriously” email. Everything after this is about controlling what happens inside that boundary.

So here’s how I structure it, using the classification model that’s been working since the Cold War:

Unclassified is the internet. Substack articles, LinkedIn posts, Facebook, public Notes. The stuff anyone can read. This isn’t input to the AI workflow. It’s the output of it. The published artifact is what the classified work produces. The classification model runs in reverse from what most people assume: you don’t start with public data and lock it down, you start with protected data and deliberately choose what gets released.

Controlled Unclassified is the AI workspace. Full CRUD (Create, Read, Update, Delete). The AI owns this space completely. Session notes, constraints, preference logs, draft material. I manage the size. If it gets weird, I prune it or delete it entirely and start over. Low blast radius, easy recovery.

Classified is the canon database (master reference data). Read-only to the agent. This is ground truth. The AI consults it constantly but cannot write to it. Changes only happen when I make them deliberately. There is no automated path from Controlled Unclassified to Classified. I am that path.

Above Classified is no access. Credentials, sensitive data, anything where even read access is a risk surface. The AI doesn’t know it exists.

Inside that boundary, the AI is the only query path to Classified data, and I’m the only write path. That’s defense in depth: independent controls covering independent failure modes. Boring to attack.
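One way to write the tiers down so they can be checked is a small lookup table. This is only a sketch: the names Tier, AGENT_OPS, and agent_may are made up for illustration, and it covers just the three tiers that are data stores, since the Unclassified tier is published output rather than something the agent works in.

```python
from enum import Enum


class Tier(Enum):
    """The three data-store tiers described above (names are illustrative)."""
    CONTROLLED_UNCLASSIFIED = "ai workspace"
    CLASSIFIED = "canon database"
    ABOVE_CLASSIFIED = "no access"


# Operations the agent is allowed per tier. Anything not listed is denied.
AGENT_OPS = {
    Tier.CONTROLLED_UNCLASSIFIED: frozenset({"create", "read", "update", "delete"}),
    Tier.CLASSIFIED: frozenset({"read"}),
    Tier.ABOVE_CLASSIFIED: frozenset(),
}


def agent_may(tier: Tier, op: str) -> bool:
    """Default-deny check: True only if op is explicitly granted for tier."""
    return op in AGENT_OPS[tier]


if __name__ == "__main__":
    for tier in Tier:
        granted = sorted(AGENT_OPS[tier]) or ["(nothing)"]
        print(f"{tier.name}: {', '.join(granted)}")
```

A table like this is a statement of intent, not a control. Its value is that every tier gets an explicit answer before anything is wired up, and an operation nobody thought about defaults to "no".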

It’s not just architecture on paper. I use an MCP (Model Context Protocol) server to expose these databases to the AI, and the Python is written to match the classification. Canon gets a read-only connection, the workspace gets read-write. If a database isn’t wired into the server, the AI has no path to it. The AI cannot write to canon, not because of policy or goodwill. Because the connection object won’t allow it.
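Here is roughly what "the connection object won’t allow it" can look like with Python’s standard sqlite3 module. This is a minimal sketch, not my actual server code: the function names, file names, and the facts table are placeholders, and the MCP wiring is left out entirely. The one real mechanism on display is SQLite’s URI syntax, where mode=ro opens the file read-only.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_canon(path: Path) -> sqlite3.Connection:
    """Read-only handle: SQLite itself rejects any write on this connection."""
    uri = f"{path.resolve().as_uri()}?mode=ro"
    return sqlite3.connect(uri, uri=True)


def open_workspace(path: Path) -> sqlite3.Connection:
    """Ordinary read-write handle for the agent's scratch space."""
    return sqlite3.connect(path)


def try_write(conn: sqlite3.Connection) -> bool:
    """Return True if an INSERT succeeds, False if SQLite refuses it."""
    try:
        conn.execute("INSERT INTO facts (body) VALUES ('agent edit')")
        conn.commit()
        return True
    except sqlite3.OperationalError:
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        canon_path = Path(tmp) / "canon.db"
        work_path = Path(tmp) / "workspace.db"
        # The human owner seeds both files through normal read-write handles.
        for p in (canon_path, work_path):
            seed = sqlite3.connect(p)
            seed.execute("CREATE TABLE facts (body TEXT)")
            seed.commit()
            seed.close()

        canon = open_canon(canon_path)
        work = open_workspace(work_path)
        print("canon write allowed:", try_write(canon))
        print("workspace write allowed:", try_write(work))
        canon.close()
        work.close()
```

The point of doing it at the connection level is that the refusal comes from the database engine, not from a prompt or a code path the agent could talk its way around. If the server only ever hands out the read-only handle for canon, there is no write to forget to block.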

The master copies of the Classified data are only touched by specific, purpose-built scripts that export to JSON for database integration. Ad-hoc access isn’t a thing. The script is the gate.
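A gate script in that spirit might look something like the following. Everything specific here is an assumption for illustration: I’m pretending the master copy is itself a SQLite file, and the EXPORT_TABLES allowlist, the facts table, and export_master are hypothetical names. The shape is what matters: a fixed list of what gets exported, a read-only open of the master, and one JSON file out.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical allowlist. The script only knows these tables, so there is
# no ad-hoc query path through it.
EXPORT_TABLES = ("facts",)


def export_master(master: Path, out_file: Path) -> int:
    """Dump the allowlisted tables to JSON and return the total row count."""
    # Open the master read-only so the export itself cannot modify it.
    conn = sqlite3.connect(f"{master.resolve().as_uri()}?mode=ro", uri=True)
    conn.row_factory = sqlite3.Row
    try:
        # Table names come from the constant above, never from user input.
        payload = {
            table: [dict(row) for row in conn.execute(f"SELECT * FROM {table}")]
            for table in EXPORT_TABLES
        }
    finally:
        conn.close()
    out_file.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return sum(len(rows) for rows in payload.values())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        master = Path(tmp) / "master.db"
        seed = sqlite3.connect(master)
        seed.execute("CREATE TABLE facts (id INTEGER PRIMARY KEY, body TEXT)")
        seed.execute("INSERT INTO facts (body) VALUES ('ground truth')")
        seed.commit()
        seed.close()

        out = Path(tmp) / "canon.json"
        print("rows exported:", export_master(master, out))
```

Keeping the script this narrow is the design choice. It takes no query string and no table argument, so the only way to change what leaves the master copy is to edit the script, which is a deliberate act.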

Paranoid? Maybe. But it’s my work and my IP (Intellectual Property). If I’m not responsible with it, I can’t expect anyone else to be.

The DoD built this model because they are absolutely paranoid. That paranoia is earned. When you are the single biggest attack target on the planet, you have to be. They built it because when something bad happens, the damage can be “exceptionally grave” and unrecoverable. Classified information that leaks doesn’t un-leak.

Most people building AI workflows right now are operating without tiers. Everything in one bucket, AI touching all of it without any explicit decision about what it should and shouldn’t reach. That works fine, right up until the AI decides to helpfully “update” your master reference document based on something it inferred from three sessions ago. Now your ground truth is whatever the model believed on a Tuesday. Good luck with that.

Decide before you build. The security posture follows from the classification. If you don’t know what tier your data lives in before the AI touches it, well... as Neil Peart wrote in Freewill: “If you choose not to decide, you still have made a choice.” I will choose data security.

E L Frederick
