Metadata is the Message

Jun 5

For thirty years the security field has poured much of its effort into one place: the contents of the message. We encrypted the web, the messengers, the disks. End-to-end became the default, and somewhere along the way, we started to believe that if the words were unreadable, the conversation was private.

It was never only the words. Everything wrapped around them — who spoke to whom, when, how often, for how long, from where, in messages of what size — reveals a portion of the meaning. You can encrypt every syllable you send your doctor and still hand over the diagnosis, because a call to an oncology practice at 4pm, three calls to family that evening, and a search session that runs past midnight tell the whole story without a word being read. We encrypted the letter and kept mailing it with the address on the envelope.

Metadata sounds harmless: data about data, bookkeeping, exhaust. But intent lives in the pattern of our behavior, and the pattern is exactly what metadata records. Timing, frequency, size, duration, and the bare fact that two parties are talking at all are part of the signal, and the message can be the least interesting thing in the exchange.

By the late 1970s, and probably well before, law enforcement was analyzing phone records. They weren't listening to the calls, which takes a warrant; they were just taking the call records and seeing which numbers called each other and when. And just like that, entire criminal organizations were laid bare. It wasn't long before the same data was used to out informants or trace spies. Since then, metadata analysis has been a tool for the good guys and the bad.
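To make the call-record point concrete, here is a minimal sketch in Python. The records, the single-letter numbers, and the function names are all invented for illustration; it shows only the general idea of ranking numbers by how many distinct parties they touch, not any agency's actual method.

```python
from collections import defaultdict

# Hypothetical call records: (caller, callee, hour_of_day).
# There is no audio and no content here, only who called whom and when.
CALLS = [
    ("A", "B", 9), ("A", "C", 9), ("A", "D", 10),
    ("B", "A", 11), ("C", "A", 11), ("D", "A", 12),
    ("B", "E", 13), ("E", "B", 14),
]

def contact_graph(calls):
    """Map each number to the set of distinct numbers it exchanged calls with."""
    graph = defaultdict(set)
    for caller, callee, _hour in calls:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph

def rank_by_contacts(calls):
    """Order numbers by distinct contacts, most connected first (ties by name)."""
    graph = contact_graph(calls)
    return sorted(graph, key=lambda number: (-len(graph[number]), number))

if __name__ == "__main__":
    for number in rank_by_contacts(CALLS):
        print(number, len(contact_graph(CALLS)[number]))
```

In this toy data the number "A" comes out on top because everyone else routes through it. The ranking never looks at what was said.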

What changed is the scale. Watching one person used to cost something: a tap, a warrant, a team. Now the watching is cheap, and most of us pay for it ourselves. An industry exists to assemble these patterns and sell them: ad networks, data brokers, the carriers that move our traffic, the platforms most of the world routes its life through. Storage got cheap and correlation got cheap, and the default flipped from "collect what you need" to "collect everything and decide later."

Start with the use case that funds the whole apparatus: advertising. Some of it really is built on what you say. Your searches, your posts, and the things you type into a box are harvested and sold, and they feed the profile directly. But that is only half the story. The rest is built on what you do: the apps you open, the places you pause, who you contact and how often. Content tells an advertiser what you said you wanted; the patterns tell them what you are about to do, and they surface what you never typed: that you are job hunting, expecting a child, managing an illness, falling behind on a bill. Both are for sale, and together they describe you more honestly than you could describe yourself.

The same profile that sells you a mattress can be used to take your identity. Social engineering runs on it. An attacker who knows your communication graph, who you trust, when you call your bank, and the routine you keep doesn't need to guess; the pretext writes itself. The phishing email that knows too much, the call that sounds legitimate, the impersonation that lands because it arrives exactly when you expected one: all of it is assembled from patterns bought in the open. The advertiser and the con artist shop in the same data broker market.

Underneath both is the quieter cost: the steady loss of privacy, understood as the ability to control what's known about you. Privacy was never secrecy. It is the right to decide what gets inferred from your life, and by whom. Ubiquitous metadata collection takes that decision away. You are profiled, scored, and priced by insurers, lenders, and job-screening algorithms, on the basis of patterns you didn't know you were broadcasting and can't see or correct. Association becomes evidence. Behavior becomes intent.

The same logic scales from people to companies, where the pattern reveals the strategy. A burst of encrypted traffic between two firms, their counsel, and an investment bank announces a deal before any press release. The roster of who a company talks to, and how the volume shifts, maps its suppliers and partners into the dependency graph that a competitor would pay for. That same map names the targets: which employees hold the sensitive relationships, and when a team goes heads-down on something new. Industrial espionage no longer opens with a wide net; it opens with the few nodes the graph says matter. A company can encrypt every document it owns and still broadcast its next move in the shape of its traffic.

The message does come back into the picture, because the contents are still the prize, the thing an adversary ultimately wants to read. Harvest-now, decrypt-later is the patient route to them: store the ciphertext today, read it once the cryptography ages out or a capable enough machine arrives. But nobody can keep every encrypted stream. Storage is finite even for the largest collectors, so the ciphertext gets triaged, and the triage runs on metadata: the patterns decide which sealed streams are worth the shelf space. The same logic governs the fight over backdoors. A backdoor can't be put into every wall, and the target wall is chosen from metadata. We argue over the lock on the message while the thing that decides which message to unlock flows freely, unencrypted, and largely unregulated.

None of this is news to the people who build private systems, and it would be unfair to say the field has ignored it. There are real efforts aimed at the traffic itself: protocols and browser extensions that pad messages to a uniform size, send at a steady cadence whether or not there is anything to say, and manufacture decoy traffic to bury the real exchange in noise. Tor and the protocols that followed it route each message through paths no single observer sees end to end, so no relay knows both the sender and the receiver. These raise the cost of harvesting considerably.
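A rough sketch of the padding and steady-cadence idea, in Python. The cell size, the two-byte length header, and the function names are assumptions made for this example, not the format of Tor or any real protocol. Encryption is also left out: in a real system each cell would be encrypted so that a decoy and a real message look identical on the wire.

```python
import struct
from collections import deque

CELL_SIZE = 256               # assumed fixed size of every cell, in bytes
HEADER = struct.Struct(">H")  # assumed 2-byte big-endian payload length

def pad(payload: bytes) -> bytes:
    """Wrap a payload in a fixed-size cell: length header, payload, zero fill."""
    room = CELL_SIZE - HEADER.size
    if len(payload) > room:
        raise ValueError("payload too large for one cell")
    return HEADER.pack(len(payload)) + payload + b"\x00" * (room - len(payload))

def unpad(cell: bytes) -> bytes:
    """Recover the payload; an empty result marks a decoy cell."""
    (length,) = HEADER.unpack_from(cell)
    return cell[HEADER.size:HEADER.size + length]

def cells_for_ticks(queue: deque, ticks: int) -> list:
    """Emit exactly one cell per tick: a real message if one is waiting, else a decoy."""
    cells = []
    for _ in range(ticks):
        payload = queue.popleft() if queue else b""
        cells.append(pad(payload))
    return cells

if __name__ == "__main__":
    outgoing = deque([b"hi", b"a longer message"])
    for cell in cells_for_ticks(outgoing, ticks=5):
        print(len(cell), unpad(cell))
```

Every tick produces a cell of the same size, so an observer counting bytes and packets sees five identical sends whether two messages were queued or none. The price is bandwidth spent on silence, which is part of why these defenses raise costs on both sides.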

But raising costs does not solve the problem, and the gap is where the patient collector lives. An adversary watching enough of the network at once doesn't have to break any hop; it can match the traffic entering one end against the traffic leaving the other and recover the path from timing alone. And the defenses that do work demand a discipline ordinary people can't sustain: one logged-in account, one careless app, one convenient shortcut, and the cover is gone. Protection that depends on never making a mistake is not protection most people can use.
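The timing match described above can be illustrated with a toy example in Python. The timestamps, the flow names, the one-second window, and the use of a plain Pearson correlation over binned packet counts are all assumptions chosen for clarity; real correlation attacks are more elaborate, and this is only a sketch of the principle.

```python
from math import sqrt

def bin_counts(timestamps, window, n_bins):
    """Count packets per time window; this is the flow's timing 'shape'."""
    counts = [0] * n_bins
    for t in timestamps:
        i = int(t // window)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def best_match(entry_flow, exit_flows, window=1.0, n_bins=10):
    """Return the exit flow whose timing shape best matches the entry flow."""
    target = bin_counts(entry_flow, window, n_bins)
    scores = {
        name: pearson(target, bin_counts(ts, window, n_bins))
        for name, ts in exit_flows.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical observations, in seconds. No payload is ever inspected.
ENTRY = [0.1, 0.2, 0.3, 3.1, 3.2, 7.5, 7.6, 7.7, 7.8]
EXITS = {
    "x": [t + 0.15 for t in ENTRY],                      # same bursts, small delay
    "y": [1.1, 1.5, 2.2, 4.4, 5.0, 5.5, 6.1, 8.3, 9.0],  # unrelated traffic
}

if __name__ == "__main__":
    print(best_match(ENTRY, EXITS))
```

Here the observer picks flow "x" without decrypting anything, because its bursts line up with the entry flow. A sender that emits at a constant rate gives every flow the same flat shape, so the correlation has nothing to hold on to.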

The problem has several layers, and the defenses range from simple awareness to measures that require sophistication and absolute discipline. Even the strongest of those defenses still bend under an observer large and patient enough to wait. The problem I find worth chasing is a harder one: making the timing, the size, the frequency, and the very existence of a communication indistinguishable from the noise around it, by default, without anyone having to keep perfect tradecraft to stay safe. A conversation that can't be seen can't be used against you.

I keep coming back to one conviction: technology earns its place only when it can be trusted, and trust we can't verify is just hope with better marketing. We spent a generation hiding what people say. The new challenge is to hide that they said anything at all — and make it work for the people who will never think about it.

Tim Sewell