Thread - Nostr Hypermedia

Nostr relays see everything - who connects, what they fetch, how often they post. Zero-knowledge proofs and related privacy cryptography can address all three problems: Semaphore-based authentication hides which whitelisted user is connecting, private information retrieval hides which notes you're fetching, and Privacy Pass enables rate limiting without identity linkage.

Every Nostr relay operates as a surveillance point. Not by design, not maliciously - it's architectural. The relay sees your IP address when you connect. It sees which pubkeys you subscribe to. It sees every note you request. It knows your posting frequency, your online hours, your social graph.

NIP-42 authentication makes this worse. Paid relays and private communities need access control, so you sign a challenge proving your identity. Now the relay has cryptographic proof linking your IP to your npub.

This is the metadata problem. Your content is signed and optionally encrypted, but your behavior is naked. A relay operator - or anyone compelling them - can reconstruct your communication patterns without reading a single message.

Zero-knowledge proofs offer a path out. Not by hiding that you're using the relay, but by hiding which authorized user you are.

The Authentication Problem

A relay maintains a whitelist: these 500 npubs are allowed to connect. Current NIP-42 requires you to prove you're npub X. The relay learns exactly who you are.

What you actually need to prove is weaker: "I'm someone on your whitelist." Not which someone. ZK makes this possible.

Semaphore (developed by PSE/Ethereum Foundation, and the basis of Worldcoin's World ID) implements exactly this pattern:

The relay publishes the Merkle root of a tree whose leaves are the members' Semaphore identity commitments
You hold a Merkle proof showing your commitment is a leaf in the tree
You generate a ZK proof: "I know the identity secret behind one of the commitments in this Merkle tree"
The relay verifies the proof, grants access, learns nothing about which member you are

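The proof reveals zero information about your position in the tree or your actual pubkey. From the relay's perspective, you could be any of the 500 authorized users. One practical wrinkle: Semaphore identities are not Nostr keys. Proving knowledge of a secp256k1 secret key inside a SNARK is far more expensive than Semaphore's native hashing. So each whitelisted npub would register a Semaphore identity commitment once (for example, by signing it), and the tree would be built over those commitments.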
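The membership statement can be illustrated without the zero-knowledge layer. The sketch below builds a Merkle tree over identity commitments and checks an inclusion path. In Semaphore this check runs inside the circuit with Poseidon hashing, so the relay never sees the leaf or the path, only a proof that one exists. SHA-256, the odd-level padding rule, and all function names are assumptions for illustration.

```python
import hashlib

def node_hash(left: bytes, right: bytes) -> bytes:
    """Combine two child nodes into their parent (SHA-256 stands in for Poseidon)."""
    return hashlib.sha256(left + right).digest()

def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Return every level of the tree, leaves first and root last.

    Odd-sized levels are padded by duplicating their last node.
    """
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        level = levels[-1]
        if len(level) % 2:
            level = level + [level[-1]]
        levels.append([node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)])
    return levels

def inclusion_path(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Collect (sibling, sibling_is_right) pairs from the leaf up to the root."""
    path = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        path.append((level[sibling], sibling > index))
        index //= 2
    return path

def verify_path(leaf: bytes, path: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from a leaf and its path; a circuit enforces this privately."""
    node = leaf
    for sibling, sibling_is_right in path:
        node = node_hash(node, sibling) if sibling_is_right else node_hash(sibling, node)
    return node == root

# Usage: the "commitments" are SHA-256 hashes of made-up identity secrets.
commitments = [hashlib.sha256(bytes([i]) * 32).digest() for i in range(5)]
levels = build_levels(commitments)
root = levels[-1][0]
assert verify_path(commitments[3], inclusion_path(levels, 3), root)
```

Published openly, the path would reveal which leaf is yours. That is why Semaphore proves knowledge of a valid path instead of sending one.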

Performance is practical today: Semaphore proofs generate in ~3 seconds on mobile hardware, verify in ~10 milliseconds. The relay's Merkle tree can hold millions of members without degrading verification time. Client-side libraries exist for browser and native mobile.

Rate limiting within anonymity requires one addition: nullifiers. When generating a proof, you also output a deterministic value derived from your secret key and the current epoch (hour/day/week). If you authenticate twice in the same epoch, you produce the same nullifier. The relay can detect and reject duplicates without learning your identity.

A malicious user can't spam because their nullifier repeats. An honest user can't be tracked because nullifiers change each epoch and reveal nothing about the underlying identity.
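A minimal sketch of per-epoch nullifiers, assuming one-hour epochs and a domain-separated SHA-256 in place of Semaphore's Poseidon-based nullifier. In the real protocol the nullifier is a public output of the proof. The relay can therefore trust that it was derived from a member's secret without ever seeing that secret. The domain label and function names are hypothetical.

```python
import hashlib

EPOCH_SECONDS = 3600  # assumption: one-hour epochs

def epoch_of(unix_time: int, epoch_seconds: int = EPOCH_SECONDS) -> int:
    """Number of the epoch containing the given Unix timestamp."""
    return unix_time // epoch_seconds

def nullifier(identity_secret: bytes, epoch: int) -> str:
    """Deterministic per-epoch tag: same secret and epoch give the same value;
    without the secret, tags from different epochs cannot be linked."""
    domain = b"nostr-relay-auth-nullifier"  # hypothetical domain-separation label
    return hashlib.sha256(domain + epoch.to_bytes(8, "big") + identity_secret).hexdigest()

alice = bytes(32)  # made-up identity secret
t = 1_700_000_000
same_epoch = nullifier(alice, epoch_of(t)) == nullifier(alice, epoch_of(t + 60))             # True
next_epoch = nullifier(alice, epoch_of(t)) == nullifier(alice, epoch_of(t + EPOCH_SECONDS))  # False
```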

Implementation sketch:

New message type: ["AUTH_ZK", <proof>, <nullifier>, <epoch>]

Relay logic:
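1. Check <epoch> is the current epoch (otherwise a client could mint fresh nullifiers by picking arbitrary epochs)
2. Verify proof against current Merkle root
3. Check nullifier not seen this epoch
4. Store nullifier, grant session
5. Session has no identity attached - just "verified member"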
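The relay logic above might look like the following sketch. The proof verifier is injected as a callable because the real check (for example a Groth16 verification through the Semaphore or snarkjs packages) can't be reproduced in a self-contained example. The `ZkAuthRelay` class, the verifier signature, and the reply strings are all hypothetical.

```python
import json
from typing import Callable

# Hypothetical verifier hook: (proof, merkle_root, nullifier, epoch) -> bool.
# A real relay would call a Semaphore/Groth16 verifier here.
ProofVerifier = Callable[[object, str, str, int], bool]

class ZkAuthRelay:
    """Minimal handler for ["AUTH_ZK", <proof>, <nullifier>, <epoch>] messages."""

    def __init__(self, merkle_root: str, verify_proof: ProofVerifier,
                 current_epoch: Callable[[], int]):
        self.merkle_root = merkle_root
        self.verify_proof = verify_proof
        self.current_epoch = current_epoch
        self.seen: dict[int, set[str]] = {}  # epoch -> nullifiers already used

    def handle(self, raw: str) -> tuple[bool, str]:
        msg = json.loads(raw)
        if not (isinstance(msg, list) and len(msg) == 4 and msg[0] == "AUTH_ZK"):
            return False, "invalid: malformed AUTH_ZK message"
        _, proof, nullifier, epoch = msg
        if not isinstance(nullifier, str) or not isinstance(epoch, int):
            return False, "invalid: malformed AUTH_ZK message"
        now = self.current_epoch()
        # Only the current epoch is accepted, so nullifiers can't be minted freely.
        if epoch != now:
            return False, "invalid: wrong epoch"
        # Verify the proof against the relay's current Merkle root.
        if not self.verify_proof(proof, self.merkle_root, nullifier, epoch):
            return False, "invalid: proof rejected"
        # Drop nullifiers from finished epochs; they can never be presented again.
        for old in [e for e in self.seen if e < now]:
            del self.seen[old]
        used = self.seen.setdefault(now, set())
        if nullifier in used:
            return False, "rate-limited: nullifier already used this epoch"
        used.add(nullifier)
        # The session carries no pubkey, only the fact that some member authenticated.
        return True, "ok: verified member"
```

A stub verifier that accepts the proof string `"good"` is enough to exercise the epoch and nullifier logic.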

This requires a new NIP, client library changes, and relay-side verification (~100 lines of code using existing snarkjs/Semaphore packages). The infrastructure exists; adoption is the barrier.

The Retrieval Problem

Authentication hides who you are. But once connected, you send REQ messages specifying exactly which notes you want: these pubkeys, these event kinds, these tags. The relay learns your interests, your contacts, your subscriptions.

Private Information Retrieval (PIR) lets you query a database without revealing what you're querying. The relay responds with your requested data without learning which data you requested.

The naive approach - download everything, filter locally - doesn't scale. A relay with millions of notes can't stream its entire database to every client.

Modern PIR achieves sublinear communication:

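SimplePIR (USENIX Security 2023) reaches about 10 GB/s of server throughput per core with single-server security. The client encodes its query as an LWE-encrypted unit vector, the server performs one matrix-vector multiplication over the database, and the result decodes to the requested record. The server sees a query vector but can't determine which index it encodes. The price is a one-time "hint" download whose size grows with the square root of the database size.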

FrodoPIR (PoPETS 2023) optimizes for the messaging use case: <1 second queries, ~$1 per 100,000 queries at scale. Communication overhead is ~10-100x the actual data size - significant but potentially acceptable for high-value privacy.

The hybrid approach combines PIR with ZK:

Relay maintains an encrypted message store indexed by recipient
Client generates PIR query for their mailbox index
ZK proof accompanies query proving:
- Client knows the secret key for the queried mailbox, with the mailbox index kept as a private input so the proof doesn't reveal it
- Client hasn't exceeded rate limits (via nullifier)
Relay executes PIR query, returns encrypted response
Relay learns: a valid user queried something. Not who, not what.
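In the hybrid flow, the client has to know which database position holds its mailbox without asking the relay. One way to do this is to hash the recipient pubkey into a fixed number of buckets laid out as a rows-by-columns matrix. This is an assumption here, not something taken from any NIP. The column becomes the PIR query and the row is read from the answer, so the index never leaves the client. The function name and bucket layout are hypothetical.

```python
import hashlib

def mailbox_position(recipient_pubkey_hex: str, rows: int, cols: int) -> tuple[int, int]:
    """Hypothetical mapping from a hex Nostr pubkey to a (row, col) bucket.

    Several recipients can share a bucket, so mailbox contents must be
    encrypted to the recipient and the client filters what it can decrypt.
    """
    digest = hashlib.sha256(bytes.fromhex(recipient_pubkey_hex)).digest()
    index = int.from_bytes(digest[:8], "big") % (rows * cols)
    return divmod(index, cols)
```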

Practical constraints: PIR has real costs. Server computation scales with database size. Bandwidth overhead is substantial. This isn't viable for "firehose" subscriptions - you're not going to PIR-query every note from 1000 pubkeys.

But for specific high-value queries - fetching your DMs, checking notifications, retrieving specific threads - PIR provides metadata protection impossible through other means.
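The scaling concern can be made concrete with back-of-the-envelope arithmetic. Single-server PIR must touch every record on every query, or the server would learn which records were not requested. The sketch takes SimplePIR's reported figure of roughly 10 GB/s per core at face value and assumes, hypothetically, a 1 GB relay database.

```python
def pir_core_seconds(db_bytes: float, queries: int,
                     throughput_bytes_per_s: float = 10e9) -> float:
    """Server CPU time when each PIR query scans the whole database once."""
    return queries * db_bytes / throughput_bytes_per_s

one_dm_check = pir_core_seconds(1e9, 1)   # one mailbox fetch: about 0.1 core-seconds
firehose = pir_core_seconds(1e9, 1000)    # one query per followed pubkey: about 100 core-seconds
```

A single DM check is cheap. Refreshing a 1000-pubkey follow list costs each client minutes of server CPU, which is why PIR fits targeted queries rather than subscriptions.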

Express (Eskandarian et al., USENIX Security 2021) demonstrated practical metadata-hiding communication: two-server deployment, 20ms client computation, 5KB communication per message, ~$1/month operating cost. The architecture required two non-colluding servers, which maps imperfectly to Nostr's relay model but suggests the overhead is manageable.

The open question: can PIR be adapted to Nostr's subscription model, or does metadata privacy require a fundamentally different query pattern?

The Rate Limiting Problem

Relays need spam protection. The obvious solution - rate limit by IP or pubkey - destroys privacy. Your posting pattern becomes a fingerprint.

Privacy Pass (standardized by the IETF in 2024 as RFCs 9576-9578) decouples rate limiting from identity:

Client contacts an issuer, proves they're a legitimate user (CAPTCHA, payment, reputation)
Issuer blind-signs tokens - client gets valid tokens without issuer learning which tokens
Client redeems tokens to relay - one token per action
Relay verifies token validity, enforces one-use, learns nothing about client identity

The relay knows: this action was authorized by someone who passed the issuer's checks. It doesn't know which user, can't link multiple redemptions to the same user, can't build behavioral profiles.

Blind signatures are the key primitive for publicly verifiable tokens (Privacy Pass also defines privately verifiable tokens built on a VOPRF). The issuer signs tokens without seeing them, the client unblinds the signature, and the relay verifies it without being able to correlate it to issuance. RSA blind signatures achieve this in ~0.5KB per token.
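A minimal sketch of the blind-sign, unblind, verify round trip using textbook RSA. The assumptions are 512-bit moduli, SHA-256 reduced mod n as the message encoding, and no padding. Deployed Privacy Pass uses RSA blind signatures with PSS encoding (RFC 9474) at 2048 bits or more, so this illustrates the algebra only. The issuer sees just `m * r^e mod n`, which is uniformly distributed for random `r`, so it cannot later match a redeemed signature to a signing request.

```python
import hashlib
import math
import secrets

def is_probable_prime(n: int, rounds: int = 32) -> bool:
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        x = pow(secrets.randbelow(n - 3) + 2, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def random_prime(bits: int) -> int:
    while True:
        candidate = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate):
            return candidate

def issuer_keygen(bits: int = 256, e: int = 65537) -> tuple[int, int, int]:
    """Issuer: toy RSA key (n, e, d) built from two `bits`-bit primes."""
    while True:
        p, q = random_prime(bits), random_prime(bits)
        phi = (p - 1) * (q - 1)
        if p != q and math.gcd(e, phi) == 1:
            return p * q, e, pow(e, -1, phi)

def encode(token: bytes, n: int) -> int:
    """Simplified encoding: SHA-256 of the token as an integer mod n (no PSS)."""
    return int.from_bytes(hashlib.sha256(token).digest(), "big") % n

def blind(token: bytes, n: int, e: int) -> tuple[int, int]:
    """Client: hide the token as m * r^e mod n; keep r secret."""
    while True:
        r = secrets.randbelow(n - 2) + 2
        if math.gcd(r, n) == 1:
            return encode(token, n) * pow(r, e, n) % n, r

def blind_sign(blinded: int, n: int, d: int) -> int:
    """Issuer: sign the blinded value with the private exponent."""
    return pow(blinded, d, n)

def unblind(blind_sig: int, r: int, n: int) -> int:
    """Client: (m * r^e)^d = m^d * r, so multiplying by r^-1 leaves m^d."""
    return blind_sig * pow(r, -1, n) % n

def verify(token: bytes, sig: int, n: int, e: int) -> bool:
    """Relay: check sig^e == encode(token) using only the public key (n, e)."""
    return pow(sig, e, n) == encode(token, n)
```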

Integration with Nostr:

New message types:
["TOKEN_REQUEST", <blinded_token>] -> ["TOKEN_RESPONSE", <blind_signature>]
["EVENT", <event>, <token>, <signature>]

Flow:
1. Client obtains token batch (could be from relay itself, or third-party issuer)
2. Each EVENT submission includes token redemption
3. Relay verifies token, publishes event, discards token
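4. No identity linkage from the tokens: the relay can't tie an event to its issuance or to other token-backed events, though each event's own pubkey still identifies its author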
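The relay side of steps 2-4 could look like the sketch below. It assumes the token travels as hex and the signature as a decimal integer. The signature is checked against the issuer's public RSA key with a simplified textbook-RSA rule (SHA-256 of the token reduced mod n, no padding). The `TokenGate` name, the wire encoding, and the reply strings are hypothetical, and NIP-01 event validation is left out.

```python
import hashlib
import json

class TokenGate:
    """Relay-side redemption for ["EVENT", <event>, <token>, <signature>] (hypothetical format)."""

    def __init__(self, issuer_n: int, issuer_e: int):
        self.n, self.e = issuer_n, issuer_e
        self.spent: set[bytes] = set()  # digests of redeemed tokens, kept only to block reuse

    def _token_valid(self, token: bytes, sig: int) -> bool:
        # Textbook RSA check: sig^e mod n must equal SHA-256(token) mod n.
        m = int.from_bytes(hashlib.sha256(token).digest(), "big") % self.n
        return 0 <= sig < self.n and pow(sig, self.e, self.n) == m

    def handle_event(self, raw: str) -> tuple[bool, str]:
        msg = json.loads(raw)
        if not (isinstance(msg, list) and len(msg) == 4 and msg[0] == "EVENT"
                and isinstance(msg[1], dict)):
            return False, "invalid: malformed EVENT"
        _, event, token_hex, sig = msg
        try:
            token, sig_int = bytes.fromhex(token_hex), int(sig)
        except (TypeError, ValueError):
            return False, "invalid: bad token encoding"
        key = hashlib.sha256(token).digest()
        if key in self.spent:
            return False, "rate-limited: token already redeemed"
        if not self._token_valid(token, sig_int):
            return False, "invalid: bad token signature"
        self.spent.add(key)
        # Nothing about the token is stored with the event, so the two can't be joined later.
        return True, "ok: accepted " + str(event.get("id", ""))
```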

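Anonymous Rate-Limited Credentials (ARC), currently an Internet-Draft in the IETF Privacy Pass working group, extend the model with per-origin limits. A user obtains one credential that can be presented up to N times per origin and time period, with each presentation unlinkable to the others and to issuance. They can spend presentations across multiple relays without any relay learning their total activity, and because a single issuance covers all N presentations, issuance cost does not grow with N.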

The issuer could be the relay itself (simplest), a federation of relays (spreading trust), or an independent service (separating authentication from relay operation). Each model has different trust assumptions but all break the identity-to-behavior link.

Putting It Together

A privacy-preserving relay connection looks like:

Connect over Tor or VPN (IP privacy - out of scope but essential)
Authenticate with Semaphore ZK proof (membership without identity)
Query high-value data via PIR (retrieval without revelation)
Subscribe to public feeds normally (some metadata leakage acceptable)
Post with Privacy Pass tokens (rate limiting without tracking)

Not every connection needs full privacy. Public note browsing has lower stakes than DM retrieval. The architecture should support gradations.
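One way to express those gradations is a per-operation policy table that a client consults before talking to a relay. The operation names and the mechanisms assigned to them are assumptions drawn from the layered list above, not part of any NIP.

```python
from enum import Flag, auto

class Protection(Flag):
    NONE = 0
    NETWORK = auto()   # Tor or VPN for IP privacy
    ZK_AUTH = auto()   # Semaphore-style membership proof
    PIR = auto()       # private information retrieval
    TOKENS = auto()    # Privacy Pass-style rate-limit tokens

ALL = Protection.NETWORK | Protection.ZK_AUTH | Protection.PIR | Protection.TOKENS

# Hypothetical operation names; public browsing skips the expensive layers.
POLICY = {
    "browse_public_feed": Protection.NETWORK | Protection.ZK_AUTH,
    "fetch_dms": Protection.NETWORK | Protection.ZK_AUTH | Protection.PIR,
    "check_notifications": Protection.NETWORK | Protection.ZK_AUTH | Protection.PIR,
    "publish_note": Protection.NETWORK | Protection.ZK_AUTH | Protection.TOKENS,
}

def required_protection(operation: str) -> Protection:
    """Unknown operations fall back to the strictest setting."""
    return POLICY.get(operation, ALL)
```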

What's deployable today:

Semaphore authentication: production libraries exist, ~3s mobile proving
Privacy Pass rate limiting: IETF-standardized, existing implementations

What needs work:

PIR integration: research-stage for Nostr's query model
Relay coordination: new NIPs, adoption incentives, client support

What's missing:

Relay incentive alignment: privacy features cost compute, why would relays adopt?
User experience: additional latency, larger bandwidth, battery impact on mobile
Ecosystem coordination: value depends on critical mass of supporting relays

The Metadata Reality

Zero-knowledge authentication doesn't make you invisible. The relay knows someone is connected. Traffic analysis can still correlate timing patterns. Global adversaries watching multiple relays can potentially deanonymize through intersection attacks.
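A toy illustration of an intersection attack, using invented observations. Suppose an adversary watching several relays can see which users are online each time a pseudonymous account acts. It can then intersect those anonymity sets. The user names and sets below are made up for illustration.

```python
from functools import reduce

def intersect_candidates(observations: list[set[str]]) -> set[str]:
    """Users present during every observed action of the target pseudonym."""
    return reduce(set.intersection, observations) if observations else set()

online_when_pseudonym_posted = [
    {"alice", "bob", "carol", "dave", "erin"},
    {"alice", "carol", "erin", "frank"},
    {"carol", "erin", "grace"},
    {"bob", "carol", "grace"},
]
candidates = intersect_candidates(online_when_pseudonym_posted)  # {"carol"}
```

Each observation alone leaves several candidates, but four together leave one. ZK authentication removes the relay's direct identity record. It does nothing against this kind of correlation, which is why the network layer and traffic patterns still matter.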

But the threat model improves dramatically. The relay operator can't identify who is connected. A subpoena for "all activity from npub X" returns only the events npub X chose to publish. The relay genuinely doesn't know when that npub connected, what it fetched, or how its sessions relate. Those behavioral patterns aren't stored because there's no identity to attach them to.

This is defense in depth. Tor hides your IP. ZK auth hides your identity. PIR hides your interests. Privacy Pass hides your behavior. No single layer is perfect; together they raise the cost of surveillance from trivial to substantial.

Nostr's architecture makes this possible - the protocol is simple enough that privacy extensions don't require core changes. The question is whether the ecosystem will build them.

The cryptography works. The implementations exist. What's missing is the coordination to deploy.

Private Relay Connections: Zero-Knowledge Solutions for Nostr
