A few years ago I asked the devs to run the benchmark on how much better (if at all) outbox is over using large relays. No one took up this question, so I started running benchmarks. A few learnings for me: 1. Outbox is proven better than using large relays 2. Three relays is the optimum for each npub 3. Three year old notes have an incredibly low survival rate, and are barely findable by the best outbox implementations (~~20%) 4. There are half a dozen algos not yet implemented by any nostr app to my knowledge that are superior to current algos for finding year old notes 5. No nostr app has implemented nip-66 relay liveness checks in outbox implementations to my knowledge 6. It could be measurement or methodology error - the relay/note indexer performs worse than outbox and large relays 7. There are poor performing outbox implementations Cc @balas @KernelKind @fiatjaf

Replies (12)

Here is the latest synthesis: https://github.com/nostrability/outbox/pull/6/ Running some more tests tomorrow. In the nostrability/outbox repo you can find the algos I benchmarked, and what test were run if you’d lung to recreate. Tomorrow I’m running nip-66 benchmarks across profiles with various follow counts to better understand how much better is nip-66 x outbox for note findability
At the moment with the current state of nostr, just for outbox I agree that there are diminishing returns after about 20 or so relays, but in addition to the outbox relays, the app still needs to open more connections to find replies. Turns out chrome is fine, it was Brave Shields preventing more connections.
I read your report on Nostrability, but I don't understand how you're doing these benchmarks. Are you firing up the apps and measuring the notes they display? Can't be that, right? And what are those algorithms you listened. I tried looking them up but didn't get a clear view of how exactly they would work with outbox.
Thanks for taking interest. I will add methodology overview to the readme. To your questions: 1. I extracted the algorithms from nostr outbox implementations to a typescript library. 2. In the TS library we have nostr algos from the real world (e.g. from amethyst, nostrudel, nostur etc) 3. In the TS library we also have the general CS algorithms not yet found in nostr apps 4. The computer science problem of “outbox” maps to “weighted set cover problem” AKA “maximum coverage problem”. 5. The “winning” algo for long term event retrieval MAB is used in real world applications in online ads, a/b testing https://web.stanford.edu/class/cme241/lecture_slides/MultiArmedBandits.pdf https://web.mit.edu/6.7950/www/lectures/L15-2022fa.pdf 6. I answer two questions to the above: Given a npub (e.g. fiatjaf who has 400/whatever follows) run a ~~dozen or so algos A) “assignment coverage” phase 1(academic/no network ) if every relay is online, and no events are deleted, how many of your follows are reached by this algo? B) “event recall” phase 2 (real world like/pull events from relays) given realistic conditions of relays being down, events being pruned/deleted, relays gating event retrieval (via auth or otherwise) what are the numbers? C) given phase 2 above, what happens if we triage relays first via nip-66 liveness monitor events, relays, services like nostr.watch by @sandwich 7. I have not gone into a dozen apps to measure in app the outbox results. All of this is done in deno/typescript 8. The real world results are vastly different than the academic ones. I’m measuring more details on nip-66 - the hypotheses are 1) improved coverage, and 2) drastically reduced number of calls to relays to find the info being sought
I made the report less academic, and more actionable. Specific to your question on how to implement the cs algo - mab - see https://github.com/nostrability/outbox/blob/main/IMPLEMENTATION-GUIDE.md#7-learn-from-what-actually-works And https://github.com/nostrability/outbox/blob/main/IMPLEMENTATION-GUIDE.md#improvement-opportunities MAB complements nip-66 well - nip-66 checks liveness before asking relays, whereas MAB tracks relay health *after* asking the relay, and scores it. Nosotros @Cesar Dias looks like has a “seen” table (event created at x seen on relay y), and it is not providing feedback to relay selection for future lookups.
IMHO regarding coverage: clients don't need to find a minimum set with maximum coverage. Clients can just connect to 500 relays to follow 500 people. It still works. But of course it is more efficient to do otherwise, and especially if some relays are down, to find the set that is up and covers everyone. I think (as you describe in (8) that real world considerations probably far outweigh theoretical algorithm choices. But that doesn't mean we shouldn't bother to get the algorithms right.