Replies (34)
if you haven't memorised your frens' npubs you are not nostr-ing hard enough
Or by the last four or five digits of their npub. Or writing style. I can usually suss out who's responding to me before the profile pictures load. 🤣
I'm happy to see it's not just my Amethyst that sucks lately. Plz unfuck this nostr:nprofile1qqsyvrp9u6p0mfur9dfdru3d853tx9mdjuhkphxuxgfwmryja7zsvhqpzamhxue69uhhv6t5daezumn0wd68yvfwvdhk6tcpz9mhxue69uhkummnw3ezuamfdejj7qgswaehxw309ahx7um5wghx6mmd9u2mk7fe.
I saw that haha
Imma just leave this here
nostr:nevent1qqs9qexaxfpardd3sh5lxf7hxuat2zvsln8h0x376995m8fhhxakrygprdmhxue69uhhyetvv9ujumn0wd68yurvv438xtnrdakj7q3q7plqkxhsv66g8quxxc9p5t9mxazzn20m426exqnl8lxnh5a4cdnsxpqqqqqqzzexa2m
Amber caches our data, but maybe we should be caching kind0 profiles as well? Does Amethyst always refetch upon starting? Makes sense for kind1's but profiles change way less frequently
accurate.
It was Ralf who said that
i miss ralf :(
Kind 0 should be cached for easy and fast lookup. I'm starting to think I want nostrdb for Amethyst. The only complaint I have about nostrdb in #Notedeck is that so many profiles are stale and show old names or old profile data. It's not refreshing and checking for changes.
me too man me too
Plz unfuck this nostr:nprofile1qyfhwue69uhhyetvv9uju6nzx56jucm0d5qs6amnwvaz7tmwdaejumr0dsqzqvhpsfmr23gwhv795lgjc8uw0v44z3pe4sg2vlh08k0an3wx3cj96l2ln2
That's one of the main reasons why I haven't focused on a DB yet. We need to figure out how to delete things from the DB and rotate information very quickly. Otherwise, a simple Amethyst session will get to 10 GB of disk usage quite fast.
You can just click a profile to refresh it
I appreciate the amount of unfuckery requests in this thread.
I support all unfeckeries when it comes to nostr.
Just throwing something out there: maybe there are multiple valid ways to cache events. I think kind 1s have a lower priority to be cached, as they have a pretty short half-life,
but kind 0 and other events could have different options, like refreshing on a timed schedule, or even refetching on view. Like you keep the same kind 0 until you want to view the user's profile, so you'll fetch it again for the profile view.
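roughly what I mean, as a sketch (hypothetical names and TTLs, not actual Amethyst code):
package cache

import "time"

// RefreshPolicy decides when a cached event of a given kind is considered stale.
type RefreshPolicy struct {
    TTL           time.Duration // refetch after this long
    RefetchOnView bool          // also refetch when the user opens the relevant view
}

// Hypothetical defaults: kind 1 notes expire quickly, kind 0 profiles are kept
// until they time out or the profile view is opened.
var policies = map[int]RefreshPolicy{
    0: {TTL: 24 * time.Hour, RefetchOnView: true},
    1: {TTL: time.Hour},
}

// IsStale reports whether a cached event of the given kind should be refetched.
func IsStale(kind int, fetchedAt time.Time, viewing bool) bool {
    p, ok := policies[kind]
    if !ok {
        p = RefreshPolicy{TTL: 6 * time.Hour} // fallback for kinds with no explicit policy
    }
    if viewing && p.RefetchOnView {
        return true
    }
    return time.Since(fetchedAt) > p.TTL
}
so eviction/refresh consults a per-kind policy instead of one global rule.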
Yeah, that is definitely true. I am currently reorganizing our cache for the 100,000 WoT score events the app needs to download, making each lifecycle kind-based if the event is a root and attached to other events if it is not a root kind (comments, reactions, etc.). So, when the cache removes the root event, all the others get deleted, including the WoT events in memory. Then it's all about doing the same thing on disk.
Also, I need to find a way to do string interning on disk, because copying 64-byte IDs and pubkeys everywhere is not a good idea.
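the shape of it, as a rough sketch in Go (hypothetical names, not the real Amethyst cache):
package cache

// EventCache keeps root events by id and hangs dependent events (comments,
// reactions, WoT score events, ...) off the root they belong to, so evicting
// a root drops everything attached to it in one step.
type EventCache struct {
    roots    map[string][]byte            // root event id -> serialized event
    attached map[string]map[string][]byte // root event id -> attached event id -> serialized event
}

func NewEventCache() *EventCache {
    return &EventCache{
        roots:    make(map[string][]byte),
        attached: make(map[string]map[string][]byte),
    }
}

func (c *EventCache) PutRoot(id string, ev []byte) { c.roots[id] = ev }

// Attach stores a non-root event under the root it belongs to.
func (c *EventCache) Attach(rootID, id string, ev []byte) {
    m, ok := c.attached[rootID]
    if !ok {
        m = make(map[string][]byte)
        c.attached[rootID] = m
    }
    m[id] = ev
}

// EvictRoot removes the root and all attached events together.
func (c *EventCache) EvictRoot(rootID string) {
    delete(c.roots, rootID)
    delete(c.attached, rootID)
}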
*chuckles in ORLY* i just fixed the ORLY build and it has all the things and builds on android. enforces expiry, honors deletes, even has a nice handy wipe function, plus import and export. being native ARM code, the signatures and codecs are also nice and fast
Wen String interning? :) Downloading 100,000 WoT events per user means most of the DB is just duplicated event ids and pubkeys.
We also need to figure out what to do with large encrypted payloads (DMs, drafts, etc). Maybe it saves encrypted blobs in a separate file. Idk...
I also need one shared DB for every app on the phone + a separate DB for each user so that we can save/index the decrypted DM chats, which are unsigned Nostr events.
string interning? i don't know what this means. you mean the pubkey index? that's in there. yes, the pubkeys are encoded as 5 byte long pointers to the pubkey index. IIRC. i remember doing something with this. there is actually a table in there that links pubkey/event/kind together that theoretically should enable index free adjacency (meaning it only iterates one table to find the event or pubkey you seek).
haven't implemented it in many places yet but tag queries for e and p tags use it. it's potentially able to add a graph query language to walk that table
i may be misremembering. just mustering the electric gnome to explain what it does, as my memory may be foggy, or maybe i got lied to about it. but i'm sure i did it
here it is nostr:npub1gcxzte5zlkncx26j68ez60fzkvtkm9e0vrwdcvsjakxf9mu9qewqlfnj5z : it's not interning them as 5 byte serial references in the event storage but there is a graph table (bidirectional) that lets you search that way. i should make a todo to have it switch out the pubkeys in events for the serials. here is the explanation of it and how it works:
PubkeyEventGraph (peg) - Lines 595-610
// PubkeyEventGraph creates the reverse edge: pubkey_serial -> event_serial with event kind and direction
// This enables querying all events related to a pubkey, optionally filtered by kind and direction
// Direction: 0=is-author, 2=p-tag-in (pubkey is referenced by event)
//
// 3 prefix|5 pubkey serial|2 kind|1 direction|5 event serial
var PubkeyEventGraph = next()
Key structure:
[3: "peg"][5: pubkey_serial][2: kind][1: direction][5: event_serial]
- pubkey_serial: unique ID for this pubkey
- kind: event kind (uint16)
- direction: relationship type (byte)
- event_serial: event ref (uint40)
Total: 16 bytes per edge
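As an illustration only, a small Go sketch of how a key with this layout could be packed (not ORLY's actual encoder; the function name is invented):
package keys

import "encoding/binary"

// EncodePEG packs "peg" | pubkey serial (uint40) | kind (uint16) | direction | event serial (uint40)
// into the 16-byte key described above.
func EncodePEG(pubkeySerial uint64, kind uint16, direction byte, eventSerial uint64) [16]byte {
    var k [16]byte
    var tmp [8]byte
    copy(k[0:3], "peg")
    binary.BigEndian.PutUint64(tmp[:], pubkeySerial)
    copy(k[3:8], tmp[3:8]) // low 5 bytes, big-endian
    binary.BigEndian.PutUint16(k[8:10], kind)
    k[10] = direction // 0 = is-author, 2 = p-tag-in
    binary.BigEndian.PutUint64(tmp[:], eventSerial)
    copy(k[11:16], tmp[3:8]) // low 5 bytes, big-endian
    return k
}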
The Direction Byte
This is the key insight: it encodes the relationship type between pubkey and event:
| Direction | Meaning | Query Use Case |
|-----------|-----------|------------------------------------------------------|
| 0 | is-author | "Find all events this pubkey authored" |
| 2 | p-tag-in | "Find all events that mention/reference this pubkey" |
How It Works With EventPubkeyGraph (epg)
These are bidirectional edges, two indexes that mirror each other:
EventPubkeyGraph (epg): event_serial -> pubkey_serial
"Given an event, find related pubkeys"
PubkeyEventGraph (peg): pubkey_serial -> event_serial
"Given a pubkey, find related events"
Example: If Alice (pubkey serial #42) posts event #1000 that mentions Bob (pubkey serial #99):
Indexes created:
epg entries (event -> pubkey):
[epg][1000][42][kind 1][direction 0]   (Alice is author)
[epg][1000][99][kind 1][direction 1]   (Bob is referenced, p-tag-out)
peg entries (pubkey -> event):
[peg][42][kind 1][direction 0][1000]   (Alice authored event 1000)
[peg][99][kind 1][direction 2][1000]   (Bob is mentioned in event 1000)
Why Pubkey Serials Instead of Hashes?
Notice it uses pubkey_serial (5 bytes) not pubkey_hash (8 bytes). This requires two additional indexes:
// PubkeySerial: pubkey_hash -> serial (lookup serial for a pubkey)
// 3 prefix|8 pubkey hash|5 serial
// SerialPubkey: serial -> full 32-byte pubkey (reverse lookup)
// 3 prefix|5 serial -> 32 byte pubkey value
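To illustrate the get-or-create flow those two tables support, a hedged in-memory sketch (ORLY keeps these in badger; the names here are invented):
package serials

// Interner mirrors the two lookup tables conceptually: pubkey -> serial and
// serial -> pubkey. This in-memory version just shows the get-or-create flow.
type Interner struct {
    next         uint64
    pubkeySerial map[[32]byte]uint64 // PubkeySerial direction
    serialPubkey map[uint64][32]byte // SerialPubkey direction
}

func NewInterner() *Interner {
    return &Interner{
        pubkeySerial: make(map[[32]byte]uint64),
        serialPubkey: make(map[uint64][32]byte),
    }
}

// Serial returns the existing serial for a pubkey or allocates the next one.
func (i *Interner) Serial(pk [32]byte) uint64 {
    if s, ok := i.pubkeySerial[pk]; ok {
        return s
    }
    s := i.next
    i.next++
    i.pubkeySerial[pk] = s
    i.serialPubkey[s] = pk
    return s
}

// Pubkey resolves a serial back to the full 32-byte pubkey.
func (i *Interner) Pubkey(s uint64) ([32]byte, bool) {
    pk, ok := i.serialPubkey[s]
    return pk, ok
}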
Insight:
- Serials save space: 5 bytes vs 8 bytes per edge × millions of edges = significant savings
- Kind in the key enables efficient filtering: "Find all kind 1 events mentioning pubkey X" is a single range scan
- Direction ordering matters: [pubkey][kind][direction][event] means you can scan "all kind 3 events where X is author" without touching "events mentioning X"
Query Examples
"All events authored by pubkey X":
Start: [peg][X_serial][0x0000][0][0x0000000000]
End: [peg][X_serial][0xFFFF][0][0xFFFFFFFFFF]
β
direction=0 (is-author)
"All kind 1 events mentioning pubkey X":
Start: [peg][X_serial][0x0001][2][0x0000000000]
End: [peg][X_serial][0x0001][2][0xFFFFFFFFFF]
β β
kind=1 direction=2 (p-tag-in)
"All events where pubkey X is either author OR mentioned":
// Two range scans, union results:
Scan 1: [peg][X_serial][*][0][*] β authored
Scan 2: [peg][X_serial][*][2][*] β mentioned
This graph structure is designed for social graph traversal (finding followers, mentions, interactions) without decoding full events.
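As a hedged illustration of the second query above, a prefix scan over such peg keys could look roughly like this in Go with badger (layout as described; the function and names are invented, not ORLY's real query code):
package peg

import (
    "encoding/binary"

    badger "github.com/dgraph-io/badger/v4"
)

// MentionsOfPubkey returns the event serials of kind-1 events that p-tag the
// given pubkey serial, using a single prefix scan over the peg keys.
func MentionsOfPubkey(db *badger.DB, pubkeySerial uint64) ([]uint64, error) {
    // prefix = "peg" | 5-byte pubkey serial | kind=1 (uint16) | direction=2 (p-tag-in)
    prefix := make([]byte, 0, 11)
    prefix = append(prefix, "peg"...)
    var tmp [8]byte
    binary.BigEndian.PutUint64(tmp[:], pubkeySerial)
    prefix = append(prefix, tmp[3:8]...)
    prefix = append(prefix, 0x00, 0x01) // kind 1
    prefix = append(prefix, 0x02)       // direction: p-tag-in

    var out []uint64
    err := db.View(func(txn *badger.Txn) error {
        opts := badger.DefaultIteratorOptions
        opts.PrefetchValues = false // keys-only scan
        it := txn.NewIterator(opts)
        defer it.Close()
        for it.Seek(prefix); it.ValidForPrefix(prefix); it.Next() {
            k := it.Item().Key()
            var buf [8]byte
            copy(buf[3:], k[len(k)-5:]) // last 5 bytes of the 16-byte key are the event serial
            out = append(out, binary.BigEndian.Uint64(buf[:]))
        }
        return nil
    })
    return out, err
}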
events need to store their ID, but pubkeys, both the author and p tags, can be replaced with the pubkey index.
yes, this structure enables full graph traversals; currently it just speeds up those kinds of normal tag searches. i can add this, plus the necessary migration for existing databases to upgrade to it. i am going to add it because it seems to me it would save a lot of space in the database (which is already compact as binary, including eliding all escaping), and since the pubkey table will likely be quite small, it will probably live mostly in memory and rarely impact query performance substantially.
i'm doing this now. once it's done, graph-native queries could become a thing, with that bidirectional table, with extremely fast iterations after an initial seek.
good to know. ty sir.
#click2unfuck
we have a lot of unfucking to do, but we're gonna get there.
compared to my 7GB installation :P
claude is thoughtfully adding a proper LRU cache for the pubkey index as well, so repeated lookups over a small cluster of events will stay in memory and not even call the database engine. i figured badger would probably do this to some extent automatically, but an explicit LRU cache should be there, especially for your use case, just to cut down the second iteration required for event fetches, at least on the pubkeys/p tags
the ones for e tags will always require a second iteration of the event tables, which can actually mean iterating three separate tables, or maybe it's only two, as it has a "small event" table which inlines them into the key table avoiding a second (and usual) value table iteration to fetch the event data.
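roughly the shape of such an LRU, as a sketch only (made-up names and types, not the actual ORLY code):
package lru

import "container/list"

type entry struct {
    serial uint64
    pubkey [32]byte
}

// PubkeyLRU keeps the most recently used serial -> pubkey mappings in memory so
// repeated lookups over a small cluster of events never hit the database.
type PubkeyLRU struct {
    cap   int
    order *list.List               // front = most recently used
    items map[uint64]*list.Element // serial -> element holding *entry
}

func New(capacity int) *PubkeyLRU {
    return &PubkeyLRU{cap: capacity, order: list.New(), items: make(map[uint64]*list.Element)}
}

func (c *PubkeyLRU) Get(serial uint64) ([32]byte, bool) {
    el, ok := c.items[serial]
    if !ok {
        return [32]byte{}, false
    }
    c.order.MoveToFront(el)
    return el.Value.(*entry).pubkey, true
}

func (c *PubkeyLRU) Put(serial uint64, pk [32]byte) {
    if el, ok := c.items[serial]; ok {
        el.Value.(*entry).pubkey = pk
        c.order.MoveToFront(el)
        return
    }
    el := c.order.PushFront(&entry{serial: serial, pubkey: pk})
    c.items[serial] = el
    if c.order.Len() > c.cap {
        oldest := c.order.Back()
        c.order.Remove(oldest)
        delete(c.items, oldest.Value.(*entry).serial)
    }
}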
should be a good fit with adding WoT stuff to amethyst
i'm implementing the change to a compact event format that exploits the pubkey, p and e tags being references. it also has a "placeholder" code for events that refer to events not yet in the database; these store the whole event id.
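as a sketch only, the record could look something like this (an illustrative Go struct, not the actual binary format):
package compact

// Ref points at another event or pubkey either by its small database serial
// (when the target is already stored) or, as a placeholder, by its full
// 32-byte id (when it is not in the database yet).
type Ref struct {
    Placeholder bool     // true: FullID is set; false: Serial is set
    Serial      uint64   // 5-byte serial, used when the target is already stored
    FullID      [32]byte // full id, only kept for placeholder refs
}

// CompactEvent stores its own 32-byte id, but the author, p tags and e tags
// are just references into the pubkey/event serial tables.
type CompactEvent struct {
    ID        [32]byte
    Author    Ref // pubkey reference
    Kind      uint16
    CreatedAt int64
    PTags     []Ref // referenced pubkeys
    ETags     []Ref // referenced events (may be placeholders)
    Content   string
    OtherTags [][]string // everything that is not a p or e tag
    Sig       [64]byte
}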
idk what to do about sharing access to the relay. i think android has some features that would allow you to bind to the websocket listener and plug it into an IPC interface for other apps to also use; it would require you to stipulate that the battery permissions on the app be active, so it could be a standalone service. idk about other users; since that would rarely be needed concurrently, it can just be a startup service, like how orbot and wireguard work.
i'd put the encryption side of it at the OS level: just encrypt the whole app-specific storage in the user's profile. for storing drafts, i think you could then eliminate the encryption step when you are using the already-encrypted db file storage area
as for DM storage, probably better to just put that in a separate place inside the encrypted file storage where the database engine reads and writes, so make two subfolders for that.
Too late, I am fucked!
It just means that there is only one place in memory for all strings (or byte arrays if you convert them). So all id, pubkey, p, and e tags with the same value just point to the same address of the string/bytearray. It's like a huge KV db with one entry for each distinct string inside every nostr event.
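a tiny illustrative sketch of the idea, in Go for brevity (not Amethyst code):
package intern

// Pool returns one canonical copy of each distinct string, so repeated
// ids/pubkeys/tag values all share the same backing data instead of being
// stored over and over.
type Pool struct {
    seen map[string]string
}

func NewPool() *Pool { return &Pool{seen: make(map[string]string)} }

// Intern returns the canonical instance of s, storing it on first sight.
func (p *Pool) Intern(s string) string {
    if canon, ok := p.seen[s]; ok {
        return canon
    }
    p.seen[s] = s
    return s
}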
oh, i think that would turn out to be not as great as you think. i'd recommend instead using snappy or zstd on a low setting as a way to mitigate this. the database can enable this; in fact i think it already does. yes, it uses zstd at level 1 already.
btw, it creates a fulltext index for all words. it's not a reverse index though, and i hadn't thought to make it possible to disable it. but there is no way that, even if i bound those words into some kind of lexicon and reference scheme, i could come close to the compression or performance of ZSTD.
also, i think you WANT that fulltext search capability, and i'm reluctant to even make it disableable. but i could; it's a trivial switch and just a few bits of logic to disable index creation and to report in the nip-11 that nip-50 (iirc) is not available.
i think just the graph table and the author/p/e tags replaced with 6 bytes (flag and event/pubkey serial) will have a huge effect on storage all by itself, even before the basic-but-fast compression scans blocks of events to deduplicate segments of them.
btw, the compression does not merely compress single events; it compresses, i'm not sure... lemme see what claude says:
1. Badger LSM Tree Compression (Storage Layer)
Located in pkg/database/database.go:158-160:
opts.Compression = options.ZSTD
opts.ZSTDCompressionLevel = 1 // Fast compression (500+ MB/s)
Insight:
- What it compresses: SST (Sorted String Table) files on disk, i.e. the actual database files
- When it runs: during compaction, when Badger merges and rewrites LSM tree levels
- Trade-off chosen: level 1 prioritizes write throughput over compression ratio
Available compression levels for Badger:
| Option | Description |
|----------------|--------------------------------|
| options.None | No compression |
| options.Snappy | Fast, moderate ratio (~2-4x) |
| options.ZSTD | Best ratio, configurable speed |
ZSTD Levels (when using options.ZSTD):
- Level 1 (current): ~500+ MB/s compression, ~4-6x ratio
- Level 3 (default): ~300 MB/s, ~5-7x ratio
- Level 9: ~50 MB/s, ~8-10x ratio
- Level 19: ~10 MB/s, maximum ratio
-----
ah yes, it has a 512MB zstd-level-nine-compressed in-memory hot cache with a 5 minute TTL. it also returns the already-encoded JSON if the normalized filter matches the normalized filter in the cache. this gives a potential 1.6GB of hot events ready to go without encoding or any iteration of the database
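the idea, as a toy sketch (invented names; the real cache also zstd-compresses the stored blobs and caps total size at 512MB, which this leaves out):
package hotcache

import (
    "sync"
    "time"
)

type cached struct {
    json    []byte // already-encoded relay response for this filter
    addedAt time.Time
}

// FilterCache keys pre-encoded responses by the normalized filter JSON and
// answers repeat queries without touching the database, until the TTL expires.
type FilterCache struct {
    mu  sync.Mutex
    ttl time.Duration
    m   map[string]cached
}

func New(ttl time.Duration) *FilterCache {
    return &FilterCache{ttl: ttl, m: make(map[string]cached)}
}

func (c *FilterCache) Get(normalizedFilter string) ([]byte, bool) {
    c.mu.Lock()
    defer c.mu.Unlock()
    e, ok := c.m[normalizedFilter]
    if !ok || time.Since(e.addedAt) > c.ttl {
        delete(c.m, normalizedFilter)
        return nil, false
    }
    return e.json, true
}

func (c *FilterCache) Put(normalizedFilter string, encoded []byte) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.m[normalizedFilter] = cached{json: encoded, addedAt: time.Now()}
}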
there is also inlining of small events (under 1kb) in the key table (a unique feature of badger), which avoids a second iteration and is there for the more common use cases where there is a lot of mutating of the values, rather than, as we have with nostr, pretty much write once, maybe later delete.
yes, all that inlining is configurable, that's just the default.
i've added the ability to set the zstd compression level, default 1, so to really save a lot of disk space for event storage you can set it higher. you probably will find 9 is too slow; 2-5 is typical for reasonable performance. 1 is very fast, about equal to snappy in both compression and decompression.
so, it would be a toss-up between battery and disk usage then. ZSTD level 9 will be as slow as like 25-50 MB/s and probably very high CPU.
in damus ios we have a cache timeout for profiles, we will do this in notedeck as well eventually