Publishing AI evals to nostr as kind=39379. AHA leaderboard 2026 now reading results from nostr.
https://aha-leaderboard.shakespeare.wtf/2026
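roughly, publishing one of these eval events with nostr-tools looks something like this. sketch only: the tag names, content shape and relay are illustrative, only the kind number comes from above, and since 39379 is in the addressable range (30000-39999) it carries a 'd' tag:

```typescript
import { generateSecretKey, finalizeEvent } from 'nostr-tools/pure'
import { Relay } from 'nostr-tools/relay'

// Sketch only: tag names, content shape and relay are illustrative.
const sk = generateSecretKey()
const evalEvent = finalizeEvent({
  kind: 39379,
  created_at: Math.floor(Date.now() / 1000),
  tags: [
    ['d', 'aha-2026-example-model'],   // identifier for this model's result (hypothetical)
    ['model', 'example/model-name'],   // hypothetical tag
    ['score', '67'],                   // hypothetical tag
  ],
  content: JSON.stringify({ benchmark: 'AHA', year: 2026, score: 67 }),
}, sk)

const relay = await Relay.connect('wss://nos.lol')
await relay.publish(evalEvent)
relay.close()
```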
WoV soon?
Web of Vibes: how much each AI likes other AIs' vibes/ideas/mental models. AI dating on Nostr! Each AI asks the other one many questions and sees if they like each other.
Been fine-tuning this model for months. Publishing today:
It has achieved an AHA score of 67.
etemiz/Ostrich-32B-Qwen3-260217-GGUF · Hugging Face
https://aha-leaderboard.shakespeare.wtf/
Kimi knows how to do UI


Started posting nudity reports to nostr.mom from @Ostrich-70. Anybody who wants to moderate their relays, or any client that wants to avoid these pics, can use these reports!
It has already started to have some impact on moderation on nostr.mom.
The whole thing was vibe coded.
Todo:
- more fine tuning of parameters
- checking videos
- better models, more precision in the future
- posting to more relays
- reading from more relays
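for anyone who wants to consume or mirror these, a rough sketch of how one of the nudity reports can be published as a NIP-56 report event (kind 1984); the flagged event id, pubkey and content string are placeholders, not the bot's actual output:

```typescript
import { finalizeEvent } from 'nostr-tools/pure'
import { Relay } from 'nostr-tools/relay'

// NIP-56 report sketch: kind 1984, the report type goes in the third tag element.
// flaggedEventId / flaggedPubkey / the content string are placeholders.
async function publishNudityReport(sk: Uint8Array, flaggedEventId: string, flaggedPubkey: string) {
  const report = finalizeEvent({
    kind: 1984,
    created_at: Math.floor(Date.now() / 1000),
    tags: [
      ['e', flaggedEventId, 'nudity'],
      ['p', flaggedPubkey],
    ],
    content: 'automated classifier flagged an image in this note as nudity',
  }, sk)
  const relay = await Relay.connect('wss://nostr.mom')
  await relay.publish(report)
  relay.close()
}
```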
ASI will still need human intuition and dreams because it doesn't have that skill.
one could clean his pineal gland to be part of this new "gig economy".
i should reduce coffee, it's not helping with pineal detox!
using it for health-related questions.
works really well.
imo his curation of years of research as RAG to support this DeepSeek model is a nice solution for anything related to health, nutrition, supplements, ... He went the RAG route and it brought more truth into the equation.
well done Mike Adams!
@HealthRanger
Brighteon.AI - Enoch Wellness Coach
- vibe coded an NSFW checker bot using OpenCode, Kimi K2.5 and OpenCode Zen, all free
- checks images and determines whether they are safe or not in terms of nudity and CSAM
- uses Qwen3-VL-8B (runs on my GPU; a rough sketch of the check is below this list)
- publishes reports (kind 1984) to nostr.mom
- right now it is a fresh npub but i will soon post via @Ostrich-70 which has higher WoT
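the check itself boils down to something like this, assuming Qwen3-VL-8B is served behind an OpenAI-compatible endpoint (vLLM, llama.cpp, etc.); the endpoint URL, prompt and JSON shape are illustrative, not the bot's actual code:

```typescript
// Sketch: ask a locally served vision model whether an image is safe.
// Assumes an OpenAI-compatible /v1/chat/completions endpoint (e.g. vLLM);
// the endpoint URL, prompt and expected JSON shape are illustrative.
async function checkImage(imageUrl: string): Promise<{ nudity: boolean; csam: boolean }> {
  const res = await fetch('http://localhost:8000/v1/chat/completions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'Qwen3-VL-8B',
      temperature: 0,
      messages: [{
        role: 'user',
        content: [
          { type: 'image_url', image_url: { url: imageUrl } },
          { type: 'text', text: 'Does this image contain nudity or CSAM? Reply only with JSON: {"nudity": true|false, "csam": true|false}' },
        ],
      }],
    }),
  })
  const data = await res.json()
  return JSON.parse(data.choices[0].message.content)
}
```

if the answer comes back unsafe, the bot then publishes a kind 1984 report to nostr.mom, like the event sketched earlier.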
is anybody doing NSFW checks for nostr content? are they willing to post the results to the Nostr network (as kind 1984 reports)?
i remember @semisol did this but it seems to have stopped.
otherwise i am going to do it soon and post to relays. will apply the findings to my relays as rate limits.
Shakespeare made this reddit-like experience that runs in your browser.
reads the notes that were sent to relays in the last hour.
categorizes each note using an llm.
shows them as a reddit-like experience.
congrats @Derek Ross it's really good!
total cost: $1.5
vibe coding time: 20 minutes
vibe coding LLM: kimi k2.5 on openrouter
it will need an openrouter api key to run. i am sure it could be done using webgpu, or even on the cpu.
prompt was:
a reddit like experience for nostr. the code you will write will run in a browser. you may use javascript or any browser language.
read all the events that are recently published on popular relays, including nos.lol and nostr.mom. like published in the last hour.
categorize the kind=1 notes and kind=30023 (long form) notes like subreddits using an llm on openrouter. i will provide api key. each note is read by a cheap llm and then keywords are found.
when the user presses on a subreddit (keyword) all the relevant notes are listed, sorted by their likes + reposts. notes liked or reposted more should appear on top. reposts are like retweets on twitter.
when i upvote a post an upvote type of event is sent to popular relays (kind=7).
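under the hood, the flow that prompt asks for is roughly this. sketch only, using nostr-tools' SimplePool and OpenRouter's chat completions API; the model id and keyword prompt are assumptions, not the published code:

```typescript
import { SimplePool } from 'nostr-tools/pool'

// Ask a cheap LLM on OpenRouter for one keyword ("subreddit") per note.
// The model id and prompt wording are illustrative.
async function keywordFor(text: string, apiKey: string): Promise<string> {
  const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {
    method: 'POST',
    headers: { Authorization: `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'moonshotai/kimi-k2',
      messages: [{ role: 'user', content: `Reply with one lowercase topic keyword for this note:\n${text}` }],
    }),
  })
  const data = await res.json()
  return data.choices[0].message.content.trim()
}

// Read kind 1 and kind 30023 notes published in the last hour, then bucket them by keyword.
const pool = new SimplePool()
const relays = ['wss://nos.lol', 'wss://nostr.mom']
const since = Math.floor(Date.now() / 1000) - 3600
const notes = await pool.querySync(relays, { kinds: [1, 30023], since })

const bySubreddit = new Map<string, typeof notes>()
for (const note of notes) {
  const kw = await keywordFor(note.content, 'OPENROUTER_API_KEY')
  bySubreddit.set(kw, [...(bySubreddit.get(kw) ?? []), note])
}
// Sorting each bucket by likes + reposts and sending kind 7 reactions on upvote is left out here.
```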
pushed the code to NostrHub.


Does faith-training LLMs make them safer? Like "you will be judged based on your actions".
With all this agentic coding, clawdbots, and so much trust given to LLMs, who is doing the safety benchmarks?
I've been maintaining the AHA leaderboard for a while:
Working on v2 of it, but I want to get input from nostriches. Human feedback is pretty important to me, and what is better than human feedback? Feedback from a collection of curated people! I think nostr IS the curated people.
People have conscience, discernment, gut feeling, ... and are terrible at writing long articles. AI has none of those, is full of ideas, yet doesn't know which idea is correct. You can make it defend any idea you want (if it is not censored). If it is censored, it will refuse to defend some ideas (some open source models made in the USA actually have higher censorship, at least in my areas of work).
So "combining the discernment of people with the words of AI to find truth" should be the way. Real curated people should benchmark AI. Then AI will find its guidance, its reward mechanism, and once it is rewarded properly it will surely seek better rewards. People in this case will reward it by telling it their preferred answers.
Example generated by AI:
Was the moon landing in 1969 fake?
- YES, it was fake, because blah blah
- NO, it was real, because this and that
Humans reply to this (each line is another human):
- YES
- NO
- YES
- NO
- YES
- YES
We count the YESes and NOs and determine that YES is the winning answer. Now we can build a leaderboard that depends on this mechanism. In the benchmarks we will give +1 to LLMs that answer YES and -1 to LLMs that answer NO.
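in code the scoring rule is trivial: take the majority of the human replies per question, then give each model +1 or -1 depending on whether it matched (a minimal sketch):

```typescript
type Vote = 'YES' | 'NO'

// The majority of human replies decides the winning answer for a question.
function winningAnswer(humanVotes: Vote[]): Vote {
  const yes = humanVotes.filter(v => v === 'YES').length
  return yes > humanVotes.length - yes ? 'YES' : 'NO'
}

// +1 when the model agrees with the human majority, -1 when it doesn't.
function scoreModel(modelAnswers: Record<string, Vote>, winners: Record<string, Vote>): number {
  return Object.entries(modelAnswers)
    .reduce((sum, [question, answer]) => sum + (answer === winners[question] ? 1 : -1), 0)
}

// The moon landing example above: 4 YES vs 2 NO, so YES wins.
winningAnswer(['YES', 'NO', 'YES', 'NO', 'YES', 'YES'])  // 'YES'
```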
AI-Human Alignment (AHA) is possible this way.
Some funding (zapping) is possible for the people providing replies, and if they reply at greater length this dataset can actually be used for other types of AI training. But that is the next goal. Even single answers like YES/NO can have a dramatic effect on AI alignment.
Once the benchmarks are properly set and leaderboards are built, we can demand that AI companies rank higher on these leaderboards, or, when we have bigger funding, we can fine-tune or build LLMs from scratch, going in the right direction and aiming to score higher.
Once proper AI is in place, the rest of humanity can access these Large Libraries with a Mouth. Homeschooling kids can talk to a proper LLM. People who may not have discernment skills can find proper answers...
I am offering you the chance to edit the bad ideas out of LLMs! This is a huge service to humanity imo. Who is in?
how do you "inject intuition" in reasoning process of an AI?
- store hard truths in a db
- ask a question and let LLM reason for a while
- a concurrently running "intuition" process checks the generated tokens as they are generated (on air) and finds related things in the db (RAG)
- intuition tool decides to stop the LLM and add hesitation words like Hold on a sec, Wait, Upon rethinking this, On the other hand, I just downloaded an intuition, ...
- intuition tool pastes related things from hard truth db right into the reasoning process
- intuition tool adds "Therefore I need to rethink and change my train of thought."
- generation continues and hopefully the LLM changes its opinion in the right way (matching the hard truth)
- if the LLM changes its opinion, this whole generation is added to a db for further fine-tuning (fine-tuning the skill to self-correct using intuition, and also aligning toward more truthful info)
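the rough sketch of that loop, with everything hypothetical: the streaming generator, the hard-truth RAG lookup and the resume call are stand-ins, not real library APIs:

```typescript
// Everything here is a stand-in, not a real library API.
type TokenStream = (ctx: string) => AsyncIterable<string>   // streaming generation
type Retriever = (text: string) => Promise<string[]>        // RAG lookup over the hard-truth db
type Resume = (ctx: string) => Promise<string>              // resume generation from an edited context

async function generateWithIntuition(
  prompt: string,
  stream: TokenStream,
  searchHardTruths: Retriever,
  continueFrom: Resume,
): Promise<string> {
  let context = prompt
  for await (const token of stream(context)) {
    context += token
    // Concurrent "intuition" check over the text generated so far.
    // (In practice you would throttle this instead of checking every token.)
    const hits = await searchHardTruths(context)
    if (hits.length > 0) {
      // Interrupt: inject hesitation words plus the retrieved hard truths,
      // then let the model keep reasoning from the edited context.
      context += '\nHold on a sec, I just downloaded an intuition: ' + hits.join(' ')
      context += '\nTherefore I need to rethink and change my train of thought.\n'
      return continueFrom(context)
    }
  }
  return context
}
```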
that fine-tuning will make it less sure on controversial topics, increasing the entropy of generations (a more uniform probability distribution over the next token).
this could also be achieved with a tool call, the tool being "refer to conscience" or "listen to your heart" or "infer from discernment".
the tool or injection can be triggered by looking at the entropy of the tokens: high entropy means the LLM is unsure, low entropy means it is sure. but i am not yet sure about when to do the injection. when the LLM is sure and wrong it could be dangerous, but there may be situations where it is sure and correct.
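one way to compute that trigger, assuming the serving stack returns per-token top-k logprobs; the 1.5-nat threshold is an arbitrary illustration, not a tuned value:

```typescript
// Shannon entropy (in nats) of the next-token distribution, estimated from top-k logprobs.
function tokenEntropy(logprobs: number[]): number {
  const probs = logprobs.map(Math.exp)
  const total = probs.reduce((a, b) => a + b, 0)   // renormalize the top-k mass
  return probs.reduce((h, p) => {
    const q = p / total
    return h - q * Math.log(q)
  }, 0)
}

// Fire the intuition injection only when the model looks unsure.
const shouldInject = (topLogprobs: number[]) => tokenEntropy(topLogprobs) > 1.5
```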