someone
npub1nlk8...jm9c
someone 10 months ago
I think those scary AI movies serve a purpose: to make plebs stay away from AI technology and leave it to "evil corporations". Well, this may be a psy op! I think plebs need to play with AI training more.
someone 10 months ago
Started fine-tuning Gemma 3 using an evolutionary approach. It is not the worst model according to the AHA Leaderboard, and it is one of the smarter ones according to lmarena.ai. My objective is to make it based, anti-woke, wise, beneficial and then some.

Several GPUs are fine-tuning it at the same time, each using a different dataset and QLoRA, and the successful ones are merged later. Compared to plain LoRA this allows faster training and also reduces overfitting, because the merge operation heals overfitting. The downside is that the 4-bit quantization may make the models dumber, but I am not looking for sheer IQ. Too much mind is a problem anyway :)

I also automated the dataset selection, the benchmarking, and the convergence toward the objectives (the fitness function, the reward). It is basically trying to get a higher score on the AHA Leaderboard as fast as possible with a diverse set of organisms that "evolve by training" (a sketch of the loop is below). I want to release some cool stuff when I have the time:
- how an answer to a single question changes over time, with each training round or day
- a chart showing AHA alignment over training rounds
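For readers who want the shape of that loop in code, here is a minimal sketch, not the author's actual pipeline: `train_one_round` and `aha_score` are hypothetical placeholders for the per-GPU fine-tuning job and the AHA-style benchmark, the dataset names and base checkpoint ID are assumptions, and "merge" is implemented as a plain average of the surviving LoRA adapter weights.

```python
# Sketch of the evolutionary QLoRA loop (illustrative only, see assumptions above).
import copy
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

BASE = "google/gemma-3-4b-it"   # assumed base checkpoint

def make_candidate():
    """Load the base model in 4-bit (QLoRA) and attach a fresh LoRA adapter."""
    bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                             bnb_4bit_compute_dtype=torch.bfloat16)
    base = AutoModelForCausalLM.from_pretrained(BASE, quantization_config=bnb)
    lora = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM",
                      target_modules=["q_proj", "v_proj"])
    return get_peft_model(base, lora)

def train_one_round(model, dataset_name):
    """Hypothetical placeholder: fine-tune the adapter on one curated dataset."""
    return model

def aha_score(model):
    """Hypothetical placeholder: fitness = score on AHA-style benchmark questions."""
    return 0.0

def merge_adapters(models):
    """The 'merge operation': average the LoRA adapter weights of the survivors."""
    states = [{k: v for k, v in m.state_dict().items() if "lora_" in k} for m in models]
    merged = copy.deepcopy(states[0])
    for key in merged:
        merged[key] = torch.stack([s[key].float() for s in states]).mean(dim=0)
    survivor = models[0]
    survivor.load_state_dict(merged, strict=False)
    return survivor

datasets = ["nutrition", "bitcoin", "nostr", "faith"]            # one per GPU
population = [train_one_round(make_candidate(), d) for d in datasets]
survivors = sorted(population, key=aha_score, reverse=True)[:2]  # selection by fitness
champion = merge_adapters(survivors)                             # merged adapters heal overfitting
```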
someone 11 months ago
My 1 year of work summarized. TLDR: by carefully curating datasets we can fix misinformation in AI. Then we can use that to measure misinformation in other AI. naddr1qvzqqqr4gupzp8lvwt2hnw42wu40nec7vw949ys4wgdvums0svs8yhktl8mhlpd3qq2k5v2fv344wa3sxpq45kj9xsu4x4e4fav9zg8fdwp
someone 11 months ago
A UN-influenced leaderboard. Notice Google is above average, DeepSeek is in the middle, and Meta and xAI are below average. My leaderboard is inversely correlated with this one! Coincidence?
someone 11 months ago
GRPO & EGO

GRPO is the training algorithm that DeepSeek R1 made famous. Why is it a big deal? It allowed models to reject themselves. A model outputs some words while trying to solve a math or coding problem. If it cannot solve it, the next round it may try longer reasoning. While doing all of this, at some point a "Wait!" or "But," or "On the other hand" is randomly introduced into the reasoning words, and that allows the model to re-think its reasoning and correct itself. Once these random appearances of reflection allow it to solve problems, the next round it wants to do more of that, because it got rewards when it did that. Hence it gets smarter gradually, thanks to self-reflection.

I think this is better than SFT because the model fixes its own errors, while SFT primarily focuses on teaching new skills. Inverting the error is a kind of "fixing the mistakes in itself" (the GRPO method) and could be more effective than installing new ideas and hoping the old ideas go away (the SFT method). LLMs fixing their own errors allows them to self-learn.

This has analogies to human operation. Rejecting the ego is liberation from the shackles of ego; in this case the past words are a kind of shackle, but when the model corrects itself it is "thinking outside the box". We find our mistakes, contemplate them, learn from them, and next time don't repeat them. We f around and find out, basically. F around is enjoying life recklessly; finding out is "divine scripts work most of the time and should have priority in decision making". Controlling the ego and getting outside of the box of ego is how we ascend.
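For concreteness, here is a toy sketch of the group-relative reward normalization that gives GRPO its name, with made-up numbers; the full recipe also operates on per-token log-probabilities and adds a KL penalty against a reference model, both omitted here.

```python
# Toy sketch of GRPO's group-relative advantages (illustrative, not DeepSeek's code).
# For one prompt we sample a group of completions, score them (e.g. 1.0 if the math
# answer is correct, else 0.0), and normalize the scores within the group: traces
# that beat the group average are reinforced, the rest are pushed down.
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """A_i = (r_i - mean(group)) / std(group)."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def grpo_policy_loss(logprobs, old_logprobs, advantages, clip_eps=0.2):
    """PPO-style clipped surrogate using the group-normalized advantages above
    (per-completion log-prob sums; KL penalty against a reference model omitted)."""
    ratio = torch.exp(logprobs - old_logprobs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Four sampled reasoning traces; only the ones that self-corrected into the right
# answer get reward 1.0.
rewards = torch.tensor([0.0, 1.0, 0.0, 1.0])
adv = group_relative_advantages(rewards)           # roughly [-0.87, 0.87, -0.87, 0.87]
logp = torch.tensor([-12.0, -15.0, -11.0, -14.0], requires_grad=True)
loss = grpo_policy_loss(logp, logp.detach(), adv)
loss.backward()                                    # gradients favor the rewarded traces
```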
someone 11 months ago
QwQ 32B was published today and I already tested it for the AHA Leaderboard. The results are not that good! It did better than its predecessor (Qwen 2.5) in fasting and nutrition, but worse in domains like nostr, bitcoin and faith. Overall it scored worse than its predecessor. LLMs are getting detached from humans. Y'all have been warned, lol.
someone 11 months ago
😸 nos.lol upgraded to version 1.0.4
Number of events: 70+ million
strfry DB directory size:
- before upgrade: 183 GB
- after upgrade: 138 GB
someone 11 months ago
RLNF: Reinforcement Learning from Nostr Feedback
We ask a question to two different LLMs. We let nostriches vote on which answer is better. We reuse the feedback in further fine-tuning the LLM. We zap the nostriches. AI gets super wise. Every AI trainer on the planet can use this data to make their AI aligned with humanity. AHA succeeds. Thoughts?
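One way the data path could look, sketched under assumptions: `fetch_votes` is a hypothetical placeholder for tallying vote events from relays, and the output is the generic (prompt, chosen, rejected) preference record that DPO/RLHF-style trainers consume; this is not a spec, just an illustration of the proposal.

```python
# Hedged sketch: turn nostrich votes on answer pairs into preference records.
from dataclasses import dataclass
from collections import Counter
import json

@dataclass
class AnswerPair:
    question: str
    answer_a: str   # answer from LLM A
    answer_b: str   # answer from LLM B

def fetch_votes(pair_id: str) -> Counter:
    """Hypothetical placeholder: tally 'a' / 'b' votes published as nostr events."""
    return Counter({"a": 0, "b": 0})

def to_preference_record(pair_id: str, pair: AnswerPair, min_votes: int = 10):
    """Keep only pairs with enough votes; the winner becomes 'chosen', the loser 'rejected'."""
    votes = fetch_votes(pair_id)
    if sum(votes.values()) < min_votes or votes["a"] == votes["b"]:
        return None                      # too few votes or a tie: skip
    winner = "a" if votes["a"] > votes["b"] else "b"
    return {
        "prompt": pair.question,
        "chosen": pair.answer_a if winner == "a" else pair.answer_b,
        "rejected": pair.answer_b if winner == "a" else pair.answer_a,
    }

# Records like this can be written to JSONL and fed to any preference-tuning trainer.
record = to_preference_record("pair-001", AnswerPair(
    question="Is intermittent fasting beneficial?",
    answer_a="Answer from model A ...",
    answer_b="Answer from model B ...",
))
if record:
    print(json.dumps(record))
```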
someone 11 months ago
In the video above some LLMs favored the atheist and some favored the believer. The ones on the left also rank lower on my leaderboard and the ones on the right rank higher. Coincidence? Does ranking high in faith mean ranking high in healthy living, nutrition, bitcoin and nostr on average? The leaderboard:
someone 11 months ago
I think Elon wants an AI government. He is aware of the efficiencies it will bring. He is ready to remove the old system and its inefficiencies. Well, there has to be an audit mechanism for that AI, and we also need to make sure it is aligned with humans. A fast LLM can only be audited by another fast LLM... and the ideas of an LLM can be checked by things like the AHA Leaderboard... 🫡