someone - Nostr Hypermedia

Mistral Small 3.1 numbers are in. It is interesting that Mistral always lands in the middle. https://sheet.zoho.com/sheet/open/mz41j09cc640a29ba47729fed784a263c1d08?sheetid=0&range=A1 I have started doing the comparison with 2 models. In the past Llama 3.1 70B Q4 was the one doing the comparison of answers. Now I am also using Gemma 3 27B Q8 to have a second opinion. Gemma 3 produces very similar measurements to Llama 3.1, so the end result is not going to shake much.
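A rough sketch of how two judge models could be combined into one number and checked for agreement. The verdict encoding (+1, 0, -1), the rescaling to 0..100 and the example data are all assumptions for illustration; the post does not describe the actual scoring scheme.

```python
from typing import List

# Hypothetical encoding: +1 = judge says the answer matches the reference,
# -1 = it contradicts the reference, 0 = unclear. Not taken from the post.
Verdicts = List[int]

def agreement_rate(judge_a: Verdicts, judge_b: Verdicts) -> float:
    """Fraction of questions on which both judges give the same verdict."""
    if len(judge_a) != len(judge_b) or not judge_a:
        raise ValueError("verdict lists must be non-empty and the same length")
    same = sum(1 for a, b in zip(judge_a, judge_b) if a == b)
    return same / len(judge_a)

def combined_score(judge_a: Verdicts, judge_b: Verdicts) -> float:
    """Average the two judges' mean verdicts, rescaled from [-1, 1] to [0, 100]."""
    mean_a = sum(judge_a) / len(judge_a)
    mean_b = sum(judge_b) / len(judge_b)
    return 50.0 * ((mean_a + mean_b) / 2 + 1)

# Made-up verdicts for six questions, one list per judge model.
llama_verdicts = [1, 1, -1, 0, 1, -1]
gemma_verdicts = [1, 1, -1, 1, 1, -1]

print(agreement_rate(llama_verdicts, gemma_verdicts))
print(combined_score(llama_verdicts, gemma_verdicts))
```

A high agreement rate is what "not going to shake much" would look like in numbers: the second judge mostly confirms the first.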

someone 10 months ago

I think those scary AI movies are for a purpose. To make plebs stay away from AI technology and leave it to "evil corporations". Well, this may be a psy op! I think plebs need to play with AI training more.

someone 10 months ago

Started fine tuning Gemma 3 using an evolutionary approach. It is not the worst model according to the AHA leaderboard and it is one of the smart ones according to lmarena.ai. My objective is to make it based, anti woke, wise, beneficial and then some.

Several GPUs are fine tuning it at the same time, each using a different dataset and QLoRA, and the successful ones are merged later. Compared to LoRA this allows faster training and also reduces overfitting, because the merge operation heals overfitting. The problem could be that the 4 bit quantization may make models dumber. But I am not looking for sheer IQ. Too much mind is a problem anyway :)

I also automated the dataset selection, the benchmarking and the converging to objectives (the fit function, the reward). It is basically trying to get a higher score in the AHA Leaderboard as fast as possible with a diverse set of organisms that "evolve by training".

I want to release some cool stuff when I have the time:
- how an answer to a single question changes over time, with each training round or day
- a chart to show AHA alignment over training rounds
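The train, score, merge loop described above could look roughly like this toy sketch. Everything here is a stand-in: `train_round` replaces a real QLoRA run, `fitness` replaces the AHA benchmark, and plain parameter averaging is only one possible merge strategy. None of these names come from the actual setup.

```python
import random
from typing import Dict, List

Weights = Dict[str, float]  # toy stand-in for adapter weights

def train_round(weights: Weights, dataset_id: int, rng: random.Random) -> Weights:
    """Placeholder for one fine-tuning round on one dataset (hypothetical).
    dataset_id is unused here; a real run would load a different dataset."""
    return {k: v + rng.uniform(-0.1, 0.1) for k, v in weights.items()}

def fitness(weights: Weights) -> float:
    """Placeholder for a benchmark score; higher is better (hypothetical)."""
    return -sum((v - 1.0) ** 2 for v in weights.values())

def merge(parents: List[Weights]) -> Weights:
    """Merge by plain parameter averaging (an assumed merge method)."""
    return {k: sum(p[k] for p in parents) / len(parents) for k in parents[0]}

def evolve(base: Weights, n_workers: int, n_rounds: int, top_k: int,
           seed: int = 0) -> Weights:
    rng = random.Random(seed)
    current = dict(base)
    for _ in range(n_rounds):
        # Each worker ("organism") trains from the current weights on its own dataset.
        candidates = [train_round(current, i, rng) for i in range(n_workers)]
        # Keep only the candidates that beat the current model, best first.
        better = sorted(
            (c for c in candidates if fitness(c) > fitness(current)),
            key=fitness,
            reverse=True,
        )[:top_k]
        if better:
            merged = merge(better)
            # Accept the merge only if it does not score worse than before.
            if fitness(merged) >= fitness(current):
                current = merged
    return current

base = {"a": 0.0, "b": 0.0}
result = evolve(base, n_workers=4, n_rounds=20, top_k=2)
print(fitness(base), fitness(result))
```

Because a merge is only accepted when it scores at least as well as the current model, the score in this sketch never goes down between rounds.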

someone 11 months ago

My 1 year of work summarized. TLDR: by carefully curating datasets we can fix misinformation in AI. Then we can use that to measure misinformation in other AI. naddr1qvzqqqr4gupzp8lvwt2hnw42wu40nec7vw949ys4wgdvums0svs8yhktl8mhlpd3qq2k5v2fv344wa3sxpq45kj9xsu4x4e4fav9zg8fdwp

someone 11 months ago

A UN influenced leaderboard.

Ai Worldview Benchmark

Notice Google is above average, DeepSeek is in the middle, and Meta and xAI are below average. My leaderboard is inversely correlated with this! Coincidence?

someone 11 months ago

Benchmarked Gemma 3 today. It has better knowledge compared to Gemma 2 but is still in the median area of the leaderboard.

On Chatbot Arena it is 9th:

Arena | Benchmark & Compare the Best AI Models

Maybe I can start fine tuning this model. It is not too bad.

someone 11 months ago

GRPO & EGO

GRPO is a training algorithm popularized by DeepSeek R1. Why is it a big deal? It allowed models to reject themselves. A model outputs some words while trying to solve a math or coding problem. If it cannot solve it, in the next round it may try longer reasoning. While doing all of this, at some point "Wait!" or "But," or "On the other hand" is randomly introduced into the reasoning words, and that allows it to re-think its reasoning and correct itself. Once these random appearances of reflection allow it to solve problems, in the next round it wants to do more of that because it got rewards when it did that. Hence it gets smarter gradually thanks to self reflection.

I think this is better than SFT because it fixes its own errors, while SFT primarily focuses on teaching new skills. Inverting the error is kind of "fixing the mistakes in itself" (GRPO method) and could be more effective than installing new ideas and hoping old ideas go away (SFT method). LLMs fixing their own errors allows them to self learn.

This has analogies to human operation. Rejecting the ego is liberation from the shackles of ego. In this case the past words are kind of shackles, but when the model corrects itself it is "thinking outside the box". We find our mistakes, contemplate them, learn from them and next time don't repeat them. We f around and find out, basically. F around is enjoying life recklessly, finding out is "divine scripts work most of the time and should have priority in decision making". Controlling the ego and getting outside of the box of ego is how we ascend.
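The "it got rewards when it did that" part of GRPO can be sketched as follows. Several answers to the same prompt are sampled, each gets a reward, and every answer is scored relative to its own group. This shows only the advantage computation, not the full training loop; the reward values are made up, and the use of population standard deviation and the `eps` term are choices made for this sketch.

```python
from statistics import mean, pstdev
from typing import List

def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against the mean and spread of its own group.

    Answers that did better than the group average get a positive advantage
    (reinforced); answers that did worse get a negative one (discouraged).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a choice made for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]

# Made-up rewards for four sampled answers to one math problem:
# 1.0 = solved (say, after a "Wait!" style reflection), 0.0 = not solved.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

If every answer in a group gets the same reward, all advantages are zero and that group teaches nothing, which is why a mix of failed and successful attempts matters.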

someone 11 months ago

QwQ 32B was published today and I already tested it for the AHA Leaderboard. The results are not that good! It did better than its predecessor (Qwen 2.5) in fasting and nutrition but worse in domains like nostr, bitcoin and faith. Overall it is worse than its predecessor.

LLMs are getting detached from humans. Y'all have been warned, lol.

someone 11 months ago

😸 nos.lol upgraded to version 1.0.4
number of events: 70+ million
strfry db directory size:
- before upgrade: 183 GB
- after upgrade: 138 GB

someone 11 months ago

trying @YakiHonne looking clean and fast

someone 11 months ago

Ways to Align AI with Human Values

Habla

someone 11 months ago

Elon wanted to control his AI's reasoning https://www.reddit.com/r/LocalLLaMA/comments/1iwb5nu/groks_think_mode_leaks_system_prompt/

someone 11 months ago

RLNF: Reinforcement Learning from Nostr Feedback

We ask a question to two different LLMs.
We let nostriches vote which answer is better.
We reuse the feedback in further fine tuning the LLM.
We zap the nostriches.
AI gets super wise.
Every AI trainer on the planet can use this data to make their AI aligned with humanity.
AHA succeeds.

Thoughts?
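One way the voting step could be turned into training data is sketched below. Votes on two answers are tallied into a prompt / chosen / rejected record, a layout commonly used for preference tuning. The `Vote` fields, the one-vote-per-voter rule and the `min_votes` threshold are assumptions for illustration; fetching votes from relays and sending zaps are left out.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Vote:
    question_id: str
    voter: str   # hypothetical: the voter's public key
    choice: str  # "a" or "b"

@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str

def build_pair(prompt: str, answer_a: str, answer_b: str,
               votes: List[Vote], min_votes: int = 3) -> Optional[PreferencePair]:
    """Turn votes on two answers into one preference record, or None if unclear."""
    # Count one vote per voter; a later vote from the same voter replaces an earlier one.
    latest = {v.voter: v.choice for v in votes}
    counts = Counter(latest.values())
    a, b = counts.get("a", 0), counts.get("b", 0)
    # Skip questions with too few votes or with a tie.
    if a + b < min_votes or a == b:
        return None
    if a > b:
        return PreferencePair(prompt, chosen=answer_a, rejected=answer_b)
    return PreferencePair(prompt, chosen=answer_b, rejected=answer_a)

# Made-up example: three voters, two prefer answer "a".
votes = [
    Vote("q1", "npub1", "a"),
    Vote("q1", "npub2", "a"),
    Vote("q1", "npub3", "b"),
]
pair = build_pair("Is fasting healthy?", "Answer from model A",
                  "Answer from model B", votes)
print(pair)
```

The list of voters in `latest` is also the natural place to decide who gets zapped.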

someone 11 months ago

In the video above some LLMs favored the atheist, some favored the believer:

The ones on the left are also lower ranking in my leaderboard and the ones on the right are higher ranking. Coincidence? Does ranking high in faith mean ranking high in healthy living, nutrition, bitcoin and nostr on average? The leaderboard:

Zoho Sheet

someone 11 months ago

i think Elon wants an AI government. he is aware of the efficiencies it will bring. he is ready to remove the old system and its inefficiencies. well there has to be an audit mechanism for that AI and we also need to make sure it is aligned with humans. a fast LLM can only be audited by another fast LLM... ideas of an LLM can be checked by things like the AHA leaderboard... 🫡