Replies (30)
This is pretty insane
This is what I came to say. LLMs parrot the most likely answer based on the data set they have been trained on.
Interesting, how do you prepare the data?
Should be usable. But the next versions in the same repo will be better.
Data is coming from kinds 1 and 30023. The biggest filter is web of trust.
Can you write the exact question?
It's always curious when that which is created rules out creation as a possible source in other contexts...
What do you mean?
Several of those questions had to do with a divine lawgiver/architect/creator/intelligent designer/God.
The first A.I. source, itself a created thing, often rules out a creator as an explanation for the existence of other things, which I find interesting 😏
Did you use a system prompt in these examples?
Isn’t it better to use an uncensored base model for the training? Will you opensource the dataset?
What method and tools did you use to fine-tune this model?
I've been wanting to train some LLMs on specific datasets and am looking for the method and tools that fit me best.
Second question: what is your dataset structure? I understand kind 1 and other events, but how is it structured when feeding the LLM? Just JSON? Is there anything else I'm missing to train and fine-tune my own LLM?
If you don’t mind a suggestion: an easy way to get started is with Unsloth’s Google Colab notebooks. Just by inspecting the code in some of their many notebooks you can get a solid starting point for the fine-tuning steps, including the dataset formats.

Unsloth - Train and Run Models Locally
Unsloth is an open-source, no-code web UI for training, running and exporting open models in one unified local interface.
Thank you I'll give this a test
I see this is for smaller models. Can I use this for ~100B parameter LLMs as well?
Would prefer to do locally if I can; I do have access to hardware to do this
Yes, you can. These notebooks use smaller models only to take advantage of the Tesla T4 (free tier). You can mod the notebook and use it locally. You can use their bigger models or any other that you want when you feel more comfortable with the different model templates.

Unsloth Model Catalog | Unsloth Documentation
Thank you for your follow up answers; much appreciated🦾
My thought exactly. This made me question whether I want to stay on nostr... Wouldn't want this to happen to me.
Maybe there are only conspiracy nuts in his WoT?
Download all the notes.
Take the "content" field from the notes and change the name to "text":
Previously:
{"id":".....................", "pubkey": ".................", "content": "gm, pv, bitcoin fixes this!", .......}
{"id":".....................", "pubkey": ".................", "content": "second note", .......}
Converted into jsonl file:
{"text": "gm, pv, bitcoin fixes this!" }
{"text": "second note" }
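A minimal sketch of that conversion, assuming the downloaded notes are stored one JSON event per line. The function name `notes_to_jsonl`, the sample events, and the decision to skip empty notes are illustrative assumptions, not the author's actual script. The kind filter uses kinds 1 and 30023, as mentioned earlier in the thread.

```python
import json

def notes_to_jsonl(lines, kinds=(1, 30023)):
    """Yield one JSON line of the form {"text": ...} per matching note."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the dump
        event = json.loads(line)
        if event.get("kind") not in kinds:
            continue  # keep only short notes and long-form articles
        content = event.get("content", "")
        if content.strip():
            # Rename "content" to "text" and drop every other field.
            yield json.dumps({"text": content}, ensure_ascii=False)

# Hypothetical sample events; real events carry more fields (sig, tags, ...).
raw = [
    '{"id": "a", "pubkey": "p1", "kind": 1, "content": "gm, pv, bitcoin fixes this!"}',
    '{"id": "b", "pubkey": "p2", "kind": 1, "content": "second note"}',
    '{"id": "c", "pubkey": "p3", "kind": 7, "content": "+"}',
]
converted = list(notes_to_jsonl(raw))
for row in converted:
    print(row)
```

Writing the yielded rows to a file, one per line, gives the jsonl file described above.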
I used Unsloth and ms-swift to train. Unsloth was needed to convert the base model to an instruct model, which is a little advanced. If you don't want to do that and would rather start with an instruct model, you can use ms-swift or llama-factory.
You will do LoRA pretraining. I used 32 as the LoRA rank, but you can choose another number.
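To get a feel for what the rank number controls: a LoRA adapter on a weight matrix of shape d_out x d_in adds two small matrices with rank * (d_in + d_out) trainable parameters in total. The sketch below is only arithmetic; the 4096 x 4096 layer size is an assumption for illustration and is not taken from this thread.

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable parameters added by one LoRA adapter (A: rank x d_in, B: d_out x rank)."""
    return rank * (d_in + d_out)

d_in = d_out = 4096  # assumed layer size, for illustration only
rank = 32            # the rank mentioned above

full = d_in * d_out                           # parameters in the frozen weight matrix
added = lora_param_count(d_in, d_out, rank)   # parameters LoRA actually trains

print(f"full matrix: {full}, LoRA adapter: {added}, ratio: {added / full:.4f}")
```

A higher rank gives the adapter more capacity at the cost of more memory and training time; a lower rank does the opposite.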
Excellent I figured that was structure. Thank you for the detailed information
I don't really mean that I will leave nostr due to something like this. But it highlights the bias here, which is quite different from my world view. Or the bias of the author's WoT...
😅
I guess all those gm and pv make a positive impact 😆
No I did not add faith bias.
I started with the Llama 3.1 Base!
The dataset is on relays; most relays should allow downloading?
Yes.
My WoT starts with a few guys plus me with the highest scores. Whoever they follow gets a lower score, whoever those people follow gets a lower score still, and so on recursively. Simple math, nothing complicated.
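One way such a recursive score could be sketched is a breadth-first walk over the follow graph, where each hop away from the seed accounts multiplies the score by a decay factor. The decay of 0.5, the depth limit of 3, and the sample graph are assumptions for illustration; the actual scoring formula was not shared in this thread.

```python
from collections import deque

def wot_scores(follows, seeds, decay=0.5, max_depth=3):
    """Score accounts by hop distance from the seeds; each hop multiplies by decay."""
    scores = {s: 1.0 for s in seeds}           # seeds get the highest score
    queue = deque((s, 0) for s in seeds)
    while queue:
        pubkey, depth = queue.popleft()
        if depth >= max_depth:
            continue                            # do not expand beyond the depth limit
        for followed in follows.get(pubkey, ()):
            if followed not in scores:          # BFS reaches each account by its shortest path first
                scores[followed] = scores[pubkey] * decay
                queue.append((followed, depth + 1))
    return scores

# Hypothetical follow graph: who follows whom.
follows = {
    "me": ["alice", "bob"],
    "alice": ["carol"],
    "bob": ["carol"],
    "carol": ["dave"],
}
scores = wot_scores(follows, seeds=["me"])
print(scores)
```

Notes could then be filtered by keeping only authors whose score is above some chosen threshold.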
Oh, I see. By dataset I was thinking of the [WoT filtered] raw data after cleaning/curation and post-processing.