can’t remember where i saw it but recently read a post that basically stated that given a specific dataset, all/any LLMs trained on it converged to the same set of “understanding” capabilities

Replies (1)