LLMs are really just two files: a parameters file (e.g. ~140GB for Llama 2 70B, holding 70 billion parameters at 2 bytes each) and a small run file with the code that loads those parameters and executes them.
These two files create a self-contained model that can run on a device like a MacBook for inference.
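To make the "two files" idea concrete, here's a toy sketch in Python. None of this is Llama's real format or code: the bigram table, the `params.bin` name, and the sampling loop are illustrative stand-ins for a 140GB weights file and the small program that runs it.

```python
import numpy as np

VOCAB = list("abcdefghijklmnopqrstuvwxyz ")
V = len(VOCAB)

# "Training" stand-in: write a parameters file to disk (random bigram logits here).
rng = np.random.default_rng(0)
rng.standard_normal((V, V)).astype(np.float32).tofile("params.bin")

# The "run file": a few lines that load the parameters and sample text from them.
W = np.fromfile("params.bin", dtype=np.float32).reshape(V, V)

def next_char(prev_idx: int) -> int:
    logits = W[prev_idx].astype(np.float64)
    probs = np.exp(logits - logits.max())   # softmax over next-character scores
    probs /= probs.sum()
    return int(rng.choice(V, p=probs))

idx = VOCAB.index("a")
chars = ["a"]
for _ in range(40):
    idx = next_char(idx)
    chars.append(VOCAB[idx])
print("".join(chars))
```

The output is gibberish (the "weights" are random), but the shape of the thing is the same: all the knowledge lives in the parameters file, and the run file is just the loop that turns those numbers into text.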
LLMs are like a lossy zip file of the internet: roughly 10TB of training text gets compressed down into that ~140GB parameters file.
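A rough back-of-envelope for that ratio (figures are approximate, and unlike a real zip file the compression is lossy):

```python
# Approximate compression ratio: training text vs. parameters file (decimal units)
training_text_bytes = 10e12      # ~10 TB of training text
parameters_bytes    = 140e9      # ~140 GB parameters file (70B params x 2 bytes)
print(f"~{training_text_bytes / parameters_bytes:.0f}x compression")  # ~71x
```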
They don’t store facts verbatim; they predict the next word based on patterns learned from that text. Mind-blowing how this creates “knowledge”!
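As a hedged sketch of what "predicting the next word" means: the model assigns a score to every word in its vocabulary, and those scores get turned into probabilities. The prompt, the tiny vocabulary, and the numbers below are all made up for illustration.

```python
import numpy as np

# Hypothetical scores (logits) a model might assign to candidate next words
# after the prompt "The capital of France is" -- the numbers are invented.
vocab  = ["Paris", "Lyon", "a", "the", "beautiful"]
logits = np.array([8.1, 2.3, 1.0, 0.7, 1.9])

# Softmax turns the scores into a probability distribution over the next word.
probs = np.exp(logits - logits.max())
probs /= probs.sum()
for word, p in zip(vocab, probs):
    print(f"{word:10s} {p:.3f}")   # "Paris" dominates: patterns that read as knowledge
```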
The largest LLMs are trained on an amount of text equivalent to reading nonstop for 200,000 years.
To put that in perspective, if you started reading when the first modern humans left Africa, you'd just be finishing now.
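Where a number like that comes from, as a hedged back-of-envelope (the word count and reading speed are assumptions, not figures for any specific model):

```python
# Back-of-envelope: years to read an LLM-scale training set cover to cover
words_in_training_set = 2e13              # assume ~20 trillion words of text
reading_speed_wpm     = 200               # assume 200 words per minute, nonstop
minutes_per_year      = 60 * 24 * 365

years = words_in_training_set / (reading_speed_wpm * minutes_per_year)
print(f"~{years:,.0f} years")             # ~190,000 years -- on the order of 200,000
```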