Where do LLMs Really Source their Data?
There's a common myth going around that ChatGPT was trained on "the whole internet". If you thought that, you're not alone.
It's time we dispel this myth once and for all. ⬇️
The truth is, the amount of data that #LLMs are trained on is tiny, at least in comparison to the amount of available data out there. ChatGPT, for example, was trained on less than 0.000000001% of the internet, according to most internet size estimates.
For perspective, if all the data on the internet were represented by the entire surface of the Earth, then all of ChatGPT's data would cover only about 478 square centimeters (about 74 square inches), approximately the area of a typical dinner plate.
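If you want to check the dinner-plate comparison yourself, here is a rough sketch of the arithmetic. It assumes Earth's total surface area is about 510 million square kilometers (a commonly cited figure, not one from this post); the 478 square centimeters comes from the post, and the function name is just illustrative.

```python
# Sanity check of the dinner-plate analogy.
# Assumption (not from the post): Earth's surface is roughly 510 million km^2.
EARTH_SURFACE_KM2 = 510e6
CM2_PER_KM2 = 1e10          # (100,000 cm per km) squared
PLATE_AREA_CM2 = 478        # figure quoted in the post
CM2_PER_IN2 = 2.54 ** 2     # square centimeters per square inch


def plate_fraction(plate_cm2=PLATE_AREA_CM2, earth_km2=EARTH_SURFACE_KM2):
    """Return the plate's area as a fraction of Earth's surface."""
    return plate_cm2 / (earth_km2 * CM2_PER_KM2)


fraction = plate_fraction()
plate_in2 = PLATE_AREA_CM2 / CM2_PER_IN2

print(f"Plate area: about {plate_in2:.0f} square inches")
print(f"Plate / Earth: {fraction:.2e} as a fraction, {fraction * 100:.2e} as a percentage")
```

Under that assumption the plate works out to a fraction on the order of 1e-16 of Earth's surface, which sits well under the "less than 0.000000001%" bound quoted above.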
Why is that so?
It's because most of the data out there is not in a useful format for training a language model. You can think of data as an untapped raw material: it has to be cleaned and refined before it can be used.
Then how can LLMs respond to questions as well as they do?
To answer this, it's important to understand that Large Language Models are really just sophisticated probability machines. They are trained on the relationships between words and sentences. What they produce is a *probability* that a given word will come next, based on the words before it. Think of them as much more capable versions of predictive text on your phone.
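To make the "probability machine" idea concrete, here is a deliberately tiny sketch: it counts which word follows which in a made-up sentence and turns those counts into next-word probabilities. This is only a toy. Real LLMs use neural networks over tokens rather than a lookup table of word counts, and the corpus and function names here are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def next_word_probabilities(counts, word):
    """Turn the follower counts for `word` into probabilities."""
    followers = counts[word.lower()]
    total = sum(followers.values())
    if total == 0:
        return {}  # this word was never seen with a follower
    return {w: c / total for w, c in followers.items()}


# Made-up toy text, standing in for training data.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)

probs = next_word_probabilities(model, "the")
for candidate, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"after 'the': {candidate!r} with probability {p:.2f}")
```

In this toy text, "the" is followed by "cat" twice and "mat" once, so "cat" gets the higher probability. Your phone's predictive text and an LLM both rank candidate next words in this spirit, just with vastly more context and data.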
How can probability machines do so much with so little? How can they make any sense of the exabytes of cat videos, fake news, podcasts, articles, NSFW content, social media posts, music, app downloads, and more? The answer: humans.
Humans are essential for separating the signal from the noise. Which touches on another myth: that #AI will replace humans in their work. But that's for next time.
Did this help you understand AI and LLMs better? Give it a like.
Know anyone with this misconception? Share it with them.
Have AI-related questions for me? Drop them in the comments.
Insight #1: Scarcity and Money
Scarcity forces individuals and societies to make difficult decisions about allocating limited resources, which shapes behaviors as people navigate trade-offs. Money is an indispensable tool when making those decisions and trade-offs, while also enabling exchange when there's a lack of mutual trust.
Insight #2: The Dangers of Easy Money
While increasing the money supply can stimulate growth in the short term, it's always a form of debt taken from the future, and one day it will come due. Under fiat currency, debt often gets out of control, increasing poverty and sacrificing future prosperity for present desires.
Insight #3: Decentralization and Self-Sovereignty
Bitcoin's transparency and consensus mechanisms make it a trustless system without centralized power. By distributing authority across a network, Bitcoin's decentralization enables self-sovereignty, resilience, and prosperity unmatched by its centralized counterparts.
Insight #4: Bitcoin's Inherent Value
Far from being purely speculative, Bitcoin's value stems from its scarcity, utility, network effects, and role as an escape from the fiat system. Every day, more people are discovering that Bitcoin can be relied upon to always be the best form of money.
Insight #5: The Bitcoin Philosophy
Bitcoin's purpose was never to let early adopters profit from new users pushing the price higher. Those who hold it longer will naturally do better financially, but Bitcoin was created to free humanity from the shackles of fiat, using the only corruption-resistant form of money.
This educational workbook provides a comprehensive introduction to Bitcoin that anyone can understand. Each chapter builds on the last with interactive exercises, real-world examples, and important insights into how Bitcoin can empower individuals.
Make sure you follow
Satoshi followed that opening statement with a short explanation of Bitcoin's technical properties, and a link to the Bitcoin Whitepaper.
Most of the cypherpunks on the mailing list were able to understand it well enough, but it's so technical that even long-time bitcoiners often have difficulty following what it says.
Fortunately, I'm here to break it down for you, section by section.
And what better way to celebrate Whitepaper Day than with my easy-to-read summary of the #Bitcoin Whitepaper? I mean, you're not *seriously* going to spend your scarce time in a costume, going door-to-door asking for fiat treats, are you?
I didn't think so.
Let's get started 🧵