What exactly is a token? A must-learn introduction to understanding AI

1. Large AI models cannot process the raw text we input directly. The first step in all processing is to convert the text into tokens.
2. Simply put, a token is the smallest unit that text is broken into before it is fed to the model.
3. A token can be a whole word, part of a word, a punctuation mark, or even a single space.
4. Common words usually become a single token, while long or uncommon words are often split into smaller fragments. For example, the English word "encoding" may be split into "encod" and "ing".
5. As a rough conversion reference, one token corresponds to about four English characters, or about three-quarters of an English word. This ratio is not fixed: it varies by language and from one tokenizer to another.
6. The complete processing flow is as follows. First, the text is segmented into tokens. Then, each token is mapped to a numeric ID. Next, each ID is converted into a vector that the model can work with. Only after these three steps does the model begin processing your content.
7. The "context window" that everyone keeps hearing about is also measured in tokens: the window's token limit determines how much content the model can "remember" in a single conversation.
8. Finally, the point everyone cares about: tokens are also the core billing unit of generative AI. What we spend on AI is settled by the number of tokens used.

What's covered above is just the tip of the iceberg; the underlying logic behind tokens is far more interesting than you might imagine.
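The three-step flow in point 6 (text to tokens, tokens to IDs, IDs to vectors) can be sketched in plain Python. This is only a toy under stated assumptions: the vocabulary `VOCAB`, the greedy longest-match rule in `tokenize`, the random `EMBEDDINGS` table, and the helper names are all hypothetical and chosen for illustration. Real tokenizers learn their vocabularies from data, and real models learn their vectors during training.

```python
import math
import random

# Hypothetical toy vocabulary (token -> numeric ID), for illustration only.
VOCAB = {"<unk>": 0, "the": 1, " ": 2, "encod": 3, "ing": 4,
         "token": 5, "s": 6, ".": 7, "is": 8}

EMBED_DIM = 4
_rng = random.Random(0)  # fixed seed so the toy vectors are reproducible
# One made-up vector per vocabulary entry; real models learn these values.
EMBEDDINGS = [[_rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in VOCAB]


def tokenize(text, vocab=VOCAB):
    """Step 1: split text by greedy longest match against the vocabulary.

    Characters that match no entry become a single '<unk>' token.
    """
    tokens = []
    max_len = max(len(t) for t in vocab)
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:  # no vocabulary entry starts at position i
            tokens.append("<unk>")
            i += 1
    return tokens


def to_ids(tokens, vocab=VOCAB):
    """Step 2: map each token to its numeric ID."""
    return [vocab.get(t, vocab["<unk>"]) for t in tokens]


def embed(ids):
    """Step 3: look up one vector per ID."""
    return [EMBEDDINGS[i] for i in ids]


def estimate_tokens(text, chars_per_token=4):
    """Rule-of-thumb token count for English text (about 4 characters each)."""
    return max(1, math.ceil(len(text) / chars_per_token))


tokens = tokenize("the tokens")   # ['the', ' ', 'token', 's']
ids = to_ids(tokens)              # [1, 2, 5, 6]
vectors = embed(ids)              # four vectors of length EMBED_DIM
print(tokens, ids, len(vectors))
print(tokenize("encoding"), estimate_tokens("encoding"))
```

With this toy vocabulary, "encoding" comes out as "encod" plus "ing", matching the example in point 4. A different vocabulary would split it differently, which is why token counts vary between tokenizers.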
This article is machine translated

From Twitter





