Answered by rasbt, Dec 28, 2023
Good question. It should be equal to the maximum context length, which is usually smaller than the vocabulary size. For example, for GPT-2 that would be 1024, but for modern LLMs it's usually somewhere above 2048. I think in the recent GPT-4 model it's >100k now. I will modify this using a separate parameter to make it clearer, e.g.:

token_embedding_layer = torch.nn.Embedding(vocab_size, output_dim)
pos_embedding_layer = torch.nn.Embedding(context_len, output_dim)
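
For context, here is a minimal runnable sketch of how the two layers might be used together. The GPT-2-style sizes (vocab_size=50257, context_len=1024, output_dim=768) and the example token IDs are assumptions for illustration only:

```python
import torch

# Assumed GPT-2-style sizes, for illustration only
vocab_size = 50257   # number of entries in the tokenizer vocabulary
context_len = 1024   # maximum context length the model supports
output_dim = 768     # embedding dimension

# Token embeddings are indexed by token ID, so the table needs vocab_size rows;
# positional embeddings are indexed by position, so context_len rows suffice
token_embedding_layer = torch.nn.Embedding(vocab_size, output_dim)
pos_embedding_layer = torch.nn.Embedding(context_len, output_dim)

# Hypothetical batch: 2 sequences of 4 token IDs each
token_ids = torch.tensor([[40, 367, 2885, 1464],
                          [1807, 3619, 402, 271]])

tok_embeds = token_embedding_layer(token_ids)                        # (2, 4, 768)
pos_embeds = pos_embedding_layer(torch.arange(token_ids.shape[1]))   # (4, 768)

# Positional embeddings broadcast across the batch dimension
input_embeds = tok_embeds + pos_embeds
print(input_embeds.shape)  # torch.Size([2, 4, 768])
```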
