Answered by rasbt, Dec 28, 2023
Good question. It should be equal to the maximum context length, which is usually smaller than the vocabulary size. For example, for GPT-2 that would be 1024, but for modern LLMs it's usually somewhere above 2048. I think in the recent GPT-4 model it's >100k now. I will modify this using a separate parameter to make it clearer, e.g.:

token_embedding_layer = torch.nn.Embedding(vocab_size, output_dim)
pos_embedding_layer = torch.nn.Embedding(context_len, output_dim)
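
For context, here is a minimal runnable sketch of how the two layers might be used together. The GPT-2-style sizes (vocab_size=50257, context_len=1024, output_dim=768) and the example token IDs are assumptions for illustration only:

```python
import torch

# Assumed GPT-2-style sizes, for illustration only
vocab_size = 50257   # number of entries in the tokenizer vocabulary
context_len = 1024   # maximum context length the model supports
output_dim = 768     # embedding dimension

# Token embeddings are indexed by token ID, so the table needs vocab_size rows;
# positional embeddings are indexed by position, so context_len rows suffice
token_embedding_layer = torch.nn.Embedding(vocab_size, output_dim)
pos_embedding_layer = torch.nn.Embedding(context_len, output_dim)

# Hypothetical batch: 2 sequences of 4 token IDs each
token_ids = torch.tensor([[40, 367, 2885, 1464],
                          [1807, 3619, 402, 271]])

tok_embeds = token_embedding_layer(token_ids)                        # (2, 4, 768)
pos_embeds = pos_embedding_layer(torch.arange(token_ids.shape[1]))   # (4, 768)

# Positional embeddings broadcast across the batch dimension
input_embeds = tok_embeds + pos_embeds
print(input_embeds.shape)  # torch.Size([2, 4, 768])
```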
