description: Assigns values to the items chosen for masking.
text.MaskValuesChooser(
vocab_size, mask_token, mask_token_rate=0.8, random_token_rate=0.1
)
MaskValuesChooser encapsulates the logic for deciding which value to assign
to items that were chosen for masking. The default implementation behaves as
follows:
For mask_token_rate of the time, replace the item with the [MASK] token:
my dog is hairy -> my dog is [MASK]
For random_token_rate of the time, replace the item with a random word:
my dog is hairy -> my dog is apple
For 1 - mask_token_rate - random_token_rate of the time, keep the item
unchanged:
my dog is hairy -> my dog is hairy
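The three cases above can be sketched in plain Python. This is only an illustration of the rule, not the library's implementation: the function name `choose_mask_values`, the `rng` parameter, and the use of a flat list of ids are all assumptions made for the example.

```python
import random


def choose_mask_values(ids, vocab_size, mask_token,
                       mask_token_rate=0.8, random_token_rate=0.1, rng=None):
    """Hypothetical stand-in for the default rule; not the library's code."""
    if mask_token_rate + random_token_rate > 1:
        raise ValueError("mask_token_rate + random_token_rate must be <= 1")
    rng = rng or random.Random()
    values = []
    for item in ids:
        draw = rng.random()  # uniform in [0, 1)
        if draw < mask_token_rate:
            # Case 1: replace the item with the mask token.
            values.append(mask_token)
        elif draw < mask_token_rate + random_token_rate:
            # Case 2: replace the item with a random id from the vocabulary.
            values.append(rng.randrange(vocab_size))
        else:
            # Case 3: keep the item unchanged.
            values.append(item)
    return values


# Example: ids of items already selected for masking (values are made up).
selected = [11, 12, 13, 14]
print(choose_mask_values(selected, vocab_size=100, mask_token=4))
```

In this sketch the random replacement is drawn from the whole vocabulary, so it can occasionally coincide with the original item or with the mask token.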
The default behavior is consistent with the Masked LM and Masking Procedure
described in BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
(https://arxiv.org/pdf/1810.04805.pdf).
Users may further customize this behavior by subclassing MaskValuesChooser
and overriding get_mask_values().
| Attributes | |
|---|---|
| `mask_token` | |
| `random_token_rate` | |
| `vocab_size` | |
get_mask_values(
masked_lm_ids
)
Get the values used for masking, random injection or no-op.
| Args | |
|---|---|
| `masked_lm_ids` | a `RaggedTensor` of n dimensions and dtype int32 or int64 whose values are the ids of items that have been selected for masking. |
