text/docs/api_docs/python/text/find_source_offsets.md at master · tensorflow/text · GitHub
description: Maps the input post-normalized string offsets to pre-normalized offsets.

text.find_source_offsets

Maps the input post-normalized string offsets to pre-normalized offsets.

text.find_source_offsets(
    offsets_map, input_offsets, name=None
)

Returns the source (i.e. pre-normalized) string offsets that correspond to the given post-normalized string offsets. The mapping is taken from `offsets_map`, which is an output of the `normalize_utf8_with_offsets_map` op. `offsets_map` can be indexed or sliced, as long as `input_offsets` is indexed or sliced to match.

Examples:

>>> # input: <string>[num_strings]
>>> post_normalized_str, offsets_map = normalize_utf8_with_offsets_map(
...     ["株式会社", "ＫＡＤＯＫＡＷＡ"])
>>> # input: <variant>[num_strings], <int64>[num_strings, num_offsets]
>>> # output: <int64>[num_strings, num_offsets]
>>> find_source_offsets(offsets_map, [[0, 1, 2], [0, 1, 2]])
<tf.Tensor: shape=(2, 3), dtype=int64, numpy=array([[0, 1, 2], [0, 3, 6]])>
>>> # Offsets map can be indexed.
>>> find_source_offsets(offsets_map[1], [[0, 1, 2]])
<tf.Tensor: shape=(1, 3), dtype=int64, numpy=array([[0, 3, 6]])>
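
To make the idea of an offsets map more concrete, here is a minimal pure-Python sketch of mapping post-normalized byte offsets back to pre-normalized byte offsets. It is an illustration only, not the library's implementation: `build_offsets_map` and `find_source_offsets_py` are hypothetical helpers, NFKC is assumed as the normalization form, and each code point is normalized on its own (so sequences that only compose across code points are not handled). How offsets that fall inside a changed character are mapped is also an assumption of this sketch.

```python
import unicodedata
from typing import List


def build_offsets_map(source: str) -> List[int]:
    """Hypothetical helper: map[i] is the source byte offset for
    byte offset i of the NFKC-normalized UTF-8 string."""
    mapping: List[int] = []
    src_offset = 0
    for ch in source:
        src_len = len(ch.encode("utf-8"))
        normalized = unicodedata.normalize("NFKC", ch)
        if normalized == ch:
            # Unchanged character: bytes map one-to-one.
            mapping.extend(src_offset + i for i in range(src_len))
        else:
            # Changed character (assumption): every normalized byte
            # maps back to the start of the source character.
            dst_len = len(normalized.encode("utf-8"))
            mapping.extend([src_offset] * dst_len)
        src_offset += src_len
    mapping.append(src_offset)  # offset one past the end of the string
    return mapping


def find_source_offsets_py(mapping: List[int], offsets: List[int]) -> List[int]:
    """Hypothetical helper: look up each post-normalized offset."""
    return [mapping[o] for o in offsets]


# Full-width Latin letters (3 bytes each in UTF-8) normalize to
# 1-byte ASCII letters, so post-normalized offsets 0, 1, 2 map back
# to source offsets 0, 3, 6.
fullwidth = "\uff2b\uff21\uff24\uff2f\uff2b\uff21\uff37\uff21"
fullwidth_offsets = find_source_offsets_py(build_offsets_map(fullwidth), [0, 1, 2])

# These CJK characters are unchanged by NFKC, so offsets map to themselves.
cjk = "\u682a\u5f0f\u4f1a\u793e"
cjk_offsets = find_source_offsets_py(build_offsets_map(cjk), [0, 1, 2])
```

In this sketch the map is a plain list, which is why it can be indexed directly; the real `offsets_map` is an opaque `variant` tensor and should only be consumed through `find_source_offsets`.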

Args

`offsets_map` A `Tensor` or `RaggedTensor` of type `variant`, used to map the post-normalized string offsets to pre-normalized string offsets. It is an output of the `normalize_utf8_with_offsets_map` function.
`input_offsets` A `Tensor` or `RaggedTensor` of type int64 representing the post-normalized string offsets.
`name` The name for this op (optional).

Returns

`results` A `Tensor` or `RaggedTensor` of type int64 containing the pre-normalized string offsets.