
text.normalize_utf8


Normalizes each UTF-8 string in the input tensor using the specified rule.

```python
text.normalize_utf8(
    input, normalization_form='NFKC', name=None
)
```

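See Unicode Standard Annex #15, Unicode Normalization Forms (http://unicode.org/reports/tr15/), for the definitions of the normalization forms.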
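To make the four forms concrete, the sketch below compares them on a few characters using Python's standard-library `unicodedata.normalize`, which implements the same UAX #15 forms. It does not call the TensorFlow op; `compare_forms` and the sample strings are illustrative assumptions, and Python's bundled Unicode version may differ from the one the op uses.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def compare_forms(s):
    """Return {form: normalized string} for each of the four UAX #15 forms."""
    return {form: unicodedata.normalize(form, s) for form in FORMS}

samples = {
    "precomposed e-acute": "\u00e9",  # é as a single code point
    "decomposed e-acute": "e\u0301",  # e + COMBINING ACUTE ACCENT
    "fi ligature": "\ufb01",          # compatibility character
    "fullwidth KA": "\uff2b\uff21",   # fullwidth Latin capital letters
}

for label, s in samples.items():
    results = compare_forms(s)
    # Print code points so that otherwise invisible differences show up.
    print(label, {f: [f"U+{ord(c):04X}" for c in out] for f, out in results.items()})
```

NFC and NFD only compose or decompose canonically equivalent sequences, so they leave the ligature and the fullwidth letters untouched; NFKC and NFKD also apply compatibility mappings, which is why the default `'NFKC'` turns `ＫＡ` into `KA`.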

Examples:

```python
>>> import tensorflow_text as text
>>> # input: <string>[num_strings]; output: <string>[num_strings]
>>> text.normalize_utf8(["株式会社", "KADOKAWA"])
<tf.Tensor: shape=(2,), dtype=string, numpy=
array([b'\xe6\xa0\xaa\xe5\xbc\x8f\xe4\xbc\x9a\xe7\xa4\xbe', b'KADOKAWA'],
      dtype=object)>
```
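The output values in the example above are simply the UTF-8 encodings of the normalized strings. A minimal standard-library sketch reproduces them without TensorFlow; `expected_bytes` is a hypothetical helper, and the sketch assumes the op's NFKC agrees with Python's `unicodedata` for these characters. Note that plain ASCII input such as `"KADOKAWA"` passes through unchanged; NFKC does visible work on fullwidth input.

```python
import unicodedata

def expected_bytes(s, form="NFKC"):
    # Normalize, then UTF-8-encode: `tf.string` values surface in NumPy as bytes.
    return unicodedata.normalize(form, s).encode("utf-8")

print(expected_bytes("株式会社"))
# b'\xe6\xa0\xaa\xe5\xbc\x8f\xe4\xbc\x9a\xe7\xa4\xbe'

# Fullwidth "ＫＡＤＯＫＡＷＡ" folds to ASCII under NFKC.
print(expected_bytes("\uff2b\uff21\uff24\uff2f\uff2b\uff21\uff37\uff21"))
# b'KADOKAWA'
```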

Args

- `input`: A `Tensor` or `RaggedTensor` of type `string`. The strings must be UTF-8 encoded.
- `normalization_form`: The Unicode normalization form to apply; one of `'NFC'`, `'NFKC'`, `'NFD'`, or `'NFKD'`. Defaults to `'NFKC'`.
- `name`: An optional name for the op.
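Because `input` must be UTF-8 and `normalization_form` must be one of four values, a caller may want to check both before building the op. The sketch below is plain Python with hypothetical helpers (`check_normalization_form`, `find_invalid_utf8`); it accepts exactly the four spellings listed above and does not describe how the op itself reports invalid arguments.

```python
# Hypothetical pre-flight checks; plain Python, not part of TensorFlow Text.
VALID_FORMS = ("NFC", "NFKC", "NFD", "NFKD")

def check_normalization_form(form):
    """Raise ValueError unless `form` is one of the four documented values."""
    if form not in VALID_FORMS:
        raise ValueError(f"normalization_form must be one of {VALID_FORMS}, got {form!r}")
    return form

def find_invalid_utf8(values):
    """Return the indices of byte strings in `values` that are not valid UTF-8."""
    bad = []
    for i, v in enumerate(values):
        try:
            v.decode("utf-8")  # strict decoding raises on malformed byte sequences
        except UnicodeDecodeError:
            bad.append(i)
    return bad

batch = [b"caf\xc3\xa9", b"caf\xe9", "株式会社".encode("utf-8")]
print(find_invalid_utf8(batch))          # [1]
print(check_normalization_form("NFKC"))  # NFKC
```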

Returns

A `Tensor` or `RaggedTensor` of type `string` with the same shape as `input`, in which each string has been normalized to `normalization_form`.
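Since each string is normalized independently, the result keeps the input's shape, ragged dimensions included. The pure-Python sketch below models that contract with nested lists standing in for a `RaggedTensor`; `normalize_nested` is hypothetical and is not how TensorFlow Text implements the op.

```python
import unicodedata

def normalize_nested(values, form="NFKC"):
    """Normalize every string in an arbitrarily nested list, keeping the nesting.

    Accepts UTF-8 `bytes` or `str` leaves and always returns UTF-8 `bytes` leaves.
    """
    if isinstance(values, bytes):
        return unicodedata.normalize(form, values.decode("utf-8")).encode("utf-8")
    if isinstance(values, str):
        return unicodedata.normalize(form, values).encode("utf-8")
    # Recurse into sub-lists; empty rows stay empty, so the ragged shape is preserved.
    return [normalize_nested(v, form) for v in values]

ragged = [["\ufb01le", "e\u0301"], [], ["\uff2b\uff21"]]
print(normalize_nested(ragged))
# [[b'file', b'\xc3\xa9'], [], [b'KA']]
```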