Details
- Type: Bug
- Status: Resolved
- Priority: Critical
- Resolution: Fixed
Description
The following string functions lack UTF-8 support:
- BTRIM(STRING a, STRING chars_to_trim)
- LTRIM(STRING a, STRING chars_to_trim)
- RTRIM(STRING a, STRING chars_to_trim)
Here is an issue reported by one of our users:
[localhost:21050] default> select rtrim('价格,', ',');
+-----------------------+
| rtrim('价格,', ',') |
+-----------------------+
| 价�                   |
+-----------------------+
The result is the same with utf8_mode=true. Note that the comma in the strings above is the Chinese (full-width) punctuation mark ',' (U+FF0C), not the ASCII comma ','.
The cause is that chars_to_trim is treated as a set of single bytes, so each of the three bytes of the Chinese character ',' is trimmed independently. The UTF-8 encodings of the characters involved are:
- '价': 0xe4 0xbb 0xb7
- '格': 0xe6 0xa0 0xbc
- ',': 0xef 0xbc 0x8c
Each character is encoded in 3 bytes. The last byte of '格' is 0xbc, which also appears among the bytes of ',', so it is removed as well. The result is '价' followed by the first two bytes of '格' (0xe6 0xa0). That truncated sequence is malformed UTF-8, so it is displayed as '�'.
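The behaviour described above can be reproduced outside Impala. The following is a minimal Python sketch, not Impala's actual implementation: `rtrim_bytes` and `rtrim_codepoints` are hypothetical helper names introduced here for illustration. The first treats chars_to_trim as a set of bytes (the buggy behaviour), the second treats it as a set of code points (what a UTF-8 aware trim would presumably do).

```python
FULLWIDTH_COMMA = "\uff0c"  # ',' encoded in UTF-8 as 0xef 0xbc 0x8c


def rtrim_bytes(s: str, chars: str) -> bytes:
    """Byte-wise right trim: every byte of `chars` is trimmable on its own."""
    data = s.encode("utf-8")
    trim_set = set(chars.encode("utf-8"))  # {0xef, 0xbc, 0x8c} for ','
    end = len(data)
    while end > 0 and data[end - 1] in trim_set:
        end -= 1
    return data[:end]


def rtrim_codepoints(s: str, chars: str) -> str:
    """Code-point-aware right trim: only whole characters are removed."""
    trim_set = set(chars)
    end = len(s)
    while end > 0 and s[end - 1] in trim_set:
        end -= 1
    return s[:end]


text = "价格" + FULLWIDTH_COMMA

broken = rtrim_bytes(text, FULLWIDTH_COMMA)
print(broken.hex(" "))                           # e4 bb b7 e6 a0
print(broken.decode("utf-8", errors="replace"))  # 价�
print(rtrim_codepoints(text, FULLWIDTH_COMMA))   # 价格
```

The byte-wise version strips 0x8c, 0xbc, 0xef and then the trailing 0xbc of '格', stopping only at 0xa0. The code-point version stops as soon as it reaches '格', because the comparison is made on whole characters.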
Issue Links
- relates to IMPALA-2019 Proper UTF-8 support in string functions (Resolved)