Thursday, August 21, 2008

Python Unicode exception

While using Python, I often run into this exception:

Traceback (most recent call last):
  File "./unicode-test.py", line 564, in ?
    main()
  File "./unicode-test.py", line 553, in main
    print "Value: %s" % unicodeString
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 87: ordinal not in range(128)
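The last line of the traceback can be reproduced on its own. Python 2 uses the ASCII codec whenever it has to turn a byte string into Unicode implicitly, so decoding UTF-8 bytes with 'ascii' by hand raises the same error. A minimal sketch, assuming a sample string (u'caf\xe9') that is not from the original script; ascii_decode_error_message is a hypothetical helper name. It runs under Python 2.6+ and Python 3:

```python
# -*- coding: utf-8 -*-
# Reproduces the last line of the traceback by doing explicitly what
# Python 2 does implicitly: decoding non-ASCII bytes with the 'ascii' codec.


def ascii_decode_error_message(data):
    """Return the error message raised when `data` is decoded as ASCII,
    or None if `data` is pure ASCII (hypothetical helper)."""
    try:
        data.decode('ascii')
    except UnicodeDecodeError as exc:  # 'as' works on Python 2.6+ and 3
        return str(exc)
    return None


# Sample text (an assumption): u'caf\xe9' encodes to the bytes caf\xc3\xa9.
utf8_bytes = u'caf\xe9'.encode('utf-8')
message = ascii_decode_error_message(utf8_bytes)
print(message)
```

The byte 0xc3 is the first byte of a two-byte UTF-8 sequence, which is why it shows up in the message; only the position (3 here, 87 in the traceback) depends on the text.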

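The reason is that we want to print a Unicode object, but Python cannot convert it to the appropriate characters automatically: for implicit conversions between byte strings and Unicode, Python 2 falls back to the ASCII codec, which only handles values 0 to 127 (hence "ordinal not in range(128)"). The solution is easy: we have to encode the object explicitly before printing it: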

print ("Value: %s" % unicodeString).encode('utf-8')

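Optionally, you can choose how characters that ASCII cannot represent are handled:
# replace each unencodable character with a visible '?'
print ("Value: %s" % unicodeString).encode('ascii', 'replace')
# silently drop unencodable characters
print ("Value: %s" % unicodeString).encode('ascii', 'ignore')
# raise UnicodeEncodeError (the default behaviour)
print ("Value: %s" % unicodeString).encode('ascii', 'strict')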
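To see what each error handler actually produces, here is a small comparison on one string. The sample text (the Czech word "příliš") is an assumption chosen for illustration; the code runs under Python 2.6+ and Python 3:

```python
# -*- coding: utf-8 -*-
# Compares the error handlers of encode() on text that contains characters
# outside ASCII. Runs on Python 2.6+ (unicode objects) and Python 3 (str).

# Sample text (an assumption): "Value: příliš".
text = u'Value: p\u0159\xedli\u0161'

as_utf8 = text.encode('utf-8')                # keeps every character
as_replace = text.encode('ascii', 'replace')  # each unencodable char -> '?'
as_ignore = text.encode('ascii', 'ignore')    # unencodable chars are dropped

try:
    text.encode('ascii', 'strict')  # 'strict' is the default handler
    strict_error = None
except UnicodeEncodeError as exc:
    strict_error = exc  # exc.start is the index of the first bad character

# Python 3 prints b'Value: p??li?'; Python 2 prints Value: p??li?
print(as_replace)
print(as_ignore)
```

Only 'utf-8' keeps the text intact; 'replace' and 'ignore' trade information for output that is guaranteed to be printable on an ASCII terminal.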

Additional reading: source

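Note: We have to encode the whole string after joining the ASCII part with the Unicode part. Otherwise Python may combine the already encoded UTF-8 bytes with a Unicode part (for example a u"..." format string), convert the result back to Unicode using the ASCII codec, and the error appears again.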
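A minimal sketch of this "join first, encode last" rule. In Python 2, u"Value: %s" % value.encode('utf-8') fails because the UTF-8 bytes are decoded implicitly with ASCII while the Unicode result is built; the helper below keeps everything as Unicode and encodes exactly once at the end. format_then_encode is a hypothetical name, and accepting byte strings as input (decoding them explicitly) is an assumption added for illustration. It runs under Python 2.6+ and Python 3:

```python
# -*- coding: utf-8 -*-
# Sketch of the "join first, encode last" rule.
# format_then_encode() is a hypothetical helper, not a library function.


def format_then_encode(template, value, encoding='utf-8'):
    """Build the whole message as Unicode, then encode it exactly once."""
    if isinstance(value, bytes):
        # Decode incoming byte strings explicitly instead of letting
        # Python 2 decode them implicitly with the 'ascii' codec.
        value = value.decode(encoding)
    return (template % value).encode(encoding)


print(format_then_encode(u'Value: %s', u'p\u0159\xedli\u0161'))
```

Keeping the conversion at a single boundary (decode on input, encode on output) is what prevents the implicit ASCII step from ever running.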

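You can also use the unicodedata module to normalize a string:
import unicodedata
# decode the UTF-8 bytes ("příliš žluťoučký kůň") to unicode, decompose with NFKD, encode back to UTF-8
unicodedata.normalize('NFKD', 'p\xc5\x99\xc3\xadli\xc5\xa1 \xc5\xbelu\xc5\xa5ou\xc4\x8dk\xc3\xbd k\xc5\xaf\xc5\x88'.decode('utf-8')).encode('utf-8')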

Source: Python doc
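One common reason to normalize (an assumption, the post does not say why) is to strip diacritics: NFKD splits each accented letter into a base letter plus a combining mark, and the 'ignore' handler then drops those marks when encoding to ASCII. A sketch using the same Czech sample as the unicodedata example, written to run under Python 2.6+ and Python 3:

```python
# -*- coding: utf-8 -*-
# Strips diacritics: NFKD decomposition followed by an ASCII encode that
# drops the combining marks. Runs on Python 2.6+ and Python 3.
import unicodedata

# The UTF-8 sample from the post: "příliš žluťoučký kůň".
utf8_bytes = b'p\xc5\x99\xc3\xadli\xc5\xa1 \xc5\xbelu\xc5\xa5ou\xc4\x8dk\xc3\xbd k\xc5\xaf\xc5\x88'
text = utf8_bytes.decode('utf-8')

# NFKD splits e.g. u'\xed' (i with acute) into u'i' + u'\u0301' (combining acute).
decomposed = unicodedata.normalize('NFKD', text)

# The combining marks are not ASCII, so the 'ignore' handler drops them.
ascii_only = decomposed.encode('ascii', 'ignore')

# Python 3 prints b'prilis zlutoucky kun'
print(ascii_only)
```

This is lossy by design: it is fine for things like file names or search keys, but the original text cannot be recovered from the result.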
