Add-on unicode paths

This page describes how to prevent common problems with non-Latin characters in Kodi or add-on paths.

Unicode paths
If you want to write an add-on that works with paths like 'd:\apps\éîäß\' or 'opt/Kodi/àí', first read http://docs.python.org/2/howto/unicode.html#python-2-x-s-unicode-support

The key rule to take away from it is: "Software should only work with unicode strings internally, converting to a particular encoding on output." The same applies in reverse to input, which should be decoded to unicode as early as possible. To make string literals unicode by default, add "from __future__ import unicode_literals" at the top of the module. See https://docs.python.org/2/reference/simple_stmts.html#future for details.
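A minimal sketch of what the import changes (the name folder_name is only illustrative). The try/except shim picks the unicode type on Python 2 and str on Python 3, where the import is a harmless no-op. Without the import, Python 2 would store the literal below as an 8-byte UTF-8 string instead of 4 unicode characters.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals  # must be the first statement of the module

try:
    text_type = unicode  # Python 2: the unicode type
except NameError:
    text_type = str  # Python 3: str literals are always unicode

# Thanks to unicode_literals this is a unicode string on Python 2 as well,
# not a UTF-8 encoded byte string.
folder_name = 'éîäß'
```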

Kodi returns UTF-8 encoded strings. As input it accepts both unicode and UTF-8 encoded strings, but some functions are reported not to work with unicode parameters.

Therefore, the simplest way to deal with non-ASCII characters is to pass every parameter to Kodi as a UTF-8 encoded string and to decode Kodi's UTF-8 output back to unicode.
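The rule above can be wrapped in two small helpers. This is a sketch, not part of Kodi's API: to_kodi and from_kodi are hypothetical names, and the try/except shim only exists so the snippet also runs on Python 3.

```python
try:
    text_type = unicode  # Python 2: the unicode type
except NameError:
    text_type = str  # Python 3: str is already unicode


def to_kodi(value):
    """Encode a unicode parameter as a UTF-8 byte string before calling Kodi."""
    if isinstance(value, text_type):
        return value.encode('utf-8')
    return value  # already a byte string, pass it through unchanged


def from_kodi(value):
    """Decode a UTF-8 byte string returned by Kodi back to unicode."""
    if isinstance(value, bytes):
        return value.decode('utf-8')
    return value  # already unicode
```

Every argument handed to Kodi goes through to_kodi, and every string Kodi returns goes through from_kodi, so the add-on itself only ever sees unicode.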

Windows
Windows' NTFS stores file names as unicode, but Windows still interprets byte strings in a legacy codepage, for example cp1252 in Western Europe.

If you pass byte strings to Python's file functions, Windows interprets them in this codepage, which means that you cannot access a file with Greek characters on an English Windows. But if you pass unicode strings to the file functions, everything works as expected!
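The following sketch illustrates why byte strings fail here. It assumes an English Windows whose codepage is cp1252, and uses a hypothetical Greek file name and a hypothetical helper fits_codepage. cp1252 has no Greek letters, so the name cannot be expressed as a byte string at all, while the Greek codepage cp1253 and UTF-16 (which NTFS uses for file names) can represent it.

```python
# Hypothetical Greek file name u'αβγ.txt', written with escapes
greek_name = u'\u03b1\u03b2\u03b3.txt'


def fits_codepage(name, codepage):
    """Return True if every character of name exists in the given codepage."""
    try:
        name.encode(codepage)
    except UnicodeEncodeError:
        return False
    return True
```

fits_codepage(greek_name, 'cp1252') is False, so a byte-string path on such a system can never name this file; passing greek_name itself as unicode avoids the codepage entirely.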

Linux
When the locale is set to C or POSIX, Python assumes the file system is ASCII-only and tries to encode every unicode path to ASCII. In reality, a Linux file system has no fixed encoding (file names are plain byte sequences), and UTF-8 is a much better guess. Because of this, you must not pass unicode strings to Python file functions!

Instead, always use UTF-8 encoded strings.
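A sketch of the Linux rule, using a hypothetical helper to_fs_bytes and the example path from the top of this page. Encoding to ASCII, which is what Python attempts under a C or POSIX locale, fails, while the explicit UTF-8 encoding always succeeds.

```python
def to_fs_bytes(path):
    """Return path as a UTF-8 encoded byte string for Linux file functions."""
    if isinstance(path, bytes):
        return path  # assume it is already UTF-8 encoded
    return path.encode('utf-8')


kodi_dir = u'/opt/Kodi/\u00e0\u00ed'  # hypothetical path u'/opt/Kodi/àí'

# What Python effectively attempts under a C/POSIX locale: encoding to ASCII.
try:
    kodi_dir.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

File functions called with a byte path, such as os.listdir(to_fs_bytes(kodi_dir)), also return byte strings, which you decode with UTF-8 again before using them inside the add-on.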

Conclusion
Since your add-on should work on all supported operating systems, use the following approach:
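One way to combine the Windows and Linux rules is a single conversion applied just before calling a file function. This sketch relies on os.path.supports_unicode_filenames, which is True on Windows and False on Linux; the helper name fs_path is hypothetical, and everything else in the add-on stays unicode.

```python
import os


def fs_path(path):
    """Convert a path into the form the platform's file functions handle best.

    Unicode where file names are unicode-aware (Windows), UTF-8 encoded
    bytes elsewhere (Linux with a possibly ASCII-only locale).
    """
    if isinstance(path, bytes):
        path = path.decode('utf-8')  # normalise to unicode first
    if os.path.supports_unicode_filenames:
        return path
    return path.encode('utf-8')
```

Call it only at the boundary, e.g. open(fs_path(some_path)), never to store paths.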

Addon path
The first path an add-on has to deal with is its own add-on path: Kodi's getAddonInfo('path') returns a UTF-8 encoded string, so decode it to unicode.
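A sketch of this step. Outside Kodi the xbmcaddon module is not available, so a hard-coded, hypothetical UTF-8 value stands in for xbmcaddon.Addon().getAddonInfo('path'); decode_addon_path and icon.png are illustrative names. Once the add-on path is decoded, paths built from it stay unicode.

```python
import os


def decode_addon_path(raw_path):
    """Turn the UTF-8 byte string from getAddonInfo('path') into unicode."""
    if isinstance(raw_path, bytes):
        return raw_path.decode('utf-8')
    return raw_path  # already unicode


# Inside Kodi: raw = xbmcaddon.Addon().getAddonInfo('path')
# Here: a hypothetical stand-in for u'/opt/Kodi/àí/addons/plugin.example'
raw = b'/opt/Kodi/\xc3\xa0\xc3\xad/addons/plugin.example'
addon_path = decode_addon_path(raw)
icon_path = os.path.join(addon_path, u'icon.png')  # unicode + unicode
```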

Browse dialog
dialog.browse returns a UTF-8 encoded string, which may contain non-Latin characters. Therefore, decode it to unicode!
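Decoding also matters for ordinary string operations: on an undecoded UTF-8 string, len() and slicing count bytes rather than characters. The sketch below uses a hypothetical stand-in for the value that xbmcgui.Dialog().browse() might return for a folder named 'αβγ'.

```python
import os

# Hypothetical stand-in for the UTF-8 string returned by dialog.browse
raw_choice = b'/media/\xce\xb1\xce\xb2\xce\xb3'  # u'/media/αβγ'

choice = raw_choice.decode('utf-8')  # decode once, right after the call
folder_label = os.path.basename(choice)  # u'αβγ', 3 characters

# The undecoded name has 6 bytes for the same 3 letters.
raw_label_length = len(os.path.basename(raw_choice))
```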

Path joins
If path and filename are both unicode strings, everything works as expected. But what happens if filename is a UTF-8 encoded string containing "öäü.jpg"?

When you join a byte string with a unicode string, Python always produces unicode, so it decodes the byte string with its default encoding (ASCII). Because ö, ä and ü are not part of ASCII, this raises a UnicodeDecodeError. That's why you must explicitly decode the string to unicode first!
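A sketch of the failure and the fix, using the "öäü.jpg" example and a hypothetical folder; join_path is an illustrative helper name. On Python 2 the mixed join raises UnicodeDecodeError as described above; Python 3 refuses to mix the two types and raises TypeError instead.

```python
import os


def join_path(folder, filename):
    """Join a unicode folder with a file name that may be UTF-8 bytes."""
    if isinstance(filename, bytes):
        filename = filename.decode('utf-8')  # explicit, never the ASCII default
    return os.path.join(folder, filename)


folder = u'/opt/Kodi/images'  # hypothetical unicode folder
raw_name = b'\xc3\xb6\xc3\xa4\xc3\xbc.jpg'  # "öäü.jpg" as UTF-8 bytes

# Mixing both types directly fails (UnicodeDecodeError on Python 2,
# TypeError on Python 3).
try:
    os.path.join(folder, raw_name)
    mixed_join_failed = False
except (UnicodeDecodeError, TypeError):
    mixed_join_failed = True

picture = join_path(folder, raw_name)
```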

Logging
"print" and xbmc.log do not support unicode. Always encode unicode strings to UTF-8.

Alternatively, use a logging helper whose msg argument can be anything, from byte strings and unicode strings to instances of any class.
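A minimal sketch of such a helper, assuming Python 2 Kodi, where print and xbmc.log expect UTF-8 byte strings; on Python 3 it returns text instead. The names to_log_str and log and the tag '[plugin.example]' are hypothetical, and the actual xbmc.log call is left out so the snippet runs outside Kodi.

```python
import sys

PY2 = sys.version_info[0] == 2
try:
    text_type = unicode  # Python 2: the unicode type
except NameError:
    text_type = str  # Python 3: str is already unicode


def to_log_str(msg):
    """Convert a byte string, unicode string or any object to a native str."""
    if isinstance(msg, bytes) and not PY2:
        msg = msg.decode('utf-8')  # Python 3: assume the bytes hold UTF-8
    elif not isinstance(msg, (bytes, text_type)):
        msg = text_type(msg)  # numbers, exceptions, class instances ...
    if PY2 and isinstance(msg, text_type):
        msg = msg.encode('utf-8')  # Python 2: print/xbmc.log need UTF-8 bytes
    return msg


def log(msg, tag='[plugin.example]'):
    """Build the log line; inside Kodi you would pass it to xbmc.log."""
    return '%s %s' % (tag, to_log_str(msg))
```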

However, it's highly recommended never to mix byte strings and unicode strings in your program, in which case the 'if isinstance' check is unnecessary.