Python on Windows – Unicode environment variables

Say you want to open a file picker dialog in the user's profile folder, log to a file under AppData, or do anything else on Windows that involves environment variables holding file paths. You could use os.environ or os.getenv() for this. However, in Python 2 both return byte strings in the system's ANSI code page rather than Unicode. If your user's name contains non-ASCII characters that the code page cannot represent, these functions will likely return a mangled approximation of the path. For characters far outside the code page (such as Chinese characters on a Western European system) you will mostly get question marks. Hence these paths:

"C:\Users\Rosnička"
"C:\Users\发涩"

Are returned as:

"C:\Users\Rosnicka"
"C:\Users\??"

Which is clearly unacceptable.

The function os.path.expanduser() suffers from the same problem since it uses environment variables internally.

Because these paths have already been mangled during the conversion to the ANSI code page, you can't recover them by decoding with the system encoding (as you can for some other file paths on Windows by passing sys.getfilesystemencoding() as the second argument to unicode()).
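To see why decoding can't help, here is a small demonstration (written for Python 3 so it runs anywhere) that simulates the conversion. It assumes the Western European code page cp1252; your users' systems may use a different one. Python's 'replace' error handler substitutes '?' for every unmappable character, whereas Windows itself may first apply a "best-fit" mapping (č → c), which is how "Rosnička" became "Rosnicka" above.

```python
# Simulate a Python 2 process on a cp1252 system: the real Unicode path is
# squeezed into the ANSI code page, and unmappable characters are replaced.
original = "C:\\Users\\发涩"
mangled = original.encode("cp1252", "replace")
print(mangled)  # b'C:\\Users\\??'

# Decoding with the correct code page cannot restore the characters:
# only the '?' bytes are left.
recovered = mangled.decode("cp1252")
print(recovered == original)  # False

# By contrast, a name whose characters all exist in the code page survives
# the round trip, which is why decoding works for some other paths.
fits = "C:\\Users\\Zoë"
print(fits.encode("cp1252").decode("cp1252") == fits)  # True
```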

The solution is to use ctypes to call the Win32 API function GetEnvironmentVariableW, which returns the actual Unicode value of an environment variable. This function (cribbed from here) does exactly that and returns a native Python unicode string.

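```python
import ctypes

def getEnvironmentVariable(name):
    # In Python 2, pass `name` as a unicode string (e.g. u'USERPROFILE') so that
    # ctypes hands GetEnvironmentVariableW a wide-character pointer.
    # Calling with a NULL buffer returns the required size, including the NUL.
    n = ctypes.windll.kernel32.GetEnvironmentVariableW(name, None, 0)
    if n == 0:
        return None  # variable is not set
    buf = ctypes.create_unicode_buffer(n)  # room for n wide characters
    ctypes.windll.kernel32.GetEnvironmentVariableW(name, buf, n)
    return buf.value
```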

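Python 3.0 also solves this issue: on Windows, os.environ and os.getenv() return str (Unicode) values read directly from the Unicode environment, so the ctypes workaround is unnecessary there.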
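If you need code that runs under both Python 2 and Python 3, one approach is to call the Win32 API only where it is needed (Python 2 on Windows) and fall back to os.environ everywhere else. This is a minimal sketch: the name get_environment_variable and its default parameter are illustrative choices of mine, not part of any library, and the version check assumes that Python 3's os.environ is already Unicode on Windows, as described above.

```python
import os
import sys


def get_environment_variable(name, default=None):
    """Return environment variable `name` as text, or `default` if it is unset.

    Uses GetEnvironmentVariableW only on Python 2 under Windows, where
    os.environ holds ANSI-code-page byte strings; otherwise defers to
    os.environ, which returns text on Python 3.
    """
    if sys.platform == "win32" and sys.version_info[0] == 2:
        import ctypes  # only needed on this branch

        if isinstance(name, bytes):
            # Python 2: make sure ctypes passes a wchar_t* rather than a char*
            name = name.decode("ascii")
        get_env_w = ctypes.windll.kernel32.GetEnvironmentVariableW
        # First call with a NULL buffer returns the size needed, including the NUL
        size = get_env_w(name, None, 0)
        if size == 0:
            return default  # variable not set (or the call failed)
        buf = ctypes.create_unicode_buffer(size)
        get_env_w(name, buf, size)
        return buf.value
    return os.environ.get(name, default)
```

For example, get_environment_variable("USERPROFILE") should return the profile path on Windows and None on a system where that variable is not set.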
