Webware encoding issues, also known as “UnicodeDecodeError: 'ascii' codec can't decode”

November 12, 2008 – 02:07

Setup:

  1. You have HTML snippets containing Unicode text.
  2. Your application is written in Webware, in a copy that was forked from main development some time ago.
  3. You have to show this Unicode text to the user.

Result:

Something like:

File "....app/Webware/WebKit/HTTPResponse.py", line 370, in rawResponse
     return {
File "..../app/Webware/WebKit/ASStreamOut.py", line 96, in buffer
     return ''.join(self._chunks)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 22:
ordinal not in range(128)

Or something along those lines.
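
For context, the failure comes from Python 2's implicit ASCII decode: when ''.join receives a mix of byte strings and unicode objects, it decodes the byte strings as ASCII before joining. A minimal sketch of that same decode done explicitly (the sample bytes are made up for illustration, and the snippet runs on Python 2 and 3):

```python
# "é" encoded as UTF-8 is the two bytes 0xc3 0xa9 (illustrative sample data).
utf8_bytes = b"caf\xc3\xa9"

try:
    # This is the decode that ''.join performs implicitly on Python 2
    # when a unicode object is present among the chunks.
    utf8_bytes.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
```

The printed message has the same form as the last lines of the traceback above, with 0xc3 being the first byte of a UTF-8 encoded character.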

The solution that worked for us:

Change

        return ''.join(self._chunks)

to

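        # This is a dirty hack to operate correctly with UTF-8.
        new_chunks = []
        for chunk in self._chunks:
            try:
                # unicode (or pure-ASCII str): non-ASCII characters
                # become numeric character references.
                _chunk = chunk.encode('ascii', 'xmlcharrefreplace')
            except UnicodeDecodeError:
                # Byte string with non-ASCII bytes: guess its encoding,
                # then re-encode it as UTF-8.
                import chardet
                _chunk = unicode(chunk, chardet.detect(chunk)["encoding"]).encode('utf-8')
            new_chunks.append(_chunk)
        self._chunks = new_chunks
        # End of dirty hack
        return ''.join(self._chunks)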

It uses the chardet library from the feedparser project.
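
If adding chardet is not an option, a variant of the same idea is to assume a fixed encoding for byte chunks instead of detecting it. The sketch below is not the patch above: normalize_chunk and join_chunks are hypothetical names, the "UTF-8 first, Latin-1 as fallback" rule is an assumption about the data, and every chunk is reduced to ASCII with character references so the final join never mixes types. It is written to run on both Python 2 and 3.

```python
text_type = type(u"")  # unicode on Python 2, str on Python 3


def normalize_chunk(chunk):
    """Return one chunk as an ASCII-only byte string (hypothetical helper)."""
    if isinstance(chunk, text_type):
        # Text: non-ASCII characters become numeric character references.
        return chunk.encode("ascii", "xmlcharrefreplace")
    try:
        # Bytes: assume UTF-8 first (an assumption, not detection).
        text = chunk.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this never fails,
        # although it may guess wrong for other encodings.
        text = chunk.decode("latin-1")
    return text.encode("ascii", "xmlcharrefreplace")


def join_chunks(chunks):
    """Join mixed text/byte chunks into a single ASCII byte string."""
    return b"".join(normalize_chunk(chunk) for chunk in chunks)


sample = [u"caf\xe9 ", b"caf\xc3\xa9 ", b"plain"]
print(join_chunks(sample))
```

The trade-off compared with chardet is predictability over coverage: a chunk in some other encoding is not detected, it is simply read as Latin-1.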