Webware encoding issues also known as “UnicodeDecodeError: 'ascii' codec can't decode”

November 12, 2008 – 2:07 am

Setup:

  1. You have HTML snippets containing Unicode text.
  2. Your application is written in Webware, in a copy that was forked from main development some time ago.
  3. You have to show this Unicode text to the user.

Result:

Something like:

File "....app/Webware/WebKit/HTTPResponse.py", line 370, in rawResponse
     return {
File "..../app/Webware/WebKit/ASStreamOut.py", line 96, in buffer
     return ''.join(self._chunks)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 22:
ordinal not in range(128)

Or something along those lines.
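
The error appears because Python 2 implicitly decodes byte strings as ASCII when `''.join` meets a mix of unicode and str chunks. Here is a minimal sketch of that failure, with the decode written out explicitly so that it also runs on Python 3. The `chunk` value is a made-up example, not data from the application:

```python
# Hypothetical chunk: an HTML snippet stored as UTF-8 bytes.
chunk = u'<p>caf\u00e9</p>'.encode('utf-8')

error = None
try:
    # Python 2 performs this decode behind the scenes while joining
    # unicode and str chunks; here it is spelled out.
    chunk.decode('ascii')
except UnicodeDecodeError as exc:
    error = exc
    print(exc)
```

Byte 0xc3 is the first byte of the UTF-8 encoding of "é", which is why it shows up so often in this kind of traceback.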

Solution that worked for us:

Change

        return ''.join(self._chunks)

to

        # Dirty hack to cope with UTF-8 (and other non-ASCII) chunks
        new_chunks = []
        for chunk in self._chunks:
            try:
                # Works for unicode objects and pure-ASCII byte strings
                _chunk = chunk.encode('ascii', 'xmlcharrefreplace')
            except UnicodeError:
                # Non-ASCII byte string: guess its encoding, re-encode as UTF-8
                import chardet
                encoding = chardet.detect(chunk)["encoding"]
                _chunk = unicode(chunk, encoding).encode('utf-8')
            new_chunks.append(_chunk)
        self._chunks = new_chunks
        # End of dirty hack
        return ''.join(self._chunks)

It uses the chardet library from the feedparser project.
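
The same per-chunk normalization can be sketched as a standalone function, which makes it easier to try outside Webware. This is only an illustration under assumptions: `normalize_chunk` and `fallback_encodings` are hypothetical names, and chardet's guess is replaced by a fixed "try UTF-8, then Latin-1" rule so that the sketch has no third-party dependency.

```python
def normalize_chunk(chunk, fallback_encodings=('utf-8', 'latin-1')):
    """Return chunk as a byte string that is safe to join with other chunks."""
    if isinstance(chunk, bytes):
        # Byte string: decode with the first encoding that fits,
        # then re-encode as UTF-8 (stand-in for the chardet guess).
        for encoding in fallback_encodings:
            try:
                return chunk.decode(encoding).encode('utf-8')
            except UnicodeDecodeError:
                continue
        raise ValueError('could not decode chunk with %r' % (fallback_encodings,))
    # Text: non-ASCII characters become numeric character references.
    return chunk.encode('ascii', 'xmlcharrefreplace')


# Made-up chunks: text, UTF-8 bytes, and plain ASCII bytes.
chunks = [u'<p>caf\u00e9</p>', u'<p>caf\u00e9</p>'.encode('utf-8'), b'<br>']
body = b''.join(normalize_chunk(c) for c in chunks)
```

As in the hack above, text chunks end up as character references while byte chunks end up as UTF-8, so the page still has to be served with a UTF-8 charset.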
