Webware encoding issues, also known as “UnicodeDecodeError: 'ascii' codec can't decode”
November 12, 2008 – 02:07
Setup:
- You have HTML snippets containing Unicode text.
- Your application is written in Webware, using a copy that was forked from the main development line some time ago.
- You have to show this Unicode text to the user.
Result:
Something like:
  File "....app/Webware/WebKit/HTTPResponse.py", line 370, in rawResponse
    return {
  File "..../app/Webware/WebKit/ASStreamOut.py", line 96, in buffer
    return ''.join(self._chunks)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 22: ordinal not in range(128)
Or something along those lines.
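Why it happens: in Python 2, when ''.join() receives a mix of unicode objects and byte strings, it silently decodes the byte strings with the default ASCII codec, and the first non-ASCII byte aborts the join. 0xc3 is the UTF-8 lead byte for characters such as é, which is a strong hint that the byte strings hold UTF-8. Python 3 no longer does that implicit decode, so the small sketch below (the function name is hypothetical) reproduces the same message by decoding explicitly.

```python
def reproduce_ascii_error(html_bytes):
    """Decode bytes with the ASCII codec, as Python 2 did implicitly
    when joining byte strings together with unicode strings.
    Returns the error message, or None if the bytes are pure ASCII."""
    try:
        html_bytes.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


# "é" encoded as UTF-8 is b"\xc3\xa9", so decoding stops at byte 0xc3
message = reproduce_ascii_error("<p>café</p>".encode("utf-8"))
print(message)
# prints: 'ascii' codec can't decode byte 0xc3 in position 6: ordinal not in range(128)
```

The position differs from the traceback above only because the snippet is shorter.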
Solution that worked for us:
In WebKit/ASStreamOut.py (method buffer), change
return ''.join(self._chunks)
to
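# Dirty hack so that UTF-8 content is handled correctly
new_chunks = []
for chunk in self._chunks:
    try:
        # unicode chunks (and pure-ASCII str chunks) become plain ASCII,
        # with non-ASCII characters turned into &#NNN; references
        _chunk = chunk.encode('ascii', 'xmlcharrefreplace')
    except UnicodeError:
        # str chunk with non-ASCII bytes: guess its encoding and
        # re-encode it as UTF-8 (chardet may return None, so default to UTF-8)
        import chardet
        encoding = chardet.detect(chunk)['encoding'] or 'utf-8'
        _chunk = unicode(chunk, encoding).encode('utf-8')
    new_chunks.append(_chunk)
self._chunks = new_chunks
# End of dirty hack
return ''.join(self._chunks)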
It uses the chardet library from the feedparser project.
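For anyone carrying this code over to Python 3, where text (str) and bytes are separate types, the same normalization might look roughly like the sketch below. It is an illustration under assumptions, not the code we ran: normalize_chunks and fallback_encoding are hypothetical names, and instead of chardet it tries UTF-8 first and then falls back to Latin-1. That keeps it dependency-free and deterministic, but it will mis-decode text in other legacy encodings (Windows-1251, for example), which is exactly the case chardet is meant to handle.

```python
def normalize_chunks(chunks, fallback_encoding="latin-1"):
    """Return a list of byte strings that can be joined safely.

    Mirrors the hack above in Python 3 terms:
    - str chunks become ASCII, with non-ASCII characters turned into
      XML character references such as &#233;
    - pure-ASCII bytes chunks are kept as they are
    - other bytes chunks are decoded (UTF-8 first, then the fallback
      encoding in place of chardet) and re-encoded as UTF-8
    """
    normalized = []
    for chunk in chunks:
        if isinstance(chunk, str):
            normalized.append(chunk.encode("ascii", "xmlcharrefreplace"))
        elif chunk.isascii():  # bytes.isascii() needs Python 3.7+
            normalized.append(chunk)
        else:
            try:
                text = chunk.decode("utf-8")
            except UnicodeDecodeError:
                # with the default latin-1 every byte maps, so this cannot fail
                text = chunk.decode(fallback_encoding)
            normalized.append(text.encode("utf-8"))
    return normalized


chunks = ["<p>café</p>", b"<p>caf\xc3\xa9</p>", b"<p>caf\xe9</p>", b"<p>ok</p>"]
body = b"".join(normalize_chunks(chunks))
print(body)
# prints: b'<p>caf&#233;</p><p>caf\xc3\xa9</p><p>caf\xc3\xa9</p><p>ok</p>'
```

As in the original hack, the result can mix character references and raw UTF-8 bytes, so the page should be served with a UTF-8 charset for the browser to render both correctly.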