unicode output

Hanno Schlichting

2008-09-01 16:16:23 UTC

Hi.

> we feel that it would be better to output unicode (always); this does
> mean that we have to assume UTF-8 string encoding, and there's a
> performance penalty (17.5X vs. 19X when using "-m benchmark").

Always outputting Unicode is fine with me.

But why do we need to assume UTF-8 anywhere? In my opinion, the correct
behavior is to use Unicode throughout. That also means that variables and
the results of function calls inserted into templates must always be
Unicode.

That is at least the zope.tal/pagetemplate policy, and the only one that
makes sense to me. Allowing encoded strings of any kind anywhere inside
the machinery only causes a gazillion problems; I think we have already
seen quite a few of those in the current implementation.

The (View)PageTemplateFile classes then need the 'get encoding from file'
code, which reads the XML declaration or the meta content-type tag in the
same way zope.pagetemplate.pagetemplatefile does.
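A rough sketch of what that 'get encoding from file' step could look like, written in Python 3 (where str is the text type that Python 2 calls unicode). The regular expressions, the 1024-byte look-ahead, the UTF-8 fallback and the names sniff_encoding/decode_template are all assumptions for illustration; this is not the zope.pagetemplate code.

```python
import codecs
import re

# Hypothetical, simplified patterns; not copied from zope.pagetemplate.
XML_DECL = re.compile(rb"<\?xml\s[^>]*encoding\s*=\s*[\"']([A-Za-z0-9._-]+)[\"']")
META_CHARSET = re.compile(
    rb"<meta[^>]+content\s*=\s*[\"'][^\"']*charset\s*=\s*([A-Za-z0-9._-]+)",
    re.IGNORECASE,
)


def sniff_encoding(data: bytes, default: str = "utf-8") -> str:
    """Guess the encoding of raw template bytes read from disk."""
    head = data[:1024]  # assumption: declarations appear near the top
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # decoding with utf-8-sig also strips the BOM
    # The XML declaration must be at the very start; a meta tag can be anywhere in the head.
    match = XML_DECL.match(head) or META_CHARSET.search(head)
    if match:
        return match.group(1).decode("ascii").lower()
    return default  # assumed fallback when nothing is declared


def decode_template(data: bytes) -> str:
    """Decode once, at the file boundary; everything after this is str."""
    return data.decode(sniff_encoding(data))
```

A file-based template class would call decode_template() on the bytes it reads and hand only the resulting str to the rest of the engine.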

A non-file-based template should simply assume, or assert, that it
receives Unicode.
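For the non-file case, a hypothetical StringTemplate (not an existing z3c.pt class) only needs a type check on its source. Again this is a Python 3 sketch, with str standing in for unicode:

```python
class StringTemplate:
    """Hypothetical non-file-based template: its source must already be text."""

    def __init__(self, source: str) -> None:
        # No decoding happens here; receiving bytes is treated as a caller
        # error instead of guessing an encoding such as UTF-8.
        if not isinstance(source, str):
            raise TypeError(
                "template source must be str (Unicode), got %s"
                % type(source).__name__
            )
        self.source = source
```

Passing b"<p>...</p>" raises TypeError rather than being decoded with a guessed codec. An explicit raise is used instead of a bare assert so that the check also survives python -O.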

This would be in line with the common "encode/decode at I/O boundaries"
policy that seems to be widely accepted. In my opinion, apart from file
access there should be no encode/decode boundaries in the template engine.
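A minimal sketch of the encode/decode-at-I/O-boundaries idea, assuming UTF-8 on both sides and using str.format as a stand-in for the real template engine; handle_request and render are hypothetical names:

```python
def render(template: str, **values: str) -> str:
    # Inside the engine: text in, text out. No encode() or decode() calls.
    return template.format(**values)


def handle_request(template_bytes: bytes, name_bytes: bytes) -> bytes:
    # Input boundary: decode everything that arrives as bytes exactly once.
    template = template_bytes.decode("utf-8")
    name = name_bytes.decode("utf-8")
    # Core: pure Unicode processing.
    body = render(template, name=name)
    # Output boundary: encode exactly once, for the response.
    return body.encode("utf-8")
```

Only handle_request knows about encodings; render never sees a byte string.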

This Unicode-only policy should make the codebase a lot simpler in many
places and remove the need for the 'unicode_required_flag' setting, as
well as the many "if isinstance(text, unicode)" code paths in the
rendered code.
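To illustrate the simplification, here are two hypothetical insertion helpers (Python 3, names invented for this sketch; they are not z3c.pt's generated code). The first follows a mixed policy where encoded strings can appear anywhere, the second relies on the Unicode-only contract:

```python
def emit_mixed(out: list, value, encoding: str = "utf-8") -> None:
    """Mixed policy: encoded strings may show up at any insertion point."""
    # Every insertion has to branch on the type and guess a codec.
    if isinstance(value, bytes):
        value = value.decode(encoding)
    elif not isinstance(value, str):
        value = str(value)
    out.append(value)


def emit_unicode_only(out: list, value) -> None:
    """Unicode-only policy: text values are str by contract."""
    # No bytes branch and no codec: str() returns str values unchanged
    # and converts other objects (numbers etc.) to text.
    out.append(str(value))
```

Under the Unicode-only contract a bytes value is simply a bug: Python 3's str() would render its repr (b'...'), which is why the checks belong at the I/O boundaries rather than at every insertion point.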

Hanno

You received this message because you are subscribed to the Google Groups "z3c.pt" group.